All user accounts with a MySQL concurrent connection limit set above 15 have been reset to 15. This change is to curb a problem seen too often as of late: poor database design and the connection pile-ups it causes. Instead of attacking the problem head-on, the database structure itself, users opt to increase the number of connections to the server. This does not solve anything; it merely defers a symptom of the underlying cause, which is poor database design.
Poor database design, such as braindead structuring and a lack of indexes on columns used in WHERE clauses (important!), can cause a query that should finish in 0.10 ms to take in excess of 1,000 seconds to complete. By increasing the connection limit instead of adding an index to a particular column, the underlying cause goes unaddressed; only a symptom of the underlying problem is erroneously treated. The higher connection limit in turn allows more connections to pile up, eventually tripping our monitoring systems and alerting us with annoying, yet effective, pages.
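To make that concrete, here is a minimal sketch of what the fix usually looks like in MySQL. The table, column, index name, and row counts below are made up purely for illustration:

-- An unindexed column in the WHERE clause forces a full table scan.
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
-- illustrative output: type: ALL, rows: 2000000  (every row examined)

-- Add an index on the column used in the WHERE clause.
ALTER TABLE orders ADD INDEX idx_customer_id (customer_id);

EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
-- illustrative output: type: ref, rows: 12  (only matching rows examined)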
On high volume servers with multiple users who have both maxed out their concurrent connection limit and filled their disk quota, it's not uncommon to see the server become temporarily inaccessible until the watchdog restarts the MySQL process. Here's the nitty-gritty explanation why: everyone has 10 connections. If two sites both fill up their disk quotas, causing their MySQL queries to wait endlessly, never to complete, then 20 of the server's 100 connection slots are filled. Now let's introduce another user whose tables lack indexes and, over time, have grown immensely. Here's a sample process list, pulled from a server yesterday (names changed to protect the innocent):
+--------+------+-----------+----+---------+------+--------------+----------+
| Id     | User | Host      | db | Command | Time | State        | Info     |
+--------+------+-----------+----+---------+------+--------------+----------+
| 292663 | x    | localhost | db | Query   |  552 | Sending data | query #1 |
| 292668 | x    | localhost | db | Query   |  552 | Sending data | query #1 |
| 292686 | x    | localhost | db | Query   |  552 | Sending data | query #1 |
| 292765 | x    | localhost | db | Query   |  431 | Locked       | query #1 |
| 292812 | x    | localhost | db | Query   |  431 | Locked       | query #1 |
| 292813 | x    | localhost | db | Query   |  319 | Locked       | query #1 |
| 292814 | x    | localhost | db | Query   |  319 | Locked       | query #1 |
| 292815 | x    | localhost | db | Query   |  430 | Locked       | query #1 |
| 292816 | x    | localhost | db | Query   |  431 | Locked       | query #1 |
| 292817 | x    | localhost | db | Query   |  430 | Locked       | query #1 |
| 292818 | x    | localhost | db | Query   |  430 | Locked       | query #1 |
| 292819 | x    | localhost | db | Query   |  430 | Locked       | query #1 |
| 292820 | x    | localhost | db | Query   |  549 | Sending data | query #2 |
+--------+------+-----------+----+---------+------+--------------+----------+
As you can tell, the user has 12 clients working on one query and one client on another, all against the same table, which is locked while they wait for query #1 to complete. Over time these connections build up. What if that client raises his connection limit to 70, connections which, given the pile-up above, will eventually all be in use, on top of the 20 slots already filled by the users who have gone over disk quota? It would place the server in a bind, with only 10 connection slots available for everyone else. Your Rails applications, your mysql command line, your control panel access: everything that uses a persistent connection has to fight to secure one of the 10 remaining slots. Oh, and to make matters worse? The 2,000+ PHP scripts requested every minute also need an opportunity to connect to the MySQL server to issue a query or two. There's no room left for normal use of MySQL on the server. This is why I designed the servers with strict concurrent connection thresholds: to avoid resource contention and to provide a gentle notice that you may have a problem. If you ever run into a connection limit error on the server, there's an issue that needs to be addressed. File a trouble ticket within the control panel. It never hurts to ask; remember, I wrote everything from the ground up, so I can give you a solution or at least point you towards one.
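For the curious, the cap itself is plain MySQL. A rough sketch, assuming MySQL 5.0 or later and a made-up account name; this is not necessarily how the control panel applies it under the hood:

-- Cap one account at 10 simultaneous connections.
GRANT USAGE ON *.* TO 'someuser'@'localhost' WITH MAX_USER_CONNECTIONS 10;

-- When slots start piling up, see what every connection is doing.
SHOW FULL PROCESSLIST;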
Few clients genuinely need 15 - 25 concurrent connections, and those are the sites that push 95% of their 100 - 200 GB bandwidth quota during only 5% of their bandwidth cycle: very high volume, seasonal sites. Setting a concurrent connection value above 10 within the control panel will require a trouble ticket explaining the need for the higher limit. Again, 10 shouldn't pose a problem for most users with a properly indexed database; even the many who land on Digg continue to hum along without issue.
PS: The story has a happy ending. I eliminated the need for 99+ clients to sit there doing absolutely nothing, waiting for query #1 to complete, by adding two indexes to the table, one of which went on the id column. The customer's concurrent connection limit was lowered back to 10 without any further issues.
Problematic query: SELECT count(*) FROM `user` WHERE `id` = '1'
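A sketch of the fix, assuming `id` carried no index at all; the index name here is made up, and the second index mentioned above isn't shown:

-- Let the count be resolved from an index instead of a full table scan.
ALTER TABLE `user` ADD INDEX idx_id (`id`);

-- Confirm the index is actually used.
EXPLAIN SELECT count(*) FROM `user` WHERE `id` = '1';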
– Matt
Supplements:
- hackmysql.com: A good overview on index placement and database design
This would explain why our highest traffic site has had intermittent failures all day. You mention Rails in the blog post, so I'd suggest that fellow Rails users check out ruby-prof for finding bottlenecks in poorly generated SQL queries. About to use it myself!
Thanks Matt.
Borel’s problem is different and currently being discussed on the forums — http://forums.apisnetworks.com/showthread.php?t=357
I am waiting for the logs to come through to determine which user is responsible for a 2 GB+ query. The largest database on the server is 1 GB, so what exactly this elusive query looks like is anyone's guess.
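For anyone following along at home, the slow query log is one way to narrow down which account is issuing monster queries. A sketch, assuming a MySQL version (5.1+) where these variables can be set at runtime:

-- Log anything slower than 5 seconds, plus queries that skip indexes entirely.
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 5;
SET GLOBAL log_queries_not_using_indexes = 'ON';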