Category: Uncategorized

Advice on pt-stalk for connection saturated servers

Given an environment where a high volume web application is prone to opening many connections to backend resources, using a utility like pt-stalk is important.  When performance or availability affecting events like innodb lock waits or connection saturation occur, pt-stalk helps give you information you may need in troubleshooting what was happening.

The tendency may be to create multiple pt-stalks for various conditions.  This can be a poor decision when your server is dealing with both lock contention and high connections. When pt-stalk triggers, it triggers multiple simultaneous connections to MySQL to get the full processlist, lock waits, innodb transactions, slave status and other attributes.  Pt-stalk has the concept of sleeping a number of seconds after triggering, but once that time expires, the trigger may fire again, compounding the issue.  Put simply, pt-stalk can absorb the last few remaining connections on your database, particularly if you use extra_port (and run pt-stalk on the extra_port) and have a relatively low number of extra_max_connections.

Advice:  Stick to using only one of the built-in functions (like “processlist”) if it triggering when your processlist is large is enough for you.  Alternatively, write your own trg_plugin() function encompassing multiple tests that are relevant to your environment, if you need more than one check.

Unfortunately I cannot share the one I just wrote at this time (will need to write a more generic one to share later).  It checks processlist length, replication lag, innodb_trx wait time, and innodb_lock_waits, so that I could fold four of our more relevant checks into 1 pt-stalk and avoid the “connection stack_up” when MySQL was having an issue and mutiple stalks were firing.

Three client error messages that MySQL DBAs should know by heart to save time

Working in an operations environment means that you get a lot of questions.  There’s the inevitable troubleshooting tasks that go along with being a DBA.  This is designed to be a quick reference post, much like my more in depth post in 2016 about timeouts. These typical error messages can create confusion and unneeded activities to diagnose.  To aid my own process of elimination and those of others, here are three error messages every DBA should know by heart.

SQLSTATE[HY000] [2002] Connection timed out

Plain and simple: this error means the client cannot connect to the server.

  • The calling program is trying to connect to the wrong database server (one that it cannot reach).
  • The database server is completely down (you’d get another error if the server and client actually made a connection).
  • Firewall rules on either local servers or on a network resource is preventing the connection.
  • Routes are misconfigured.
  • The calling server has so many outbound connections that the MySQL client cannot create an outbound socket (think LSOF / netstat (lots of TIME_WAITed connections) ulimits / tcp_tw_reuse & recycle, etc).
  • Anything else that might be preventing an actual TCP connection outbound to the server port (typically 3306).

SQLSTATE[HY000] [2006] MySQL server has gone away

This error is typically related to server timeouts.

  • an idle connection is hitting wait_timeout.  wait_timeout concerns only idle sessions.  sessions with running queries are not affected by this timeout.
  • a running SELECT query with streaming cursor has not moved the cursor in [net_write_timeout] seconds
  • a running bulk INSERT or UPDATE transaction has not written anything to the server within [net_read_timeout] seconds
  • in rare cases interactive_timeout is being hit by an ad-hoc CLI client or other interactive program that respects the CLIENT_INTERACTIVE option to mysql_real_connect().  This is rare.  Most of the time interactive_timeout = wait_timeout.

Keep in mind that if a DBA has mindfully set timeouts on the server that adjustments should be made to SESSION timeout variables within the calling program.  In other words, don’t change server timeout settings just for one outlier or for some exceptional situation.

Also: This error can be “normal” in situations where the application opens a connection pool to the database and many connections in the pool are idle, hitting the wait timeout.  In other words, in environments like this, SQLSTATE[HY000] [2006] MySQL server has gone away can be a “non issue”.  Work with developers to understand where in the code the error appears and collaborate on the correct solution, which may be to ensure the application’s MySQL connector or library is configured to reconnect.

SQLSTATE[HY000] [2013] Lost connection to MySQL server during query

This error is exactly what it says.

  • A mysql query was interrupted in the middle of the query, meaning it was
    • killed by a backup utility on the server
    • killed by pt-kill on the server,
    • killed manually by a DBA on the server
    • or some other situation by which the client thread lost the connection to the server (network issue between client and server).
  • This error normally does not concern timeouts or settings on the server side.


A practical explanation: problems during unicode collation conversion


Recently I have been involved in an effort to convert MySQL databases from a utf8 character set to utf8mb4. As a part of this effort, my team evaluated which collations would be best for facilitating a broad range of multi-lingual support.

There have been many recent posts in the MySQL community about better unicode collation support in MySQL 8 such as from the MySQL Server Team’s blog at Oracle, who have also done a good job of showing us how newer collations based on UTF8 9.0.0 will properly group and sort characters according to their case and inheritance. As the title of the latter post suggests, the “devil is” indeed “in the details”.

There is also the matter of the “sushi-beer” problem, which shows that utf8mb4_unicode_520_ci will treat beer and sushi emoji as equal.

Rather than focusing on the particular deficiencies and enhancements in collations, this post focuses on practical solutions to converting existing data sets.  Every DBA, at some point, faces the daunting task of character set conversion or differing collation usage.  However, before we go on, a note.

Green field projects

When you are considering a new MySQL database to support a new project, common wisdom derived from articles linked above and from the many articles one may find on Stack Overflow and other resources suggests that if you have multi-lingual and emoji storage requirements: Just use utf8mb4 character set and utf8mb4_unicode_520_ci collation in MySQL.  That is, until MySQL 8.0 goes GA.  If you really know what you are doing, or if you already know a lot about this subject, your choices may vary.  The salient point here is that using an appropriate unicode collation (rather than defaults) will save the DBA from several future headaches, particularly regarding unique keys.

Existing data sets

Most DBAs work in shops with existing data sets with growing requirements.  Many DBAs will have already worked on converting latin1 databases to utf8.  Most MySQL installations that use utf8 will have utf8_general_ci as the collation.  For utf8mb4, the current default is utf8mb4_general_ci.

As illustrated in many documents and talks you’ll find that the general_ci collations in MySQL are sub-optimal when it comes to sorting and enforcing uniqueness.  They are not so bad with latin characters.  “APPLE = ApPlE = apple”, but when you’re talking about multi-lingual support, the general collations are generally bad at case insensitivity and allow variations such as the “apple” example in other languages to be distinguished as different.

To help illustrate a practical approach, I will provide an illustration of potential issues that one may encounter, and an example of how to potentially find and fix all unique key issues in a data set.

Creating an Example Data Set

To facilitate this example, I created a sample table in a MySQL 5.7 database:

CREATE TABLE `utf8_test` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `singlechar` varchar(4) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `unk1` (`singlechar`)

I chose varchar(4) because a lot of people use varchars, and I chose length 4, because some “emoji” and other 4 byte strings generated on the console give a MySQL error 1300 “Invalid utf8 character string: ‘\the\Code'”.  If you have a column with 4 bytes, then MySQL lets you insert it anyway, with a warning, rather than an error.

Then, I populated this with fake data, spanning all sorts of characters using a fairly “ugly” but very “quick” solution.

jscott@jscott-57-1:~$ cat /dev/urandom > myfile
jscott@jscott-57-1:~$ for x in `cat myfile | tr -d "[:cntrl:]" | grep -a -o .`; do mysql -v js -e "set collation_connection=utf8mb4_general_ci; insert into utf8_test (singlechar) values ('$x');"; done

Concatenating /dev/urandom to a file for a few seconds yielded about 35MB of “garbage”, containing many conceivable utf8 characters.  We loop through the file output with bash.  On Linux, tr does a bad job of actually breaking up the characters because it does not yet support multi-byte characters, but it does an excellent job of removing “control characters“, hence “-d [:cntrl:]”. grep is doing our heavy lifting for breaking up the strings (-a option to treat binary as text and the -o with a period as the argument to break up the string character by character).  Finally, we feed those characters into single insert statements to our table.  As you might imagine, this is quite nasty in terms of error messages, but it gets the job done.  In a few minutes time, we have a table with an integer id, and a couple thousand “supposedly unique” rows.

Unique key violations

For most DBAs, finding problems with collations starts with your first unique key error.  Let’s try converting our newly created table to use the utf8mb4_unicode_520_ci collation (default is utf8mb4_general_ci):

mysql> alter table utf8_test convert to character set utf8mb4 collate utf8mb4_unicode_520_ci;
ERROR 1062 (23000): Duplicate entry '9' for key 'unk1'

This tells us we have a violation.  But what then?  Inevitably, you solve one, then you find another, then another, then another.

Using concat() to find problems

You may knock MySQL for having inadequate collation support, but it has outstanding support for querying data in various ways.  For one thing, “collate” can be supplied to various parts of a query.  I like to use concat for these cases for reasons which will become clear later, but here’s a query that works well to find all the collisions in our sample table:

select count(1) as cnt, group_concat(id), concat(singlechar collate utf8mb4_unicode_ci) as unk from utf8_test group by unk having cnt > 1;

Notice that inside the concat() we are adding collate utf8mb4_unicode_520_ci. This seeks to emulate what MySQL is doing when trying to alter table (supplying a new table collation), but giving us all the information ahead of time.

Sample output:

| cnt | group_concat(id)              | unk  |
|   2 | 642,8804                      | ΄    |
|   2 | 1242,20448                    | ΅    |
|   2 | 194,11764                     | ;    |
|   2 | 16145,29152                   | ·    |
|   2 | 114105,33                     | ︵   |
|   2 | 63,186608                     | }    |
|   2 | 4963,44554                    | ʹ    |
|   2 | 84,87616                      | >    |
|   4 | 120845,292,2759,38412         | ୦    |
|   4 | 2,21162,25295,47504           | 1    |
|   5 | 46179,81143,231,7766,36158    | ²    |
|   4 | 66,2339,19777,26796           | 3    |
|   5 | 102802,158554,150,16224,21282 | ፬    |
|   3 | 35,14643,19433                | 5    |
|   3 | 107,377,9234                  | 6    |
|   4 | 12,585,12643,28853            | ٧    |
|   3 | 60,12070,25619                | 8    |
|   3 | 17,70,27677                   | ٩    |
|   3 | 32,4370,12498                 | A    |

Looking at one of these by itself:

mysql> select * from utf8_test where id in (102802,158554,150,16224,21282);
| id     | singlechar |
|    150 | 4          |
|  16224 | ٤          |
|  21282 | ۴          |
| 102802 | ፬          |
| 158554 | ៤          |
5 rows in set (0.00 sec)

All of the above characters are resolving to the one with ID 102802.

Possible actions:

In my current working environment, we decided to continue to use utf8mb4_general_ci collation, because we were unable to determine (in all cases) whether the duplicates in our data set were simply “character-case” related or whether we would actually experience data loss.

In the future, we expect to be able to “trust” the new collations in mysql 8 to be correct.

We went the extra mile to create a utility based on the following query.  The utility finds possible unique key violations using queries like the one I used above.

information_schema is your friend:
	case when group_concat(c.DATA_TYPE) like '%char%' then 1 else 0 end as contains_char
	information_schema.COLUMNS c
	tc.CONSTRAINT_SCHEMA = 'your_schema'

Using the query above and iterating through its results  you can build similar SQL statements to the one we used in our test scenario above to discover duplicates, supplying the “collate” in “concat” functions to the char columns.  concat() is a great fit, because it allows you to do the same query (

select count(1) as cnt, group_concat(id), concat(col1,'-',col2 collate <test collation>,'-'......) as unk from utf8_test group by unk having cnt > 1;

) for a unique constraint having varying numbers of columns.  An additional query to information schema in the loop is required to find which column has the char type.

You can then use the group concatenated IDs in the results to choose “winners and losers”, find dependent rows and update them, delete the losing rows, etc.