Advice on pt-stalk for connection saturated servers

Given an environment where a high volume web application is prone to opening many connections to backend resources, using a utility like pt-stalk is important.  When performance or availability affecting events like innodb lock waits or connection saturation occur, pt-stalk helps give you information you may need in troubleshooting what was happening.

The tendency may be to create multiple pt-stalks for various conditions.  This can be a poor decision when your server is dealing with both lock contention and high connections. When pt-stalk triggers, it triggers multiple simultaneous connections to MySQL to get the full processlist, lock waits, innodb transactions, slave status and other attributes.  Pt-stalk has the concept of sleeping a number of seconds after triggering, but once that time expires, the trigger may fire again, compounding the issue.  Put simply, pt-stalk can absorb the last few remaining connections on your database, particularly if you use extra_port (and run pt-stalk on the extra_port) and have a relatively low number of extra_max_connections.

Advice:  Stick to using only one of the built-in functions (like “processlist”) if it triggering when your processlist is large is enough for you.  Alternatively, write your own trg_plugin() function encompassing multiple tests that are relevant to your environment, if you need more than one check.

Unfortunately I cannot share the one I just wrote at this time (will need to write a more generic one to share later).  It checks processlist length, replication lag, innodb_trx wait time, and innodb_lock_waits, so that I could fold four of our more relevant checks into 1 pt-stalk and avoid the “connection stack_up” when MySQL was having an issue and mutiple stalks were firing.