Category: southern fried

Query Macroeconomics

I studied some macroeconomics in school.  I’m still interested in it 20 years hence.  I was recently in a discussion about query optimization and how to prioritize what to fix first.  My pen and paper started graphing things, and here we are with an abstract thought.  Bear with me.  This is for entertainment purposes, mostly, but may actually have a small amount of value in your thought processes around optimizing queries.  This is a riff on various supply, demand graphs from macroeconomics.

In the graph below:

  • Axes:
    • Vertical: number of distinct problem queries
    • Horizontal: Database “query load capacity” gains (from optimization)
  • Lines:
    • LIRQ (long and/or infrequently run queries)
    • SFRQ (short, frequently run queries)
    • AC: Absolute capacity (the point at which you’re going as fast as I/O platform you run on will let you and your query capacity bottlenecks have less to do with queries and more to do with not enough IOPS).
  • Point:
    • E (subscript) O: Equilibrium of optimization

On LIRQ: Simply put, on a typical OLTP workload, you may have several long and infrequently running queries on the database that are “problems” for overall system performance.  If you optimize those queries, your performance gain in load capacity is sometimes fairly small.

ON SFRQ: Conversely, optimizing short running but very very frequently run “problem queries” can sometimes create very large gains in query load capacity.  Example: a covering index that takes a query that’s run many thousands of times a minute from 10 milliseconds down to < 1 millisecond by ensuring the data is in the bufferpool can give you some serious horsepower back.

On AC: Working on optimizing often run queries that are not creating an I/O logjam do not return any benefits.  You can only go as fast as your platform will let you, so if you are getting close to the point where your database is so well optimized that you really can’t read or write to disk any faster, then you have hit the wall and you will produce little ROI with your optimization efforts unless you make the platform faster (which moves the red line to the right).

On EO:  Often run long (or somewhat long) queries are the low hanging fruit.  They should stand out naturally and be a real “apparent pain” in the processlist or in application response times without even bothering to pt-query-digest.

Speaking of pt-query-digest: digests of the slow query log (when log_query_time is set to 0) are a good way to figure out what types of queries are taking up the lion’s share of your database load. You will be able to tell via the ranking and the total time and percentiles shown in the digest what queries are taking up your database’s valuable time.  I wish for you that you have SFRQ, so that your optimization effort may produce high rewards in capacity gain.

Thanks for bearing with me on my database capacity economics.


southern fried: quick script to fix table auto increments

OK, Y’all.  If you have been a DBA long enough you have run into situations where you’ve had to import DDL from another database instance and forgotten to remove AUTO_INCREMENT on a table definition.  Sometimes this means your auto_increment is way up in the millions and you have 10 rows in the table.  People worried about ID depletion on an integer column will do a “facepalm”.

Making matters worse, having perhaps done this on more than one table, sometimes it’s a real pain to detect and fix the issue of auto_increments being much higher than you want them to be across a big number of tables.  Sometimes the pain is so great that you might want to simply repeat your data importation.

This quick and dirty “southern fried” script will help you report and rectify that.  It detects your auto_increment columns and checks them by querying “ORDER BY <auto_increment_column> desc limit 1”  I coded it in bash to be the “least objectionable”, so maybe some readers will actually use it.

So far the script is “lightly tested” so use at your own risk.