Author: jscott

The witching hour: How illegal datetimes get stored into MySQL in US timezones

Most companies make a mistake at the very beginning: they set their system time zone to Eastern, Central, Mountain, or Pacific time, and then install their database on top of it.  They use all manner of timestamp and date types and insert terabytes of data.  After that, changing the time zone months or years later, once the company has a popular product or service, is rarely worth the effort.  One of the unfortunate issues that may arise is illegal datetimes.

On Sunday, March 11, 2018, when US time zones switched to daylight saving time, there was no 2 AM hour.  When we switch back to standard time on Sunday, November 4, there will be two 2 AM hours.

Inevitably, out of many database servers and many different types of application servers, some data manages to get inserted into MySQL with an illegal datetime during the 2 AM hour on the “spring forward” date.  Each year, developers ask about the application errors that arise from it.

Here is an illustration of how it happens.

Given the following table definition:

CREATE TABLE `ts_test` (
  `js_ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `js_dt` datetime NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Let’s now try to insert some data.

mysql> insert into ts_test (js_ts,js_dt) values ('2018-03-11 02:12:00','2018-03-11 02:12:00');
ERROR 1292 (22007): Incorrect datetime value: '2018-03-11 02:12:00' for column 'js_ts' at row 1

We got an “Incorrect datetime value” error; note that it is the js_ts timestamp column that MySQL rejects.  This is good, right?  MySQL is protecting us from inserting a bad value.  So, if I’m running… let’s see here, what am I running.  MySQL 5.7.21, so there should be no issue, right?  Maybe yesterday’s issue happened on a different version of MySQL?

Not so fast.

What about your ORM?  Is your ORM giving any specific instructions to the database for inserts and updates to datetimes?  If so, maybe… something like THIS is happening.

mysql> set session time_zone='-5:00';
Query OK, 0 rows affected (0.00 sec)
mysql> insert into ts_test (js_ts,js_dt) values ('2018-03-11 02:12:00','2018-03-11 02:12:00');
Query OK, 1 row affected (0.00 sec)

mysql> select * from ts_test;
+---------------------+---------------------+
| js_ts               | js_dt               |
+---------------------+---------------------+
| 2018-03-11 02:12:00 | 2018-03-11 02:12:00 |
+---------------------+---------------------+
1 row in set (0.00 sec)

Hm.  So, if we set the session time zone to UTC-5, the insert succeeds.  That must mean MySQL has made this “right”.  If I disconnect, start a fresh session in daylight saving time, and query again, what will happen?  Both columns will advance and show 3 AM… right?

mysql> select * from ts_test;
+---------------------+---------------------+
| js_ts               | js_dt               |
+---------------------+---------------------+
| 2018-03-11 03:12:00 | 2018-03-11 02:12:00 |
+---------------------+---------------------+
1 row in set (0.00 sec)

Wrong.

The timestamp column adjusted; the datetime column did not.  I now have a situation my applications are not going to like: anything sitting above this database will error on the js_dt value because it is an illegal datetime.
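
When chasing one of these, a quick first check is to look at what the server and your current session actually think the time zone is:

mysql> SELECT @@global.time_zone, @@session.time_zone, @@system_time_zone;

A session left at the defaults will typically show SYSTEM; an explicit offset such as '-5:00' is exactly the kind of override shown above.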

What if I open up a virtual machine, create the same table, set the system time back to just before the time change, establish a MySQL session to the database server, leave that connection open, and continue to insert rows as the time changes?

I ran:

~# date -s "11 Mar 2018 01:59:30"
mysql> insert into ts_test (js_ts, js_dt) values (now(), now());  # repeatedly

… and as we went over the time change, MySQL did the right thing with the now() function in that open session, as far as I could tell.

| 2018-03-11 01:59:55 | 2018-03-11 01:59:55 |
| 2018-03-11 01:59:57 | 2018-03-11 01:59:57 |
| 2018-03-11 01:59:58 | 2018-03-11 01:59:58 |
| 2018-03-11 01:59:59 | 2018-03-11 01:59:59 |
| 2018-03-11 03:00:00 | 2018-03-11 03:00:00 |
| 2018-03-11 03:00:01 | 2018-03-11 03:00:01 |
| 2018-03-11 03:00:02 | 2018-03-11 03:00:02 |
| 2018-03-11 03:00:03 | 2018-03-11 03:00:03 |
| 2018-03-11 03:00:04 | 2018-03-11 03:00:04 |

I did not have applications at hand to try a PHP or Java session inserting data second by second as we passed over the time change.

Most of the time we see this in the real world, the offending datetime values are only a few seconds past 2 AM.  My belief is that ORMs and software in general can compute a specific offset from UTC, set it on the database session, and then fail to change it when an operation that started before the time change takes a few seconds and completes after it.

Let’s recap.

Q: How do bad datetime values get into the database?

  • Because application ORMs and other session originators can explicitly set the session offset from UTC.
  • Because a session connected prior to the time change may still have the previous time zone setting.
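
On MySQL 5.7 (assuming performance_schema is enabled), a quick way to see whether any currently connected session is running with an overridden time_zone, i.e. the kind of explicit offset described above, is a query along these lines:

SELECT t.PROCESSLIST_ID, t.PROCESSLIST_USER, v.VARIABLE_VALUE AS session_time_zone
  FROM performance_schema.variables_by_thread v
  JOIN performance_schema.threads t USING (THREAD_ID)
 WHERE v.VARIABLE_NAME = 'time_zone';

Sessions showing SYSTEM are following the server; a fixed offset such as '-5:00' is a candidate for inserting illegal datetimes.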

Q: Why is this a problem for datetimes and *not* timestamps?

  • TIMESTAMP values are stored as UTC and must fall after the start of the epoch (1970-01-01 00:00:00 UTC, unixtime 0).  The server converts them to UTC on the way in and back on the way out, relative to the connected session’s time zone.
  • Datetimes are not converted back and forth: what you put in is what you get back out.  MySQL does try to validate datetimes against the current time zone (see the earlier error), but overriding session time_zone with an explicit UTC offset takes the DST-aware system time zone out of the comparison, allowing “illegal” values to be inserted.

Q: What do I do if I have illegal datetimes in my database?

  • Illegal datetimes can only occur during one hour per year (the skipped 2 AM hour on the spring-forward date), so it is easy to query for them and report on them.  They should be updated by adding one hour, as in the sketch below.
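
Here is a minimal sketch against the ts_test table from earlier; substitute your own table, column, and the spring-forward window for the year in question.  Note that js_ts has ON UPDATE CURRENT_TIMESTAMP, so the UPDATE will touch it unless you set it explicitly.

-- find the offenders in the skipped hour
SELECT * FROM ts_test
 WHERE js_dt >= '2018-03-11 02:00:00'
   AND js_dt <  '2018-03-11 03:00:00';

-- once reviewed, push them forward one hour
UPDATE ts_test
   SET js_dt = js_dt + INTERVAL 1 HOUR
 WHERE js_dt >= '2018-03-11 02:00:00'
   AND js_dt <  '2018-03-11 03:00:00';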

When “log sequence number X is in the future” is a non-issue

While I was seeding a bunch of new slaves for an infrastructure (and after I had successfully seeded and started replication on several, and could verify that the data was consistent), one of my fellow DBAs checked the MySQL error log and said “INNODB CORRUPTION!!”.

All of my servers were spewing these errors.

2018-05-09T10:29:10.420943-05:00 0 [ERROR] InnoDB: Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html for information about forcing recovery.

2018-05-09T10:29:10.421717-05:00 0 [ERROR] InnoDB: Page [page id: space=XXXX, page number=XXXXXXXX] log sequence number XXXXXXXXXXX is in the future! Current system log sequence number XXXXXXXXX.

But no.  I do not have InnoDB corruption in this case.  I discovered a brand new gripe with MySQL and InnoDB.  This is not a ‘bug’ but more of an ‘unfortunate feature’, and I invite others to tell me that I am wrong about this.  How to reproduce:

  • Configure MySQL so that redo logs are in a separate partition from the datadir
  • Stop MySQL normally and cleanly.
  • Back up the datadir and restore it onto a new slave database system by whatever binary method you prefer (in my case: take a ZFS snapshot of the datadir partition, start MySQL back up, and send that snapshot to the new slave).
  • The new slave is configured identically.  It:
    • has a fresh MySQL install.
    • has been started up, previously.
    • has been shut down, cleanly.
    • has redo logs in a separate partition (corresponding to the fresh installation).

If you are a MySQL DBA, you now know what I did.  I have a clean, consistent MySQL data directory from one server paired with redo logs from a brand new server.  The brand new server’s MySQL was shut down cleanly too, so nothing was replayed.  Because innodb_log_file_size is the same on both servers, MySQL started up.  But it also adopted the system LSN from the InnoDB log file.  Thus pages in the tablespace carry LSNs that are “in the future” relative to that adopted system LSN, and we’re spewing fountains of MySQL error log messages.

In this case, this is not a “real issue”.  There is no corruption.  MySQL/InnoDB is just spewing errors.

To keep this from occurring, all I have to do (before starting the seeded slave the first time) is delete those ib_logfile*s on the destination redo log partition and let MySQL create them on start.  You only get one chance to do it right.  When MySQL creates the redo logs fresh, they contain the proper system LSN, obviously copied back out from ibdata1.  If you try nuking the log files and letting them recreate again AFTER the first start, it’s too late.  The system LSN is now in ibdata1 and a “too low” value will still be used.
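
Concretely, on the freshly seeded slave, before the very first start with the restored datadir, it looks something like this (the redo log path and service name here are assumptions; use your own):

~# rm /data/mysql-redo/ib_logfile*     # the brand-new server's redo logs
~# systemctl start mysql               # MySQL recreates them using the system LSN stored in ibdata1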

There is a viable fix without doing “crazy stuff”.  I can shut down the seed source machine and grab its ib_logfiles, copy them to the new slave (after a clean shutdown there as well), and then start the new slave back up.  Voila.  LSNs are not in the future anymore.

My opinion is:

  • MySQL should not start up if the log file LSN differs by more than some threshold from the one in ibdata1, so that you get an opportunity to delete the redo logs and let them be recreated.  Why not?  If you know your MySQL snapshot was created after a clean shutdown, deleting them is safe.
  • OR, if the redo logs don’t contain any checkpoints that need replaying against the server, the system LSN from ibdata1 should be used (as it is when MySQL starts without any redo logs).

Anyway, I saw many posts on the web about this.  Your database is NOT always corrupt when you see this error message in your logs.

You don’t need to use some GDB hack with a perfectly quiet database.  You can use one of the hacks out there for writing enough data to advance the LSN if you want, though.

BUT one solution that I couldn’t find anywhere is this:

If you are seeding machines and your seed source is still viable, just shut down MySQL on that seed source and grab its redo logs.  Shut down the destination cleanly as well, put the ib_logfiles from the source into place on the destination, and restart.  Voila, the errors disappear.
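
Sketched out, with hostnames and paths as placeholders:

source~# systemctl stop mysql          # clean shutdown on the seed source
slave~# systemctl stop mysql           # clean shutdown on the new slave
source~# scp /data/mysql-redo/ib_logfile* slave:/data/mysql-redo/
slave~# systemctl start mysql          # the "in the future" errors are gone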

You do have the choice of just “living with” the errors… and knowing that they are complete BS.  But that leaves them waiting to be discovered by someone who doesn’t know what was done, and the panic starts all over again.

End of gripe.

Three client error messages that MySQL DBAs should know by heart to save time

Working in an operations environment means that you get a lot of questions, and there are the inevitable troubleshooting tasks that go along with being a DBA.  This is designed to be a quick reference post, much like my more in-depth post in 2016 about timeouts.  These typical error messages can create confusion and unneeded diagnostic activity.  To aid my own process of elimination and that of others, here are three error messages every DBA should know by heart.

SQLSTATE[HY000] [2002] Connection timed out

Plain and simple: this error means the client cannot connect to the server.

  • The calling program is trying to connect to the wrong database server (one that it cannot reach).
  • The database server is completely down (you’d get another error if the server and client actually made a connection).
  • Firewall rules on either the local servers or on a network resource are preventing the connection.
  • Routes are misconfigured.
  • The calling server has so many outbound connections that the MySQL client cannot create an outbound socket (think lsof / netstat showing lots of connections in TIME_WAIT, ulimits, tcp_tw_reuse/tcp_tw_recycle, and so on).
  • Anything else that might be preventing an actual TCP connection outbound to the server port (typically 3306).
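
A fast first pass from the calling host (hostname and port are placeholders) tells you whether basic TCP connectivity is even there and whether the caller is drowning in half-closed sockets:

~# nc -vz db01.example.com 3306        # can we reach the port at all?
~# netstat -ant | grep -c TIME_WAIT    # is the caller running out of sockets?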

SQLSTATE[HY000] [2006] MySQL server has gone away

This error is typically related to server timeouts.

  • An idle connection is hitting wait_timeout.  This timeout concerns only idle sessions; sessions with running queries are not affected by it.
  • A running SELECT query with a streaming cursor has not moved the cursor in [net_write_timeout] seconds.
  • A running bulk INSERT or UPDATE transaction has not written anything to the server within [net_read_timeout] seconds.
  • In rare cases, interactive_timeout is being hit by an ad-hoc CLI client or other interactive program that respects the CLIENT_INTERACTIVE option to mysql_real_connect().  Most of the time interactive_timeout = wait_timeout.

Keep in mind that if a DBA has mindfully set timeouts on the server, adjustments should be made to SESSION timeout variables within the calling program.  In other words, don’t change server timeout settings just for one outlier or for some exceptional situation.
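
For reference, the server-side values can be reviewed with SHOW GLOBAL VARIABLES, and the per-session overrides made from the application’s own connection look roughly like this (the numbers are placeholders, not recommendations):

mysql> SHOW GLOBAL VARIABLES LIKE '%timeout%';
mysql> SET SESSION wait_timeout = 3600;
mysql> SET SESSION net_write_timeout = 600;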

Also: this error can be “normal” in situations where the application opens a connection pool to the database and many connections in the pool sit idle, hitting the wait timeout.  In environments like this, SQLSTATE[HY000] [2006] MySQL server has gone away can be a non-issue.  Work with developers to understand where in the code the error appears and collaborate on the correct solution, which may be to ensure the application’s MySQL connector or library is configured to reconnect.

SQLSTATE[HY000] [2013] Lost connection to MySQL server during query

This error is exactly what it says.

  • A MySQL query was interrupted in the middle of its execution, meaning it was
    • killed by a backup utility on the server,
    • killed by pt-kill on the server,
    • killed manually by a DBA on the server,
    • or dropped in some other way that cost the client thread its connection to the server (for example, a network issue between client and server).
  • This error normally does not concern timeouts or settings on the server side.
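
If you suspect something on the server side is killing sessions, a couple of counters are worth a glance (watch how they trend rather than their absolute values):

mysql> SHOW GLOBAL STATUS LIKE 'Com_kill';          -- KILL statements issued on the server
mysql> SHOW GLOBAL STATUS LIKE 'Aborted_clients';   -- clients that disconnected without closing properly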


Two version upgrade fun with MySQL 5.5 to 5.7

In a perfect world, one would upgrade databases one version at a time and not let them get too old.  But our databases are where the “crown jewels” are.  They must stay up 24×7.  When performance is acceptable, it’s acceptable, and sometimes old versions stay around too long.  We don’t live in a perfect world.  This idea applies to so many things: there’s almost never a perfect data model, and there is always some type of resource constraint, be it storage, memory, CPU, IOPS, or just plain dollars.

I will bring this concept of not living in a perfect world into a discussion about upgrades.

Ideally there would be…

  • …time to do two upgrades: one to 5.6, the other to 5.7.  This is the way sane, normal people upgrade.
  • …a lot of extra hardware.  It sure would be nice to maybe combine a maintenance like this with a hardware refresh so that we could just have an entirely new MySQL 5.7 database shard, slap a MySQL 5.6 box in the middle as a relay, and once things are all caught up just do a VIP cutover to new machines.

I have actually worked with the extra-hardware idea in the past.  A client of a company I worked for had a standalone 5.5 machine with no replicas.  They were able to allocate two more servers: one to be a 5.6 relay, and one new 5.7 machine to serve as the final destination.  Added bonus: at the end, the 5.6 server would be upgraded and left in place as a redundant machine.

In the absence of extra time and extra money or hardware, sometimes a little creativity is required, which brings me to the most interesting 5.5 to 5.7 upgrade process I have been a part of.  It’s not “that creative”, but it does involve being willing to accept single points of failure for short periods of time.

The beginning setup is a MySQL 5.5 master-master pair in the primary site (machines 1 and 2), although when I say “master-master” pair, machine 1 is getting all the traffic.  Machine 3 is the secondary-site master and machine 4 is the secondary-site slave.  A side note: I am intentionally keeping this high level, so there are more steps involved than described here.

We upgrade machine 4 to be a 5.6 server and allow its replication to catch up.  We then use a binary backup and restore method to restore machine 4’s data to machines 2 and 3, which become MySQL 5.7 servers.  After this is done, the 5.6 machine 4 sits between the 5.5 production master and the new 5.7 machines 2 and 3 in the replication chain.

Then we cut production traffic over to machine 2 and use a binary backup/restore method to restore a copy of machine 3 to machines 1 and 4.  In the end we look just like we started, except now we’re all upgraded to MySQL 5.7.
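
Each of those repoints is just the usual replication dance; a rough sketch, with the hostname, credentials, and coordinates as placeholders taken from the binary backup:

STOP SLAVE;
CHANGE MASTER TO
  MASTER_HOST='machine4.example.com',
  MASTER_USER='repl',
  MASTER_PASSWORD='********',
  MASTER_LOG_FILE='mysql-bin.000123',
  MASTER_LOG_POS=4;
START SLAVE;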

What are some other interesting upgrade paths?  This is a fun topic that I enjoy hearing about in the community.  Cheers.