When “log sequence number X is in the future” is a non-issue

While seeding a bunch of new slaves for an infrastructure (and after I had successfully seeded and started replication on several, and I can verify that the database is consistent), one of my fellow DBAs checks the mysql error log and says “INNODB CORRUPTION!!”.

All of my servers were spewing these errors.

2018-05-09T10:29:10.420943-05:00 0 [ERROR] InnoDB: Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html for information about forcing recovery.

2018-05-09T10:29:10.421717-05:00 0 [ERROR] InnoDB: Page [page id: space=XXXX, page number=XXXXXXXX] log sequence number XXXXXXXXXXX is in the future! Current system log sequence number XXXXXXXXX.

But — No.  I do not have InnoDB corruption in this case.  I discovered a brand new gripe with MySQL & InnoDB.  This is not a ‘bug’ but more of an ‘unfortunate feature’ and I invite others to tell me that I am wrong about this.  How to reproduce:

  • Configure MySQL so that redo logs are in a separate partition from the datadir
  • Stop mysql, normally, cleanly
  • Back up and transfer and restore the datadir onto a new slave database system by whatever binary method you prefer. (in my case it was to create a zfs snapshot on the datadir partition and start MySQL back up and send that snapshot to a new slave).
  • The new slave is configured identically.  It:
    • has a fresh MySQL install.
    • has been started up, previously.
    • has been shut down, cleanly.
    • has redo logs in a separate partition (corresponding to the fresh installation).

If you are a MySQL DBA you now know what I did.  I have a clean/consistent MySQL data directory from one server paired with redo logs from a brand new server.  The brand new server’s MySQL was shut down cleanly, too, though, so nothing was replayed.  Because the innodb_log_file_size is the same on both servers, MySQL started up.  But, it also decided to adopt the system LSN from the innodb log file.  Thus we have a nasty issue of finding LSNs in the future in the tablespace for every transaction.  We’re now spewing fountains of MySQL error logs.

In this case – this is not a “real issue”.  There is no corruption.  MySQL/ InnoDB is just spewing errors.

To keep this from occurring, all I have to do (before starting the seeded slave the first time) is delete those ib_logfile*s on the destination redo log partition and let MySQL create them on start.  You only get one chance to do it right.  When MySQL creates the redo logs fresh, they contain the proper system LSN, obviously copied back out from ibdata1.  If you try nuking the log files and letting them recreate again AFTER the first start, it’s too late.  The system LSN is now in ibdata1 and a “too low” value will still be used.

There is a viable fix without doing “crazy stuff”.  I can shut down the seed source machine and grab the ib_logfiles, copy them to the new slave (after clean shutdown) and then start the new slave back up again.  Voila.  LSNs not in the future anymore.

My opinion is:

  • MySQL should not start up if the log LSN differs more than <threshold> from the one in ibdata1 – so that you can get an opportunity to delete redo logs and let them be recreated.  Why not? If you know your MySQL snapshot was created after a clean shutdown, it’s fine.
  • OR if the redo logs don’t contain any checkpoints that need replaying against the server, the system LSN from ibdata1 should be used (like it does when it starts without any redo logs).

Anyway I saw many posts on the web about this.  Your database is NOT always corrupt when you see this error message in your logs.

You don’t need to use some GDB hack with a perfectly quiet database.  You can use some of the hacks out there for writing enough data to advance the LSN if you want though.

BUT one solution that I couldn’t find anywhere is this:

If you are seeding machines and your seed source is still viable – just shut down MySQL on that seed source and grab its redo logs.  Shut down the destination cleanly as well and put the ib_logfiles from the source into place on the destination and restart.  Voila, the errors disappear.

You do have the choice of just “living with” the errors… and knowing that they are complete BS.  But that leaves the unfortunate discovery by someone who doesn’t know what was done and the panic starts all over again.

End of gripe.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s