Author: jscott

Easy fix: Error 1048 (23000) and explicit_defaults_for_timestamp

This may be a “duh” post for some, but I had to post it because I didn’t find the answer in the typical places like Stack Overflow when I hit the issue.  I recently worked on a project to expand database capacity by deploying new MySQL installations with memory, config, and disk space tweaks, rolled out via backup/restore, a replication topology change, and failover.  I did not notice that the old servers had “explicit_defaults_for_timestamp=OFF”.  After restoring a binary backup and starting the replication thread on the new systems, I got this error in the replication thread (the column name in the error corresponds to the examples further down).

ERROR 1048 (23000): Column 'ts' cannot be null

Below, I will provide a synopsis to show statements that caused the error and why a simple global variable change fixed the issue.  First, a sample table definition.

CREATE TABLE `time_test` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `comment` varchar(32) NOT NULL,
  `ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=latin1

Notice that the timestamp column has a default of CURRENT_TIMESTAMP.  Next, a statement that works fine.

mysql> insert into time_test (comment) values ('this is statement 1');

The “id” and “ts” columns are omitted so that MySQL generates both the ID and the timestamp value.

Next – a statement that does not work, similar to the one I encountered in my replication thread.

mysql> insert into time_test (comment,ts) values ('this is statement 2',NULL);
ERROR 1048 (23000): Column 'ts' cannot be null

With MySQL and many other databases, even though we have “DEFAULT CURRENT_TIMESTAMP” on our timestamp column, explicitly supplying NULL in the INSERT statement is non-standard syntax.  Standard syntax would be to omit the column from the column list and VALUES entirely (e.g. query 1).  We get the error because of the default value of explicit_defaults_for_timestamp.  From the MySQL manual: “This system variable determines whether the server enables certain nonstandard behaviors for default values and NULL-value handling in TIMESTAMP columns. By default, explicit_defaults_for_timestamp is enabled, which disables the nonstandard behaviors.”
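You can check where a server stands with a quick variable lookup; ON is the value that produces the error above:

mysql> show global variables like 'explicit_defaults_for_timestamp';
+---------------------------------+-------+
| Variable_name                   | Value |
+---------------------------------+-------+
| explicit_defaults_for_timestamp | ON    |
+---------------------------------+-------+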

I was working with an application with deprecated SQL syntax.  In fact, this syntax may be legal on some modern databases; just not by default on MySQL.  If a reader knows of databases where this is legal, please feel free to comment.

The fix for this is as simple as it gets, which is probably why there are not more posts about this.

mysql> set global explicit_defaults_for_timestamp='OFF';
Query OK, 0 rows affected (0.00 sec)

Because this is a global variable and existing sessions keep the value they started with, one must log out of the CLI and log back in for the change to take effect.  After reconnecting:

mysql> insert into time_test (comment,ts) values ('this is statement 2',NULL);
Query OK, 1 row affected (0.01 sec)

mysql> select * from time_test;
+----+---------------------+---------------------+
| id | comment             | ts                  |
+----+---------------------+---------------------+
|  1 | this is statement 1 | 2018-08-14 02:51:21 |
|  2 | this is statement 2 | 2018-08-14 02:52:20 |
+----+---------------------+---------------------+
2 rows in set (0.00 sec)
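One caveat: SET GLOBAL does not survive a server restart.  To make the setting permanent, it also belongs in the config file (a minimal sketch; the file location varies by platform):

[mysqld]
explicit_defaults_for_timestamp = OFF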

In my case, as soon as the global variable was set, I was able to start my replication thread back up.  Thanks for reading my reference post.

Query Macroeconomics

I studied some macroeconomics in school.  I’m still interested in it 20 years hence.  I was recently in a discussion about query optimization and how to prioritize what to fix first.  My pen and paper started graphing things, and here we are with an abstract thought.  Bear with me.  This is mostly for entertainment purposes, but it may actually add a small amount of value to your thought processes around optimizing queries.  It is a riff on the various supply and demand graphs from macroeconomics.

In the graph below:

  • Axes:
    • Vertical: number of distinct problem queries
    • Horizontal: Database “query load capacity” gains (from optimization)
  • Lines:
    • LIRQ (long and/or infrequently run queries)
    • SFRQ (short, frequently run queries)
    • AC: Absolute capacity (the point at which you’re going as fast as the I/O platform you run on will let you, and your capacity bottlenecks have less to do with queries and more to do with not enough IOPS)
  • Point:
    • EO: Equilibrium of optimization (written E with a subscript O on the graph)

On LIRQ: Simply put, a typical OLTP workload may have several long and/or infrequently run queries on the database that are “problems” for overall system performance.  If you optimize those queries, your gain in load capacity is sometimes fairly small.

On SFRQ: Conversely, optimizing short-running but very frequently run “problem queries” can sometimes create very large gains in query load capacity.  Example: a covering index that takes a query run many thousands of times a minute from 10 milliseconds down to < 1 millisecond, by ensuring the data is served straight from the bufferpool, can give you some serious horsepower back.
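As a sketch of that kind of index (the table and columns here are hypothetical, purely for illustration):

-- hot query, run thousands of times per minute:
--   select user_id, status from sessions where account_id = ?;
-- a covering index answers it from the index alone, with no row lookups:
alter table sessions add index ix_sessions_covering (account_id, user_id, status);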

On AC: Optimizing often-run queries returns little benefit once the queries are no longer what is creating the I/O logjam.  You can only go as fast as your platform will let you, so if you are getting close to the point where your database is so well optimized that you really can’t read or write to disk any faster, then you have hit the wall, and you will produce little ROI with your optimization efforts unless you make the platform faster (which moves the red line to the right).  A quick way to check whether you are near that wall is shown below.
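One rough check (a sketch; the numbers that count as “saturated” depend on your hardware):

$ iostat -x 1   # watch %util and await on the data volume; sustained saturation suggests you are IOPS-bound rather than query-bound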

On EO: Often-run long (or somewhat long) queries are the low-hanging fruit.  They should stand out naturally and be a real “apparent pain” in the processlist or in application response times, without you even bothering with pt-query-digest.

Speaking of pt-query-digest: digests of the slow query log (when long_query_time is set to 0) are a good way to figure out what types of queries are taking up the lion’s share of your database load.  The ranking, total time, and percentiles shown in the digest will tell you which queries are taking up your database’s valuable time.  I wish for you that you have SFRQ, so that your optimization effort may produce high rewards in capacity gain.
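For reference, that workflow looks roughly like this (the slow log path is illustrative; yours may differ):

mysql> set global slow_query_log = ON;
mysql> set global long_query_time = 0; # temporarily log every query
$ pt-query-digest /var/log/mysql/mysql-slow.log > /tmp/digest.txt

Remember to restore long_query_time afterward; logging every query on a busy server has its own I/O cost.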

Thanks for bearing with me on my database capacity economics.


The witching hour: How illegal datetimes get stored into MySQL in US timezones

Most companies make a mistake at the very beginning.  They set their system time zone to Eastern, Central, Mountain, or Pacific time.  Then, they install their database on it.  They use all manner of timestamp and date types and insert terabytes of data.  After that, it becomes almost impossible (in terms of being worth the effort) to change it some months or years later when the company has a popular product or service.  One of the unfortunate issues that may arise is illegal datetimes.

On Sunday, March 11, 2018, when US time zones switched to daylight saving time, there was no 2AM hour.  When we fall back to standard time on Sunday, November 4, there will be two 1AM hours (clocks return from 2AM to 1AM).

Inevitably, out of many database servers and many different types of application servers, some data manages to get inserted into MySQL with an illegal datetime during the 2AM hour on the “spring forward” date.  Each year, developers ask about the application errors arising from it.

Here is an illustration of how it happens.

Given the following table definition:

CREATE TABLE `ts_test` (
  `js_ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `js_dt` datetime NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Let’s now try to insert some data.

mysql> insert into ts_test (js_ts,js_dt) values ('2018-03-11 02:12:00','2018-03-11 02:12:00');
ERROR 1292 (22007): Incorrect datetime value: '2018-03-11 02:12:00' for column 'js_ts' at row 1

We got an error (on the timestamp column, as it happens).  This is good, right?  MySQL is protecting us from inserting a bad value.  So, if I’m running… let’s see here, what am I running.  MySQL 5.7.21 — there should be no issue, right?  Maybe this was yesterday’s issue on a different version of MySQL?

Not so fast.

What about your ORM?  Is your ORM giving any specific instructions to the database for inserts and updates to datetimes?  If so, maybe… something like THIS is happening.

mysql> set session time_zone='-5:00';
Query OK, 0 rows affected (0.00 sec)
mysql> insert into ts_test (js_ts,js_dt) values ('2018-03-11 02:12:00','2018-03-11 02:12:00');
Query OK, 1 row affected (0.00 sec)

mysql> select * from ts_test;
+---------------------+---------------------+
| js_ts               | js_dt               |
+---------------------+---------------------+
| 2018-03-11 02:12:00 | 2018-03-11 02:12:00 |
+---------------------+---------------------+
1 row in set (0.00 sec)

Hm.  So, if we set the session time zone to UTC-5, the insert succeeds.  That must mean that MySQL has made this “right”.  If I disconnect from this session, start a fresh one in daylight saving time, and query again, what will happen?  Both columns will advance and show 3AM… right?

mysql> select * from ts_test;
+---------------------+---------------------+
| js_ts               | js_dt               |
+---------------------+---------------------+
| 2018-03-11 03:12:00 | 2018-03-11 02:12:00 |
+---------------------+---------------------+
1 row in set (0.00 sec)

Wrong.

The timestamp column adjusted.  The datetime one did not.  I now have a situation that my application is not going to like: the applications above this database will error on the js_dt column value because it’s an illegal datetime.

What if I open up a virtual machine, create the same table, set the system time back to just before the time change, establish a MySQL session to the database server, leave that connection open, and continue to insert rows as the time changes?

I ran:

~# date -s "11 Mar 2018 01:59:30"
mysql> insert into ts_test values (now(),now()); # repeated roughly once per second

… and as we went over the time change, MySQL did the right thing with the now() function and the session as far as I could tell.

| 2018-03-11 01:59:55 | 2018-03-11 01:59:55 |
| 2018-03-11 01:59:57 | 2018-03-11 01:59:57 |
| 2018-03-11 01:59:58 | 2018-03-11 01:59:58 |
| 2018-03-11 01:59:59 | 2018-03-11 01:59:59 |
| 2018-03-11 03:00:00 | 2018-03-11 03:00:00 |
| 2018-03-11 03:00:01 | 2018-03-11 03:00:01 |
| 2018-03-11 03:00:02 | 2018-03-11 03:00:02 |
| 2018-03-11 03:00:03 | 2018-03-11 03:00:03 |
| 2018-03-11 03:00:04 | 2018-03-11 03:00:04 |

I did not have applications at hand to try a PHP session or a java session inserting data second by second as we passed over the time change.

Most of the time we see this in the real world, the illegal datetime datapoints are only a few seconds past 2AM.  It is my belief that ORMs and software in general can compute a specific offset from UTC, set it on the database session, and fail to change it because an operation starts just before the time change and completes a few seconds after it.

Let’s recap.

Q: How do bad datetime values get into the database?

  • Because application ORMs and other session originators can explicitly set the session offset from UTC.
  • Because a session connected prior to the time change may still have the previous time zone setting.

Q: Why is this a problem for datetimes and *not* timestamps?

  • The underlying storage for timestamps is UTC, and times must be greater than the start of the epoch (1970-01-01 00:00:00 UTC, or 0 unixtime).  The DB converts timestamp values to UTC on the way in and back again on the way out, and that conversion is relative to the connected session’s time zone.
  • Datetimes are not converted back and forth; what you put in is what you get back out.  The validation error shown earlier actually fired on the timestamp column, because the nonexistent local time could not be converted to UTC via the system time zone.  Overriding session time_zone with an explicit UTC offset removes the DST gap from that conversion, so the statement succeeds and the verbatim datetime value, illegal in the local zone, gets stored.  The contrast is sketched below.
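A small demonstration of that asymmetry, using the same ts_test table (the offset is arbitrary, for illustration):

mysql> set session time_zone = '+0:00'; # view the data as a UTC-configured client would
mysql> select * from ts_test; # js_ts is re-rendered in UTC for this session; js_dt comes back exactly as stored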

Q: What do I do if I have illegal datetimes in my database?

  • Illegal datetimes can only occur during one hour per year, so it is easy to query for them and report on them.  They should be updated forward one hour, as sketched below.
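For the spring-forward date used above, the cleanup might look like this (a sketch against the example table; substitute your own tables, columns, and dates):

mysql> select * from ts_test where js_dt >= '2018-03-11 02:00:00' and js_dt < '2018-03-11 03:00:00';
mysql> update ts_test set js_dt = js_dt + interval 1 hour where js_dt >= '2018-03-11 02:00:00' and js_dt < '2018-03-11 03:00:00';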

When “log sequence number X is in the future” is a non-issue

While I was seeding a bunch of new slaves for an infrastructure (after I had successfully seeded and started replication on several, and could verify that the data was consistent), one of my fellow DBAs checked the MySQL error log and said “INNODB CORRUPTION!!”.

All of my servers were spewing these errors.

2018-05-09T10:29:10.420943-05:00 0 [ERROR] InnoDB: Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html for information about forcing recovery.

2018-05-09T10:29:10.421717-05:00 0 [ERROR] InnoDB: Page [page id: space=XXXX, page number=XXXXXXXX] log sequence number XXXXXXXXXXX is in the future! Current system log sequence number XXXXXXXXX.

But — no.  I do not have InnoDB corruption in this case.  I discovered a brand new gripe with MySQL and InnoDB.  This is not a ‘bug’ but more of an ‘unfortunate feature’, and I invite others to tell me that I am wrong about this.  How to reproduce:

  • Configure MySQL so that redo logs are in a separate partition from the datadir
  • Stop mysql, normally, cleanly
  • Back up, transfer, and restore the datadir onto a new slave database system by whatever binary method you prefer (in my case: create a ZFS snapshot of the datadir partition, start MySQL back up, and send that snapshot to the new slave).
  • The new slave is configured identically.  It:
    • has a fresh MySQL install.
    • has been started up, previously.
    • has been shut down, cleanly.
    • has redo logs in a separate partition (corresponding to the fresh installation).

If you are a MySQL DBA, you now know what I did: I had a clean, consistent MySQL data directory from one server paired with redo logs from a brand new server.  The brand new server’s MySQL had been shut down cleanly too, though, so nothing was replayed.  Because innodb_log_file_size was the same on both servers, MySQL started up.  But it also decided to adopt the system LSN from the InnoDB log file.  Thus we have the nasty issue of finding LSNs in the future in the tablespace for every transaction, and we’re now spewing fountains of MySQL error logs.  (You can compare the page LSNs in the errors against the server’s current system LSN, as shown below.)
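To see the system LSN the running server is using, check the LOG section of the InnoDB status output (trimmed to the relevant lines; values masked as above):

mysql> show engine innodb status\G
...
LOG
---
Log sequence number XXXXXXXXX
...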

In this case, this is not a “real issue”.  There is no corruption; MySQL/InnoDB is just spewing errors.

To keep this from occurring, all I have to do (before starting the seeded slave for the first time) is delete those ib_logfile*s on the destination redo log partition and let MySQL create them on start, as sketched below.  You only get one chance to do it right.  When MySQL creates the redo logs fresh, they contain the proper system LSN, evidently copied back out from ibdata1.  If you try nuking the log files and letting them recreate AFTER the first start, it’s too late: the system LSN is now in ibdata1, and the “too low” value will still be used.
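A minimal sketch of that pre-first-start cleanup (paths and service name are illustrative; use your own redo log partition):

# on the newly seeded slave, BEFORE the first mysqld start
ls /data/mysql-redo/ib_logfile*   # the stale redo logs from the fresh install
rm /data/mysql-redo/ib_logfile*   # remove them while mysqld is stopped
systemctl start mysql             # MySQL recreates them using the system LSN from ibdata1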

There is a viable fix without doing “crazy stuff”.  I can shut down the seed source machine, grab its ib_logfiles, copy them to the new slave (after a clean shutdown there as well), and then start the new slave back up again.  Voila: LSNs not in the future anymore.

My opinion is:

  • MySQL should not start up if the log LSN differs by more than some <threshold> from the one in ibdata1, so that you get an opportunity to delete the redo logs and let them be recreated.  Why is that safe?  If you know your MySQL snapshot was created after a clean shutdown, deleting and recreating the redo logs is fine.
  • OR, if the redo logs don’t contain any checkpoints that need replaying against the server, the system LSN from ibdata1 should be used (just as it is when the server starts without any redo logs).

Anyway, I saw many posts on the web about this.  Your database is NOT always corrupt when you see this error message in your logs.

You don’t need to use some GDB hack against a perfectly quiet database, though you can use one of the hacks out there for writing enough data to advance the LSN if you want.

BUT one solution that I couldn’t find anywhere is this:

If you are seeding machines and your seed source is still viable, just shut down MySQL on that seed source and grab its redo logs.  Shut down the destination cleanly as well, put the ib_logfiles from the source into place on the destination, and restart.  Voila, the errors disappear.
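A sketch of that sequence (hostnames and paths are illustrative):

# on the seed source, after a clean mysqld shutdown
scp /var/lib/mysql/ib_logfile* new-slave:/var/lib/mysql/
# on the new slave, which has also been cleanly shut down:
systemctl start mysql # starts with matching redo logs; no more future-LSN errors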

You do have the choice of just “living with” the errors, knowing that they are complete BS.  But that leaves them for the unfortunate discovery by someone who doesn’t know what was done, and the panic starts all over again.

End of gripe.