Meta engineers have asked the ITU to end the practice of adding “leap seconds” to keep atomic clocks in sync with the rotation of the Earth.
Leap seconds are occasionally added to correct Coordinated Universal Time (UTC), just as leap years include an extra day to keep our calendars in sync with the Earth’s motion around the sun. Unfortunately, they create anomalous time stamps that have caused major network outages in the past. Meta, the owner of Facebook, has issued a strongly worded blog post arguing that leap seconds are a “risky practice that does more harm than good,” and urging the International Telecommunication Union (ITU) to put an end to it.
The ITU is due to make a decision in 2023, to be enacted by the International Earth Rotation and Reference Systems Service (IERS).
Look before we leap
Most networks and computers use UTC, but the Earth’s rotation is very slightly irregular, and slows down imperceptibly over time. In 1972, the first leap second was introduced to keep official clocks in line with the Earth’s rotation, and there have been 27 leap seconds in total since then.
In this century, leap seconds have become more of a problem, because networks and other aspects of human existence increasingly rely on UTC. Software usually assumes that time will always move forward, and if this does not happen, the software can crash.
In 2012, a leap second caused a major Facebook outage, as Facebook’s Linux servers became overloaded trying to work out why they had been transported one second into the past.
In 2016, a similar thing happened to Cloudflare, when a leap second at midnight on December 31 extended 2016 by a single tick of the clock.
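The failure mode behind both outages can be illustrated with a short Python sketch. It is not taken from either company's code; the helper name timed is hypothetical. It measures a duration with the standard library's time.monotonic(), which is documented never to go backwards, rather than with the wall clock returned by time.time(), which can be stepped when a leap second is applied.

```python
import time


def timed(fn):
    """Run fn() and return (result, elapsed_seconds).

    Hypothetical helper for illustration. The duration is taken from
    time.monotonic(), which is unaffected by system clock updates, so a
    wall-clock step (such as a leap second) cannot make it negative.
    """
    start = time.monotonic()
    result = fn()
    elapsed = time.monotonic() - start
    return result, elapsed


result, elapsed = timed(lambda: sum(range(1000)))
print(result, elapsed >= 0)
```

Code that subtracts two time.time() readings instead may see a zero or negative interval if the clock is set back between them.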
Now, Meta wants to call a halt to the practice, because it is dangerous to systems relying on UTC. Instead, it wants to allow clocks to slow down fractionally for a short period, so the extra second can be “smeared” across most of a day.
“While the leap second might have been an acceptable solution in 1972, when it made both the scientific community and the telecom industry happy, these days UTC is equally bad for both digital applications and scientists, who often choose TAI or UT1 instead,” says a Meta engineering blog post dated yesterday. TAI is the precise global atomic time standard, while UT1 is the less precise observed solar time.
“Introducing new leap seconds is a risky practice that does more harm than good, and we believe it is time to introduce new technologies to replace it,” says the anonymous blog. “As engineers at Meta, we are supporting a larger community push to stop the future introduction of leap seconds and remain at the current level of 27, which we believe will be enough for the next millennium.”
One easy answer would be for all systems to use TAI internally, and simply apply an offset (the cumulative number of leap seconds, plus the fixed difference that already existed in 1972) to convert to UTC whenever talking to humans. According to Wikipedia, the TV industry, electric grids, and Bluetooth mesh networks have settled on this practice.
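As a rough sketch of that conversion in Python: the offset between TAI and UTC has been 37 seconds since January 1, 2017 (the 27 leap seconds plus the 10-second difference already in place in 1972). The table name and function below are hypothetical, and only the three most recent offsets are listed; a real system would load the full list from an authoritative source such as the IERS bulletins.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc

# (UTC instant from which the offset applies, TAI minus UTC in seconds).
# Assumption: abbreviated table, most recent entries only.
TAI_MINUS_UTC = [
    (datetime(2012, 7, 1, tzinfo=UTC), 35),
    (datetime(2015, 7, 1, tzinfo=UTC), 36),
    (datetime(2017, 1, 1, tzinfo=UTC), 37),
]


def tai_to_utc(tai: datetime) -> datetime:
    """Convert a TAI clock reading to UTC by subtracting the offset in force.

    The input is an aware datetime whose fields hold the TAI reading.
    Python's datetime cannot represent 23:59:60, so a reading that falls
    inside a leap second itself maps to the following 00:00:00.
    """
    for effective_utc, offset in reversed(TAI_MINUS_UTC):
        candidate = tai - timedelta(seconds=offset)
        if candidate >= effective_utc:
            return candidate
    raise ValueError("TAI value predates the abbreviated table")


print(tai_to_utc(datetime(2022, 7, 26, 12, 0, 37, tzinfo=UTC)))
```

The appeal of this design is that the internal timeline never repeats or skips a second; leap seconds only matter at the point where a time is displayed.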
An alternative is to apply the leap second, but to “smear” it, breaking it down into a large number of smaller steps – effectively slowing down system clocks by a tiny fraction for a period, until they have lost a whole second.
Meta has taken this approach to handle the two leap seconds since Facebook was tripped up in 2012: “There is no universal way to do this, but at Meta we smear the leap second throughout 17 hours, starting at 00:00:00 UTC based on the time zone data (tzdata) package content.”
Smearing over 17 hours makes the process more reliable, says Meta, because if Facebook’s network time (NTP) servers are brought into line gradually with tiny steps, none of them ever register as faulty compared to other servers. However, this approach requires “nontrivial conversion logic” inside timing systems, such as Facebook’s Time Appliance.
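To give a sense of scale, the arithmetic of a 17-hour smear can be sketched in a few lines of Python. The linear ramp is an assumption made here for simplicity: the quoted post gives the window length and start time but not the shape of the curve, and the names below are hypothetical.

```python
SMEAR_WINDOW_S = 17 * 60 * 60  # 17 hours = 61,200 seconds


def smear_offset(seconds_since_start: float, leap: float = 1.0) -> float:
    """Portion of the leap second absorbed so far.

    Assumption: a linear ramp from 0 to `leap` across the window.
    """
    if seconds_since_start <= 0:
        return 0.0
    if seconds_since_start >= SMEAR_WINDOW_S:
        return leap
    return leap * seconds_since_start / SMEAR_WINDOW_S


# Fractional slowdown needed to lose one second over the window.
rate_ppm = 1.0 / SMEAR_WINDOW_S * 1e6

print(smear_offset(SMEAR_WINDOW_S / 2))  # halfway through the window
print(round(rate_ppm, 2))                # parts per million
```

Under this assumption the clock runs slow by roughly 16 parts per million during the window, a small enough change that servers at slightly different points in the smear still agree closely with one another.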
In fact, smeared time is also used at other web giants, including Amazon and Google, which both smear leap seconds, though by different methods. This has raised the unfortunate prospect of there being multiple time-smearing standards.
However, Google made its approach public some years ago. In late 2016, before the leap second that took Cloudflare down, Google opened up its internal smeared NTP time service as a public smeared time service.
It will be interesting to see if Meta adopts Google’s smeared time. Data Center Dynamics