A GPSD time warp
The GPSD project provides a daemon for communicating with various GPS devices in order to retrieve the location information that those sensors provide. But the GPS satellites also provide highly accurate time information that GPSD can extract for use by Network Time Protocol (NTP) servers. A bug in the GPSD code will cause time to go backward in October, though, which may well cause some havoc if affected NTP servers do not get an update before then.
At some level, the root cause of the problem is the GPS week-number rollover that occurs because only ten bits were used to represent week numbers in the original GPS protocol. Ten bits overflows after 1023, so only 19.6 (and change) years can be represented. Since the GPS epoch starts at the beginning of 1980, there have already been two rollover events (in 1999 and 2019); there is not supposed to be another until 2038, but a bug in some sanity checking code in GPSD will cause it to subtract 1024 from the week number on October 24, 2021. The effect will be a return to March 2002, which is not what anyone wants—or expects.
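The dates above can be checked with a little arithmetic. What follows is only a sketch: the constant and function names are made up for illustration and are not taken from the GPSD source, leap seconds are ignored, and `time_t` is assumed to count seconds since 1970 as it does on POSIX systems.

```c
#include <assert.h>
#include <time.h>

/* Illustrative constants; the names are not taken from the gpsd source. */
#define GPS_EPOCH_UNIX 315964800LL /* 1980-01-06 00:00:00 as Unix time */
#define SECS_PER_WEEK  604800LL    /* 7 * 86400 */
#define WEEK_MODULUS   1024U       /* a ten-bit field counts 0..1023 */

struct ymd { int year, month, day; };

/* Date on which a full (rollover-corrected) GPS week begins.
 * Leap seconds are ignored: GPS time runs a few seconds ahead of UTC,
 * so this is the date on the GPS time scale. */
struct ymd gps_week_start(unsigned week)
{
    time_t t = (time_t)(GPS_EPOCH_UNIX + (long long)week * SECS_PER_WEEK);
    struct tm *tm = gmtime(&t);
    struct ymd d;

    assert(tm != NULL); /* gmtime() only fails for unrepresentable years */
    d.year = tm->tm_year + 1900;
    d.month = tm->tm_mon + 1;
    d.day = tm->tm_mday;
    return d;
}

/* Simplified model of the errant check: once the week exceeds 2180,
 * 1024 weeks are subtracted. The leap-second conditions in the real
 * test are assumed to hold here. */
unsigned week_after_bad_adjustment(unsigned week)
{
    return week > 2180 ? week - WEEK_MODULUS : week;
}
```

Under these assumptions, `gps_week_start(2181)` yields 2021-10-24, while `gps_week_start(week_after_bad_adjustment(2181))` yields 2002-03-10, which is the "return to March 2002".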
The problem was reported by Stephen Williams on July 21. It affects GPSD versions 3.20‑3.22, which is all of the releases since the last day of 2019. The upcoming 3.23 release—due as soon as August 4—will fix the problem, but it needs to be installed on all of the relevant servers. There are concerns that if the word does not get out to NTP server administrators, there could be a rather unpleasant October surprise.
The code in question was quoted in the bug report. In the gpsd_gpstime_resolv() function, the wrong value for a constant is used:
    /* sanity check week number, GPS epoch, against leap seconds
     * Does not work well with regressions because the leap_sconds
     * could be from the receiver, or from BUILD_LEAPSECONDS. */
    if (0 < session->context->leap_seconds &&
        19 > session->context->leap_seconds &&
        2180 < week) {
        /* assume leap second = 19 by 31 Dec 2022
         * so week > 2180 is way in the future, do not allow it */
        week -= 1024;
        GPSD_LOG(LOG_WARN, &session->context->errout,
                 "GPS week confusion. Adjusted week %u for leap %d\n",
                 week, session->context->leap_seconds);
    }
The code may be a little hard to read because the comparisons are written in the reverse of the usual order; perhaps it is Yoda notation, though it seems strange to apply that to non-equality comparisons. In any case, the week number, which is calculated elsewhere with rollovers accounted for, is compared against 2180. That is not "way in the future" as the comment states: week 2180 begins on October 17, 2021, so the test first fires when week 2181 starts on October 24. The test was evidently meant to prevent some spurious regression-test failures, which is what the first comment is talking about.
GPSD maintainer Gary E. Miller acknowledged the problem, noting that he meant to use the week number for December 31, 2022 but made an error in calculating it, thus 2180. The code effectively also "predicts" another leap second being added by the end of 2022, but, as Williams pointed out, that may not be a valid assumption. Beyond that, it is possible that a negative leap second may be coming relatively soon, but the code is not written with that in mind.
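For illustration, here is one way the week number for a given calendar date could be computed. This is a sketch under stated assumptions, not the calculation Miller used or the code in the eventual fix: the function names are hypothetical, leap seconds are ignored, and the day count uses the well-known "days from civil" algorithm for the proleptic Gregorian calendar.

```c
#include <assert.h>

/* Days since 1970-01-01 for a proleptic Gregorian date, using the
 * widely published days-from-civil algorithm (Howard Hinnant). */
static long days_from_civil(int y, int m, int d)
{
    long era, yoe, doy, doe;

    assert(m >= 1 && m <= 12 && d >= 1 && d <= 31);
    y -= (m <= 2);                       /* treat Jan/Feb as end of prior year */
    era = (y >= 0 ? y : y - 399) / 400;  /* 400-year cycle index */
    yoe = y - era * 400;                 /* year within the cycle, 0..399 */
    doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1; /* day of Mar-based year */
    doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;          /* day within the cycle */
    return era * 146097 + doe - 719468;
}

/* Full GPS week number containing the given date (hypothetical helper).
 * Only valid for dates on or after the GPS epoch, 1980-01-06. */
long gps_week_of_date(int y, int m, int d)
{
    long days = days_from_civil(y, m, d) - days_from_civil(1980, 1, 6);

    assert(days >= 0);
    return days / 7;
}
```

With these definitions, `gps_week_of_date(2022, 12, 31)` comes out as 2242, while `gps_week_of_date(2021, 10, 24)` is 2181, the first week for which the errant comparison against 2180 is true.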
Miller said that up until 2020, "leap seconds had been very predicable", but that recent findings about an increase in the earth's rotational speed have changed that—raising the possibility of a negative leap second. The code in question was aimed at the regression tests, however, not the path for handling live GPS messages, which was another part of the problem.
On July 24, Miller committed a fix that removed the errant test from the live path. But the fix will only appear in the 3.23 release; it will not be backported to previous releases—at least by the GPSD project. While distributions may do so, he is not convinced that it will make things better:
gpsd does not have enough volunteers to maintain "branches". Some distros try to cherry pick, but usually make things worse.
This bug was announced on gpsd-dev and gpsd-users email lists. So the packagers for several distros already saw it. What they do is what they do.
3.23 will be released before a week has gone by.
[...] The fact that distros do not pick up gpsd updates, or upstream their patches, is a very sore spot with me.
Williams found the bug in a fork of GPSD 3.19 that he is maintaining. Some changes that were made for 3.20 were backported to that fork; testing that he did on that code showed the problem. But Miller believes that distributions and others should be running more recent versions, and that they should upgrade to 3.23 when it is available, because each new release fixes security-related bugs. That is, of course, somewhat similar to the position of other projects, the Linux kernel in particular, as Miller noted: "I [am] gonna fall back on Greg K_H's dictum: All users must update."
The question of problems with negative leap seconds was also discussed. With Miller's fix applied, there is no known problem of that sort, and even with the earlier (broken) code, a negative leap second would not have changed anything, Williams said. He just happened to notice that the code in question was not expecting the possibility of a negative leap second. No one has yet found any problem should a negative leap second occur, but it is something that could use more testing.
It seems rather short-sighted of the GPS protocol designers to "bake in" a 20-year rollover; as Miller put it: "GPS, by design, is a 1024 week time warp waiting to happen." The more recent CNAV protocol (which is not present in all GPS satellites yet) upgrades the week number to 13 bits, which results in a plausibly safer 157-year rollover, though the first overflow of that is only 116 years from now in 2137. It seems probable that there will be other navigation (and time) technologies by then—or that another couple of bits can be squeezed in somewhere.
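The rollover periods quoted here follow directly from the field widths. A minimal sketch of that arithmetic, using a hypothetical helper name and an average year of 365.25 days:

```c
#include <assert.h>

/* Approximate number of years a week counter of the given width can
 * represent before it wraps: 2^bits weeks of 7 days each. */
double rollover_years(unsigned bits)
{
    assert(bits < 32);
    return (double)(1UL << bits) * 7.0 / 365.25;
}
```

For the ten-bit field, `rollover_years(10)` is roughly 19.6; for the 13-bit CNAV field, `rollover_years(13)` is roughly 157, which added to the 1980 epoch gives 2137.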
The upshot is that anyone relying on GPSD for the correct time after mid-October will want to be running a version without this bug. The Time Warp is fun in movies, but it is rather less so for the systems that dole out time on the internet. NTP servers and the like that use GPSD must upgrade—or at least avoid versions 3.20‑3.22.
[Thanks to David A. Wheeler for giving us a heads-up about this issue.]
A GPSD time warp
Posted Aug 4, 2021 20:34 UTC (Wed) by HenrikH (subscriber, #31152) [Link]
A GPSD time warp
Posted Aug 4, 2021 20:44 UTC (Wed) by mathstuf (subscriber, #69389) [Link]
> which is not present in all GPS satellites yet
Updating these things is not easy (I imagine), so "just update to the new protocol" is a logistical issue because these things aren't deployed and maintained using Kubernetes, Ansible, or whatever, but instead via actual rockets and high-latency radio communications. Plus, wouldn't a rollover counter just…be an extra two bits on the week counter?
For some more background on this issue, this post was enlightening for me: https://berthub.eu/articles/posts/leapseconds-expose-bugs...
A GPSD time warp
Posted Aug 4, 2021 23:50 UTC (Wed) by HenrikH (subscriber, #31152) [Link]
Now they probably figured (and perhaps correctly so) that adding 3 more bits makes rollovers so rare that we no longer have to think about them, but I still feel that with a rollover counter the logic around the week rollover could be more robust in the receiver.
Neither change would, of course, fix this 1024 problem, since that was just a miscalculation by the developer. A proper 64-bit week value with no rollover at all would have been better there, but I guess bandwidth is an issue and the bits are severely limited.
A GPSD time warp
Posted Aug 5, 2021 6:23 UTC (Thu) by cpitrat (subscriber, #116459) [Link]
On week 1023:
001111111111
On week 1024:
010000000000
The fact that you decide to separate them or not is entirely up to the code that uses them.
A GPSD time warp
Posted Aug 6, 2021 18:18 UTC (Fri) by HenrikH (subscriber, #31152) [Link]
If the number can only increment, then there is zero difference between a separate rollover counter and increasing the number of bits in the actual value. If it can decrease, however, there is a world of difference: with a separate rollover counter you can go from 110000000000 to 111111111111 and deduce that time decreased by one second, not that you had packet loss for 1023 seconds.
A GPSD time warp
Posted Aug 5, 2021 10:23 UTC (Thu) by hthoma (subscriber, #4743) [Link]
There are only 52 weeks in a year. Why don't they start over every January 1st? Or is there no year in the GPS protocol?
A GPSD time warp
Posted Aug 5, 2021 10:48 UTC (Thu) by farnz (subscriber, #17727) [Link]
There is no year in the GPS protocol. There are two time fields: a week number, and a seconds since start of week number. Years, months, hours, minutes etc are all computed based on those two fields.
A GPSD time warp
Posted Aug 5, 2021 11:39 UTC (Thu) by excors (subscriber, #95769) [Link]
From the definitions in https://www.gps.gov/technical/icwg/IS-GPS-200M.pdf there's a 19-bit "time of week" (TOW) count, which is the number of "X1 epochs" (which are 1.5 seconds) since the start of the week, and a week number (where 0 is the first week in 1980).
The protocol is designed as a 1500-bit frame, split into 300-bit subframes, each with ten 30-bit words. It transmits at 50bps, so one frame takes 30 seconds and one subframe takes 6 seconds. One word in every subframe has the 17 MSBs of TOW (i.e. a resolution of 1.5<<2 = 6 seconds, which matches the subframe rate, so it also functions as an incrementing subframe counter).
One word in subframe 1 has the 10 LSBs of the week number. Since it's only sent once per frame, it sounds like it wouldn't have been too wasteful of bandwidth to add another few bits, but it may have been tricky to squeeze them in with all the other data that they wanted to send in the same subframe. I guess the designers figured that just wasn't worthwhile - there's no ambiguity if the GPS receiver has a <20-year lifespan (because you can hardcode the manufacturing date and assume all GPS weeks are in the 20 years after that), and if there is an ambiguity then the user could just tell the device what decade it's in, so it shouldn't have been a big problem. It should only become a problem with devices/software that survive >20 years, or where the designers failed to hardcode the manufacturing date, and where there's no convenient way for a user to specify the approximate date; but it turns out that's fairly common nowadays.
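The frame and time-of-week arithmetic described in this comment can be written out as a short sketch. The names below are invented for illustration; the numbers come from the comment itself (1.5-second X1 epochs, a 19-bit count of which the 17 most significant bits are broadcast, and a 50bps data rate).

```c
#include <assert.h>

enum {
    FRAME_BITS    = 1500,   /* one full frame */
    SUBFRAME_BITS = 300,    /* five subframes per frame */
    BIT_RATE_BPS  = 50,     /* navigation message data rate */
    X1_PER_WEEK   = 403200  /* 604800 s / 1.5 s per X1 epoch */
};

/* Drop the two least significant bits: the broadcast 17-bit value
 * counts units of 4 X1 epochs, i.e. 6 seconds. */
unsigned tow19_to_tow17(unsigned tow19)
{
    assert(tow19 < X1_PER_WEEK);
    return tow19 >> 2;
}

/* Seconds into the week for a 17-bit (6-second resolution) count. */
unsigned tow17_to_seconds(unsigned tow17)
{
    assert(tow17 < X1_PER_WEEK / 4);
    return tow17 * 6U;
}

/* Whole seconds into the week for a full 19-bit count (rounded down). */
unsigned tow19_to_seconds(unsigned tow19)
{
    assert(tow19 < X1_PER_WEEK);
    return tow19 * 3U / 2U;
}
```

Here `FRAME_BITS / BIT_RATE_BPS` is 30 seconds and `SUBFRAME_BITS / BIT_RATE_BPS` is 6 seconds, so the 17-bit count advances by exactly one per subframe.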
A GPSD time warp
Posted Aug 5, 2021 10:46 UTC (Thu) by farnz (subscriber, #17727) [Link]
How do you make the receiver more robust with a "rollover counter" instead of extra bits? Remember that GPS time is monotonic, and never stalls - the leap second offset is applied separately, and you can change the GPS leap second offset to hold "clock" time still while GPS time keeps advancing.
As a receiver implementer, if you see the week number go backwards, you've detected a rollover and need to handle it. The big problem that's unsolved in GPS week number rollovers is how to detect a rollover from cold; if the last week number you saw was 100 back in 2001, and you come up from cold seeing week number 150, how do you tell if this is 2002 or if you've been shut down for ages and it's now 2021?
Simply adding bits means that this isn't a problem for devices expected to last less than 100 years - you know your manufacture date to within 50 years, and thus have an unambiguous mapping from week number to year. Devices that might last more than 100 years need more care taken.
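The "anchor to a known date" approach discussed in this subthread might look roughly like the following. It is a sketch, not how any particular receiver or GPSD does it: `resolve_gps_week` and `ref_week` are hypothetical names, and the reference week (for example, one derived from a build or manufacture date) is assumed to be no later than the true current week.

```c
#include <assert.h>

#define WEEK_BITS    10U
#define WEEK_MODULUS (1U << WEEK_BITS) /* 1024 */

/* Expand a truncated ten-bit week number to a full week number by
 * choosing the first week at or after ref_week whose low ten bits
 * match. The answer is only right if the true week lies within
 * 1024 weeks after ref_week. */
unsigned resolve_gps_week(unsigned wn10, unsigned ref_week)
{
    unsigned week;

    assert(wn10 < WEEK_MODULUS);
    week = (ref_week & ~(WEEK_MODULUS - 1U)) | wn10; /* same era as ref */
    if (week < ref_week)
        week += WEEK_MODULUS;                        /* must be next era */
    return week;
}
```

The same broadcast value gives different answers for different anchors: `resolve_gps_week(150, 1124)` is 1174, while `resolve_gps_week(150, 2100)` is 2198. That is the cold-start ambiguity in a nutshell; the ten bits alone cannot distinguish the two.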
A GPSD time warp
Posted Aug 6, 2021 16:34 UTC (Fri) by HenrikH (subscriber, #31152) [Link]
A GPSD time warp
Posted Aug 5, 2021 6:20 UTC (Thu) by nhaehnle (subscriber, #114772) [Link]
What's the conceptual difference between your suggestion and what was actually done?
Isn't it the case that in any n-bit number, the top 2 bits count how often the low (n-2) bits roll over to zero? So what you're describing is equivalent to an extension of the week counter to 12 bits instead of 13 bits?
A GPSD time warp
Posted Aug 6, 2021 16:36 UTC (Fri) by HenrikH (subscriber, #31152) [Link]
A GPSD time warp
Posted Aug 5, 2021 10:13 UTC (Thu) by scientes (guest, #83068) [Link]
A GPSD time warp
Posted Aug 6, 2021 16:37 UTC (Fri) by HenrikH (subscriber, #31152) [Link]
A GPSD time warp
Posted Aug 6, 2021 18:25 UTC (Fri) by scientes (guest, #83068) [Link]
A GPSD time warp
Posted Aug 6, 2021 21:00 UTC (Fri) by HenrikH (subscriber, #31152) [Link]
A GPSD time warp
Posted Aug 4, 2021 21:30 UTC (Wed) by NYKevin (subscriber, #129325) [Link]
>
> Miller said that up until 2020, "leap seconds had been very predicable", but that recent findings about an increase in the earth's rotational speed have changed that—raising the possibility of a negative leap second. The code in question was aimed at the regression tests, however, not the path for handling live GPS messages, which was another part of the problem.
Miller is not wrong. A negative leap second would be unprecedented (which doesn't mean it won't happen). I anticipate that a lot of leap-second-sensitive technologies will break on it. Fortunately, ntpd's leap smearing should(!) be able to support both positive and negative leap seconds, and anything running on smeared time should(!) be blissfully ignorant of the discontinuity. This is also the case with AWS and GCP, both of which give you smeared time by default.
For the uninitiated: Leap smearing involves gradually slewing the clock over an extended period of time in such a way that time remains perfectly continuous and doesn't disagree very much with the "real world," in terms of both frequency and offset. If you use a linear smear centered on the leap second, the maximum offset is 0.5 seconds at the moment of the leap second itself - which is only a little more than plausible network RTT, so most technologies don't care. Similarly, if your linear smear lasts for a full 24 hours (which is the typical/recommended configuration), then the frequency only changes by 1 part in 86,400, which turns out to be pretty insignificant. Nevertheless, if you have e.g. legal reasons to require sub-second accuracy, then this may not work for your use case. But the people in that boat are already screwed anyway, because none of the standard POSIX time APIs can tell the difference between 23:59:59 and 23:59:60 in the first place.
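The numbers in this explanation can be reproduced with a small sketch of a linear smear centered on the leap second. The function names and sign convention are assumptions made for illustration; this is not the code used by ntpd or by any cloud provider.

```c
#include <assert.h>

/* Fraction of the leap second absorbed so far by a linear smear.
 *   t_rel:  seconds relative to the leap instant (negative = before it)
 *   window: total smear length in seconds, centered on the leap
 *   leap:   +1 for an inserted second, -1 for a deleted (negative) one */
double smear_absorbed(double t_rel, double window, int leap)
{
    double x;

    assert(window > 0.0 && (leap == 1 || leap == -1));
    x = t_rel / window + 0.5; /* 0 at window start, 1 at window end */
    if (x < 0.0)
        x = 0.0;              /* smear has not started yet */
    if (x > 1.0)
        x = 1.0;              /* smear is complete */
    return leap * x;
}

/* Frequency change during the smear, in parts per million. */
double smear_rate_ppm(double window)
{
    assert(window > 0.0);
    return 1.0e6 / window;
}
```

With a 24-hour window, `smear_absorbed(0.0, 86400.0, 1)` is 0.5, the maximum disagreement with unsmeared time, and `smear_rate_ppm(86400.0)` is about 11.6, which is the "1 part in 86,400" frequency change. Passing `leap = -1` covers the negative leap second case with the same code path.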
As for Miller not wanting to backport the fix: If he doesn't have the resources or desire to maintain a stable branch, then that's his decision. The distros will do the backporting, or not, as they see fit.
A GPSD time warp
Posted Aug 4, 2021 21:38 UTC (Wed) by ericonr (guest, #151527) [Link]
Following links from the issue:
https://berthub.eu/articles/posts/leapseconds-expose-bugs... points to https://565851109.xyz/, which currently predicts a negative leap second in 2029. It doesn't claim to be accurate or anything, but it's interesting to see.
A GPSD time warp
Posted Aug 5, 2021 17:54 UTC (Thu) by ncm (guest, #165) [Link]
A GPSD time warp
Posted Aug 5, 2021 18:41 UTC (Thu) by NYKevin (subscriber, #129325) [Link]
* Unix time can't represent 23:59:60, so doing things as specified by UTC is impossible. You have to deviate from UTC in one fashion or another, or else you have to completely tear out and replace the entire POSIX time API. The question is *not* "Should we deviate from UTC?" because we have no choice. The question instead is "*How* shall we deviate from UTC?"
* Jumping time backwards is notorious for breaking things, because you get non-monotonic timestamps. Testing for those bugs is hard. Fixing those bugs is also hard.
* Yes, CLOCK_MONOTONIC exists. You get a gold star if you remembered to use it everywhere. But did you audit all of the library code in your entire sprawling application? What about all of the daemons and other services that it relies on? Are you absolutely sure that every single time anyone asks for any timestamp, in any part of the system whatsoever, they always scrupulously observe the distinction between CLOCK_MONOTONIC and CLOCK_REALTIME, and never inadvertently assume that the latter will be monotonic?
* In practice, many sysadmins were already using ntpd -x to get the same effect, but worse (because it would only start slewing after the leap second, and do so in a less predictable way over a shorter period of time, with the result that the clock gets way wonkier than under "proper" leap smearing).
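As a reminder of what the CLOCK_MONOTONIC versus CLOCK_REALTIME distinction looks like in code, here is a minimal POSIX sketch. The helper names are hypothetical; only `clock_gettime()` and the two clock IDs are real interfaces.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L /* expose clock_gettime() in strict modes */
#endif
#include <assert.h>
#include <time.h>

/* Read the given clock and return it as seconds in a double. */
static double clock_seconds(clockid_t id)
{
    struct timespec ts;
    int rc = clock_gettime(id, &ts);

    assert(rc == 0);
    (void)rc; /* silence unused warning when NDEBUG disables assert */
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Use for intervals and timeouts: never steps backward, though it
 * may still be slewed by time synchronization. */
double monotonic_now(void)
{
    return clock_seconds(CLOCK_MONOTONIC);
}

/* Use only for "what time is it?": may jump in either direction. */
double wallclock_now(void)
{
    return clock_seconds(CLOCK_REALTIME);
}

/* Elapsed seconds since an earlier monotonic_now() reading. */
double elapsed_since(double start_monotonic)
{
    return monotonic_now() - start_monotonic;
}
```

Code that computes durations by subtracting two `wallclock_now()` readings is the kind that can see a negative interval when the clock is stepped back; subtracting `monotonic_now()` readings avoids that.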
A GPSD time warp
Posted Aug 6, 2021 16:23 UTC (Fri) by tialaramex (subscriber, #21167) [Link]
In a thousand years, if people are annoyed about when the sun is highest in the sky according to their wristwatch, they can lobby their local government to tweak their timezones, but TAI continues unaltered. If there are people. And wristwatches. And governments. And timezones. The Sun isn't going anywhere in that timeframe, but we might.
I don't care if we abolish leap seconds and keep UTC offset as it is, abolish UTC entirely, or even find some weird compromise, but the current situation is very silly, even if there are other sillier things that could have happened instead.
A GPSD time warp
Posted Aug 6, 2021 17:48 UTC (Fri) by NYKevin (subscriber, #129325) [Link]
Yes, getting rid of leap seconds would solve the problem at a stroke, but Wikipedia reckons that the politicians have been bickering over it since 2005 with little apparent progress: https://en.wikipedia.org/wiki/Leap_second#International_p...
Any country in the world could unilaterally start following TAI instead of UTC, of course. Civil time is ultimately "just a law" that each national legislature passes independently. It simply happens to be the case that everyone currently uses UTC. But the disruption of having some time zones following TAI and some following UTC would make this problem quadratically more complicated, rather than simplifying anything.
A GPSD time warp
Posted Aug 6, 2021 19:05 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]
TAI differs from UTC by 37 seconds right now. This is pretty significant for many applications, and a flag-day switch will likely cause a lot of very unfunny hilarity.
I've had experience with this before. One of my previous jobs had a database that was configured to use TAI for timestamps for some reason, and switching to UTC resulted in quite a few interesting problems.
A GPSD time warp
Posted Aug 6, 2021 19:33 UTC (Fri) by tialaramex (subscriber, #21167) [Link]
Yes, politicians will take many years to actually make it happen. Unlike with climate change, if they take 20 more years to do anything about leap seconds, solving the problem will probably be about as difficult as it is now; we'll have just wasted 20 years.
A GPSD time warp
Posted Aug 15, 2021 18:29 UTC (Sun) by intgr (subscriber, #39733) [Link]
Surely all computers using the same smearing configuration will agree on what time it is? So the appropriate thing to do would be to standardise the leap smearing configuration, which seems easier than any of the alternatives. And the industry has already been moving in this direction.
A GPSD time warp
Posted Aug 5, 2021 7:37 UTC (Thu) by tpetazzoni (subscriber, #53127) [Link]
> The effect will be a return to March 2002, which is not what anyone wants—or expects.
Are we sure that this is not what anyone wants? As a reminder, if we were to go back to 2002, there would be no pandemic, climate change would be a much less stressful and urgent issue (of course it already was one, but to a less visible extent), YouTube would not exist to waste our time with silly cat videos, and above all we would all be 19 years younger.
So are you really sure we don't want to return to March 2002? :-)
A GPSD time warp
Posted Aug 5, 2021 8:01 UTC (Thu) by mpr22 (subscriber, #60784) [Link]
"I tell you now if they were given
Chance to live their lives again--
Wise man's son and Wednesday's child
Would make the same mistakes as then"
— Skyclad, "The Widdershins Jig"
A GPSD time warp
Posted Aug 6, 2021 22:49 UTC (Fri) by notriddle (subscriber, #130608) [Link]
A GPSD time warp
Posted Aug 14, 2021 10:32 UTC (Sat) by jd (subscriber, #26381) [Link]