[LinuxPPS] PPS/ntpd fails on fresh startup

Hal V. Engel hvengel at astound.net
Tue Oct 14 18:19:53 CEST 2008


On Tuesday 14 October 2008 08:44:11 Luca Bertagnolio wrote:
> Rodolfo,
>
> On Tue, Oct 14, 2008 at 2:05 PM, Rodolfo Giometti <giometti at enneenne.com> wrote:
> >> Looks like ntpd might have some initialization issues, maybe? BTW,
> >> this happens with both a 4.2.4p4 and a 4.2.5p135 build.
> >
> > Yes. If I remember correctly, some GPS antennas must be enabled for PPS
> > signaling; maybe something is wrong there?
>
> my Garmin GPS18LVC keeps PPS running the whole time, and I've configured it
> so that the RMC sentence is the only one sent over the serial line.  The
> PPS line is hardwired to an LED which keeps flashing roughly every
> second... :-D
>
> I don't believe that the NMEA driver (20) initializes the GPS unless a
> "mode" line is added to the configuration, and I don't have such a line.
>
> > Try to take a look into your NTPD GPS driver and try to understand
> > where the IRQ or PPS are disabled.
>
> I can try to read through refclock_nmea.c, but what strikes me is that on
> the very first run it doesn't work, while on the second and following runs
> it *does* work.
>
> Could it be something that is done in the *deinitialization* phase of ntpd
> that "fixes" the situation?  Maybe Venu can give us a few more hints as he's
> worked on the refclock_nmea.c code in the past, I see from the notes...
>
> > Do you know if other people with your same GPS antenna have got
> > similar problems?
>
> nope.  I've only heard about happy users of the GPS18LVC, but maybe their
> first instance of ntpd does not work either, and they just bitch and moan a
> whole lot less than I do... :-D

Or it could be intermittent, as it is on my machine with a different refclock
type.  Intermittent problems are extremely difficult to locate and fix.  Many
years ago I wrote a piece of software that had intermittent hangs, perhaps 1
out of 30 uses.  I knew it had an issue, but the hangs were so rare that it
was basically impossible to locate the cause.  After about 6 months of use I
got a call from one user who was seeing the hangs perhaps 1 out of every 3 or
4 uses.  It turned out that she was using different hardware than anyone else
(i.e. a different model of PC).  The user was located about 200 miles away, so
I checked whether anyone near my office had the same type of hardware and
found someone with it on the floor above my office.  I arranged to get access
to that machine while its user was away on a business trip for a few days.
During testing I found that this machine failed at about the same rate as my
user's machine, and because it was failing often enough I was able to isolate
the problem and fix it in a few hours.  By the way, it turned out to be an
interrupt masking issue.

My point is that with my machine failing to get ntpd working correctly with
PPS on only perhaps 1 out of 20 reboots, I decided that it would be almost
impossible to locate the source of the problem, so I did not even bother
reporting it.  But we now have two test cases where it happens consistently,
i.e. Luca's and William's, so it should be easier to locate and fix the source
of the problem.  Also, since it is happening to me with a different refclock,
this is not refclock specific; it could be a problem in ntpd, in linuxpps, or
perhaps in the kernel code, or even an interaction between these players.
Rodolfo is correct that there are lots of players and the issue could be in
any one of them.  If it turns out to be in something other than linuxpps (for
example, the kernel could be masking interrupts at the wrong time), it might
take some time to get a fix in place.
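
To put rough numbers on why a rare failure is so much harder to chase than a
frequent one, here is a small sketch.  It assumes every reboot fails
independently with a fixed probability p, which is a simplification; the
function name runs_needed and the 95% confidence level are illustrative
choices, not anything taken from ntpd or linuxpps.  The chance of seeing at
least one failure in n reboots is 1 - (1 - p)^n, so:

```python
import math


def runs_needed(p_fail, confidence=0.95):
    """Smallest n such that at least one failure shows up in n independent
    runs with probability >= confidence (hypothetical helper)."""
    if not 0.0 < p_fail < 1.0:
        raise ValueError("p_fail must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Solve 1 - (1 - p)^n >= confidence for n and round up.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail))


if __name__ == "__main__":
    # Failure rates mentioned in this thread: 1 in 30, 1 in 20, 1 in 3.
    for label, p in (("1 in 30", 1 / 30), ("1 in 20", 1 / 20), ("1 in 3", 1 / 3)):
        print(f"{label}: about {runs_needed(p)} runs to see one failure")
```

Under these assumptions a 1-in-20 failure needs roughly 59 reboots before you
can expect to have seen it even once, while a 1-in-3 failure needs about 8,
which is why a machine that fails consistently is so valuable for debugging.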

Hal 




