[LinuxPPS] 1PPS Atom Ref Clock Not being Polled?

Mon Aug 3 19:06:21 CEST 2009

On Sunday 02 August 2009 10:22:51 pm K. Connolly wrote:
> Dear all,
>
> For those of you following my LinuxPPS adventure, here's the brief update:
> I've set up the hardware to lengthen the 25us PPS signal to 10ms, and then
> to 100ms. The LinuxPPS ppstest utility showed changes as the pulse width
> was changed (good sign), and the results for 100ms pulse width look like
> this:
>
> source 0 - assert 1249267484.999486868, sequence: 5055 - clear  1249267484.107793025, sequence: 5054
> source 0 - assert 1249267484.999486868, sequence: 5055 - clear  1249267485.107794063, sequence: 5055
> source 0 - assert 1249267485.999488361, sequence: 5056 - clear  1249267485.107794063, sequence: 5055
> source 0 - assert 1249267485.999488361, sequence: 5056 - clear  1249267486.107796096, sequence: 5056
> source 0 - assert 1249267486.999486891, sequence: 5057 - clear  1249267486.107796096, sequence: 5056
> source 0 - assert 1249267486.999486891, sequence: 5057 - clear  1249267487.107793354, sequence: 5057
>
> Which, I think, compare favorably with posts earlier in this thread.

Yes that looks normal and you are no longer losing assert events.
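
As a cross-check on the pulse width, each assert can be paired with the clear
that follows it. The sketch below is only an illustration: the regular
expression is my own guess at the ppstest line layout shown above, and it uses
Decimal because a float cannot hold nanoseconds on a 10-digit epoch value.

```python
import re
from decimal import Decimal

# Assumed ppstest line layout, taken from the listing quoted above.
LINE = re.compile(
    r"assert (\d+\.\d+), sequence: (\d+) - clear\s+(\d+\.\d+), sequence: (\d+)"
)


def pulse_widths(ppstest_output):
    """Return pulse widths in seconds for lines whose clear follows the assert."""
    widths = []
    for match in LINE.finditer(ppstest_output):
        assert_ts = Decimal(match.group(1))
        clear_ts = Decimal(match.group(3))
        # Lines printed right after an assert still carry the previous clear,
        # so only keep the pairs where the clear is the later event.
        if clear_ts > assert_ts:
            widths.append(clear_ts - assert_ts)
    return widths


sample = """\
source 0 - assert 1249267484.999486868, sequence: 5055 - clear  1249267484.107793025, sequence: 5054
source 0 - assert 1249267484.999486868, sequence: 5055 - clear  1249267485.107794063, sequence: 5055
"""

for width in pulse_widths(sample):
    print("pulse width:", width * 1000, "ms")
```

For the two sample lines this gives a width of a little over 108 ms, which is
in line with the 100ms setting.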

> However, the results of my PPS source not being polled have not changed
> and I still have the following status:
>
>      remote           refid      st t when poll reach   delay   offset  jitter
> ==============================================================================
> *ntp01.server.jp 192.15.10.25     2 u  238  256  377   11.000    1.496   2.162
>  PPS(0)          .PPS.            0 l    -   64    0    0.000    0.000   0.001
>
> The associations list the PPS device as reachable, but rejected:
>
> ind assID status  conf reach auth condition  last_event cnt
> ===========================================================
>    1 31619  9614   yes   yes  none  sys.peer   reachable  1
>    2 31620  8015   yes   yes  none    reject  clock expt  1
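
The status column can be unpacked by hand. The sketch below assumes the peer
status word layout from RFC 1305 Appendix B (5 status bits, 3 selection bits,
a 4-bit event counter and a 4-bit event code); the function name is
hypothetical and the name tables cover only the codes seen in this listing.

```python
# Only the codes that appear in the listing above; other values exist.
SELECTION_NAMES = {0: "reject", 6: "sys.peer"}
EVENT_NAMES = {4: "reachable", 5: "clock exception"}


def decode_peer_status(hex_word):
    """Split a 16-bit peer status word (hex string) into its fields."""
    word = int(hex_word, 16)
    selection = (word >> 8) & 0x7
    event_code = word & 0xF
    return {
        "configured": bool(word & 0x8000),
        "reachable": bool(word & 0x1000),
        "selection": SELECTION_NAMES.get(selection, selection),
        "event_count": (word >> 4) & 0xF,
        "last_event": EVENT_NAMES.get(event_code, event_code),
    }


for status in ("9614", "8015"):
    print(status, decode_peer_status(status))
```

Under that layout, 8015 has the reachability bit clear even though the reach
column prints "yes", which agrees with the "unreach" that shows up in the rv
output for the same association.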

It appears that you are using a single external prefer peer and this makes it 
highly likely that ntp will reject the local atom ref clock.  This appears to 
be a characteristic of the Atom driver because of the need for a prefer peer 
and the way it selects or rejects time sources.  

On my machine, when booted to Windows, I use the Atom driver with my Oncore 
since the Oncore driver is not supported on Windows.  In fact I think the 
only ref clock driver that works in Windows is the Atom driver, and then only 
with very recent development versions of ntp.  I found the Atom driver/ntp to 
be very picky about how the prefer and peer servers were configured.  I know 
that ntp requires at least 3 servers to be able to decide if any of them are 
valid, and the Atom ref clock is not counted toward this while it is in a 
rejected state.  The Atom ref clock always starts out as rejected.  If you 
have fewer than 3 peers then ntp cannot establish a confidence interval and 
it will always reject the Atom ref clock.

In my case I was using 3 hand selected stratum 1 external servers that were 
all in tight agreement about what the time is and the Atom ref clock would 
fall outside of the confidence interval that these external clocks established 
and the ref clock would be rejected most of the time.  It took me a long time 
to figure out what was going on with this.  

I fixed this by using a hand selected prefer server and then adding 4 pool 
servers.  This caused ntp to establish the confidence interval based on the 
pool servers.  Since these had far more variability than the hand selected 
servers I had been using, the local ref clock was highly likely to fall 
within the confidence interval and would not be rejected.
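
To make the confidence interval idea concrete, here is a toy model. It is not
ntpd's actual selection code: it is a plain Marzullo-style sweep over
[offset - error, offset + error] intervals, and every number in it is invented
for illustration (units are microseconds).

```python
def best_intersection(intervals):
    """Marzullo-style sweep: return (count, low, high) for the range
    covered by the largest number of (low, high) intervals."""
    edges = []
    for low, high in intervals:
        edges.append((low, 0))   # 0 = interval opens (sorts before a close at the same value)
        edges.append((high, 1))  # 1 = interval closes
    edges.sort()
    best = (0, None, None)
    depth = 0
    for index, (value, kind) in enumerate(edges):
        if kind == 0:
            depth += 1
            if depth > best[0]:
                # The deepest range runs from this opening edge to the next edge.
                best = (depth, value, edges[index + 1][0])
        else:
            depth -= 1
    return best


def overlaps(candidate, low, high):
    """True if the candidate (low, high) interval touches the range low..high."""
    return candidate[0] <= high and candidate[1] >= low


def interval(offset_us, half_width_us):
    return (offset_us - half_width_us, offset_us + half_width_us)


# Invented values: a PPS clock sitting 3 ms away from the network servers.
pps = interval(3000, 200)
tight = [interval(1000, 500), interval(1100, 500), interval(900, 500)]
loose = [interval(1000, 8000), interval(4000, 8000),
         interval(-2000, 8000), interval(6000, 8000)]

for label, peers in (("tight stratum 1 set", tight), ("noisier pool set", loose)):
    count, low, high = best_intersection(peers + [pps])
    verdict = "inside" if overlaps(pps, low, high) else "outside"
    print(f"{label}: agreed range {low}..{high} us ({count} sources), PPS is {verdict}")
```

With the tight set the agreed range is narrow and the PPS interval misses it;
with the wider pool intervals the same PPS interval lands inside.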

Additional things to consider.  

1. NTP will NEVER reject the prefer peer.  So select it with care.  It should 
be the nearest (i.e. fewest network hops, fastest ping times) and most 
accurate of your external servers.

2. You actually want at least some of the non-prefer peers to be lower quality 
so that they are more likely to be rejected than the atom ref clock.

3. You also want to have a large enough set of peer servers that ntp can 
reject one or two and still have three valid external servers (this includes 
the prefer server).

You might try adding something like this to your ntp.conf:

server 0.north-america.pool.ntp.org minpoll 7 maxpoll 14
server 1.north-america.pool.ntp.org minpoll 7 maxpoll 14
server 2.north-america.pool.ntp.org minpoll 7 maxpoll 14
server 3.north-america.pool.ntp.org minpoll 7 maxpoll 14
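
A quick way to check a config against the three points above is to count the
server lines. This is only a sketch: the function name is hypothetical,
ntp.example.net is a placeholder for whatever prefer peer you pick, and the
refclock line is the one quoted earlier in this thread.

```python
def summarize_servers(conf_text):
    """Split 'server' lines into external peers, prefer peers and 127.127.t.u refclocks."""
    external, prefer, refclocks = [], [], []
    for raw_line in conf_text.splitlines():
        fields = raw_line.split("#", 1)[0].split()  # drop trailing comments
        if len(fields) < 2 or fields[0] != "server":
            continue
        address, options = fields[1], fields[2:]
        if address.startswith("127.127."):
            refclocks.append(address)
            continue
        external.append(address)
        if "prefer" in options:
            prefer.append(address)
    return external, prefer, refclocks


SAMPLE_CONF = """\
server ntp.example.net prefer   # placeholder prefer peer, not a real host
server 0.north-america.pool.ntp.org minpoll 7 maxpoll 14
server 1.north-america.pool.ntp.org minpoll 7 maxpoll 14
server 2.north-america.pool.ntp.org minpoll 7 maxpoll 14
server 3.north-america.pool.ntp.org minpoll 7 maxpoll 14
server 127.127.22.0 minpoll 4 maxpoll 4
"""

external, prefer, refclocks = summarize_servers(SAMPLE_CONF)
print("external servers:", len(external))
print("prefer peers:", prefer)
print("refclocks:", refclocks)
print("servers ntp could reject and still keep three:", len(external) - 3)
```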

>
> Furthermore, the detailed "ntpq rv" listing from ntpq gives this detailed
> message, with the noted observation of the line "flash=1600 peer_stratum,
> peer_dist, peer_unfit" (or other bits?):
>
> assID=31620 status=8015 unreach, conf, 1 event, event_peer_clock,
> srcadr=PPS(0), srcport=123, dstadr=127.0.0.1, dstport=123, leap=11,
> stratum=0, precision=-20, rootdelay=0.000, rootdispersion=0.000,
> refid=PPS, reach=000, unreach=0, hmode=3, pmode=4, hpoll=6, ppoll=10,
> flash=1600 peer_stratum, peer_dist, peer_unfit, keyid=0, ttl=0,
> offset=0.000, delay=0.000, dispersion=16000.000, jitter=0.001,
> reftime=00000000.00000000  Thu, Feb  7 2036 15:28:16.000,
> org=00000000.00000000  Thu, Feb  7 2036 15:28:16.000,
> rec=00000000.00000000  Thu, Feb  7 2036 15:28:16.000,
> xmt=ce20cb46.b634a569  Mon, Aug  3 2009 11:34:46.711,
> filtdelay=     0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00,
> filtoffset=    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00,
> filtdisp=   16000.0 16000.0 16000.0 16000.0 16000.0 16000.0 16000.0 16000.0
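
On the "(or other bits?)" question: 0x1600 has exactly three bits set and ntpq
printed three names, so nothing else is hiding in that word. A small sketch of
the decode follows. The bit-to-name table is my assumption about how ntpd
4.2.4 numbers its peer tests, limited to the three names in the listing; check
it against ntp.h before relying on it.

```python
# Assumed bit assignments, limited to the names that appear in the rv output.
FLASH_BITS = {
    0x0200: "peer_stratum",
    0x0400: "peer_dist",
    0x1000: "peer_unfit",
}


def decode_flash(hex_word):
    """Return (names of known bits that are set, mask of any unrecognised bits)."""
    word = int(hex_word, 16)
    named = [name for bit, name in sorted(FLASH_BITS.items()) if word & bit]
    unknown = word & ~sum(FLASH_BITS)  # summing the keys gives the known-bit mask
    return named, unknown


names, leftover = decode_flash("1600")
print("bits set:", bin(int("1600", 16)).count("1"))
print("names:", names)
print("unrecognised bits:", hex(leftover))
```
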
>
>
> I am at a loss for next steps. I've tried different kernels, different
> patches, different versions of ldattach and ntpd. Any thoughts, as always,
> are much appreciated.
>
> Cheers,
>
> -Kevin
>
> On Thu, 23 Jul 2009, K. Connolly wrote:
> > On Thu, 23 Jul 2009, Hal V. Engel wrote:
> >> Either a longer pulse fixes the problem or it does not.
> >
> > I agree :D Which is why I'm trying all sorts of various options. Anyway,
> > one last note before I depart for the weekend: I've tried 4.2.4p5,
> > 4.2.4p7 and ntp-dev-4.2.5p191. All three were configured identically
> > etc., the two stable releases give identical log results (which seem
> > revealing), and the dev version won't even recognize the ATOM refclock.
> >
> > 4.2.4p5 and p7, the PPS "clk_noreply" message seems ominous:
> >
> > 24 Jul 11:28:59 ntpd[29516]: offset -0.000533 sec freq 145.864 ppm error
> > 0.000371 poll 6
> > 24 Jul 11:29:14 ntpd[29520]: system event 'event_restart' (0x01) status
> > 'sync_alarm, sync_unspec, 1 event, event_unspec' (0xc010)
> > 24 Jul 11:29:15 ntpd[29520]: peer 192.153.107.22 event 'event_reach'
> > (0x84) status 'unreach, conf, 1 event, event_reach' (0x8014)
> > 24 Jul 11:29:16 ntpd[29520]: clock PPS(0) event 'clk_noreply' (0x01)
> > 24 Jul 11:29:16 ntpd[29520]: peer PPS(0) event 'event_peer_clock' (0x85)
> > status 'unreach, conf, 1 event, event_peer_clock' (0x8015)
> > 24 Jul 11:29:21 ntpd[29520]: system event 'event_peer/strat_chg' (0x04)
> > status 'sync_alarm, sync_ntp, 2 events, event_restart' (0xc621)
> > 24 Jul 11:29:21 ntpd[29520]: synchronized to 192.153.107.22, stratum 2
> > 24 Jul 11:29:21 ntpd[29520]: kernel time sync status change 2001
> > 24 Jul 11:29:21 ntpd[29520]: system event 'event_sync_chg' (0x03) status
> > 'leap_none, sync_ntp, 3 events, event_peer/strat_chg' (0x634)
> > 24 Jul 11:29:21 ntpd[29520]: system event 'event_peer/strat_chg' (0x04)
> > status 'leap_none, sync_ntp, 4 events, event_sync_chg' (0x643)
> >
> > ******
> >
> > As for why the dev version fails to recognize the ATOM refclock I'm not
> > sure (as mentioned, 'twas compiled with identical options/explicit
> > inclusion of ATOM driver...), but perhaps working with the dev version
> > should be left until another time:
> >
> > 24 Jul 11:37:18 ntpd[29520]: offset -0.000448 sec freq 145.851 ppm error
> > 0.000688 poll 6
> > 24 Jul 11:37:26 ntpd[29524]: 192.153.107.22 8011 81 mobilize assoc 60783
> > 24 Jul 11:37:26 ntpd[29524]: refclock_newpeer: clock type 22 invalid
> > 24 Jul 11:37:26 ntpd[29524]: 127.127.22.0 interface 127.0.0.1 -> (null)
> > 24 Jul 11:37:26 ntpd[29524]: 0.0.0.0 c016 06 restart
> > 24 Jul 11:37:26 ntpd[29524]: 0.0.0.0 c012 02 freq_set kernel 145.851 PPM
> > 24 Jul 11:37:27 ntpd[29524]: 192.153.107.22 8024 84 reachable
> > 24 Jul 11:37:33 ntpd[29524]: 192.153.107.22 963a 8a sys_peer
> > 24 Jul 11:37:33 ntpd[29524]: 0.0.0.0 c615 05 clock_sync
> >
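
The "clock type 22" in that log is the third octet of the 127.127.22.0
pseudo-address; the fourth octet is the unit number. A minimal sketch of the
split is below. The function name is hypothetical and the driver table lists
only a few types relevant to this thread.

```python
import ipaddress

# A few refclock driver types; the full list is in the ntp documentation.
DRIVER_NAMES = {1: "local clock", 20: "NMEA", 22: "PPS/ATOM", 30: "Oncore"}


def parse_refclock(address):
    """Return (driver_type, unit) for a 127.127.t.u address, else None."""
    octets = ipaddress.IPv4Address(address).packed
    if octets[0] != 127 or octets[1] != 127:
        return None
    return octets[2], octets[3]


for address in ("127.127.22.0", "192.153.107.22"):
    parsed = parse_refclock(address)
    if parsed is None:
        print(address, "is an ordinary network peer")
    else:
        driver_type, unit = parsed
        print(address, "is driver", driver_type,
              DRIVER_NAMES.get(driver_type, "unknown"), "unit", unit)
```
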
> > As always, thanks for the input.
> > Cheers,
> >
> > -Kevin
> >
> >>> Furthermore, despite there being some apparent problem with the asserts
> >>> being captured 100%, ppstest still does show -some- (a lot, actually)
> >>> of assert/clear events quite nicely, so shouldn't NTPd have an iota of
> >>> functionality? i.e., keeping the problem of my PPS source not being
> >>> polled at all in mind, does this assert/pulse width issue seem like a
> >>> likely candidate for a solution?
> >>
> >> I don't know how likely it is.  When I boot my machine over to Windows I
> >> use a newer version of ntp that will use the Atom driver (the Oncore
> >> driver does not work at this point in Windows otherwise I would use it)
> >> and it takes a long time (7 to 8 minutes most of the time) to decide
> >> that the PPS from my Oncore is good to go whereas when I use the Oncore
> >> driver when booted to Linux it syncs up in perhaps 30 seconds or so and
> >> that includes the overhead of initializing the Oncore which is something
> >> that does not happen on Windows. So the Atom driver does appear to be
> >> very picky about the PPS signal and about when it will start working. 
> >> Clearly much more so than the Oncore driver.
> >>
> >>> It's Friday here (Japan) and I've got some other work this afternoon,
> >>> so I'll send an update in a few days!
> >>>
> >>> Cheers,
> >>>
> >>> -Kevin
> >>>
> >>>>> Despite this promising result, the behavior of my NTPd is still the
> >>>>> same, and the PPS device is not being polled. And, for what it's
> >>>>> worth, I tried switching the rising edge detection to falling edge,
> >>>>> to no avail. UHG.
> >>>>>
> >>>>> -Kevin
> >>>>
> >>>> You may be the first person to use a device with such a short PPS
> >>>> pulse with LinuxPPS. I have seen stuff on the net where users have had
> >>>> GPS devices with very short PPS pulses where they used a conditioning
> >>>> circuit to get a longer pulse that they then used for their timing
> >>>> application.
> >>>>
> >>>>
> >>>>
> >>>> Virtually all of those using LinuxPPS have devices that have MUCH
> >>>> longer PPS pulses. These will typically be 100ms to 200ms which is
> >>>> about 4 orders of magnitude longer. For example there are a bunch of
> >>>> us that use Motorola Oncores and these have a 200ms PPS pulse. The
> >>>> Garmin 18 series is also popular and I think by default has a 200ms
> >>>> pulse. So there is not much, if any, experience here with short pulse
> >>>> length devices like yours.
> >>>>
> >>>>
> >>>>
> >>>> From the above PPS test it appears that your PPS pulse is about 25us
> >>>> (0.025ms) long which is way shorter than the refclocks that most of us
> >>>> use. I am not sure if this is related to what you are seeing but I am
> >>>> curious if anyone else here is using a refclock with a very short PPS
> >>>> pulse like your device. If so did you have any issues with the device?
> >>>>
> >>>>
> >>>>
> >>>> From http://www.febo.com/pages/soekris/
> >>>>
> >>>>
> >>>>
> >>>> "The PPS signal driving the Elan timer must be a positive pulse
> >>>> (going from 0 volts to 5 volts at the on-time point) and must be at
> >>>> least one timer tick long -- that means at least 1 millisecond (and
> >>>> preferably 2 or 3 ms to allow room for error) if the "HZ" value in the
> >>>> kernel is set to 1000 as recommended below. If your signal does not
> >>>> meet these requirements, a TAPR FatPPS can condition the signal to
> >>>> these values."
> >>>>
> >>>>
> >>>>
> >>>> This web page is not about LinuxPPS since the machine being described
> >>>> is a FreeBSD derived machine. So the above may not be valid for a
> >>>> LinuxPPS based machine (anyone know?).
> >>>>
> >>>>
> >>>>
> >>>> As you can see there is even a RS-232 converter available to lengthen
> >>>> the PPS pulse for devices like yours. So it appears to be a known and
> >>>> fairly common issue. The FatPPS device is $49 plus shipping (ouch!) so
> >>>> it is not cheap but it is also produced in very small numbers so this
> >>>> likely accounts for the high price. There is a manual for the FatPPS
> >>>> device on-line and it has a schematic. The TAPR folks are Ham radio
> >>>> types and Ham radio types almost always make schematics available
> >>>> since it is the nature of the beast. You should be able to build the
> >>>> device based on the schematic for perhaps $10 to $15 if you are handy
> >>>> with a soldering iron since the circuit appears to be fairly simple
> >>>> and only involves a handful of components. This circuit will lengthen
> >>>> the PPS pulse to at least 25ms but this can be adjusted by changing
> >>>> the values of some of the components.
> >>>>
> >>>>>> Could this be because of the short length of the PPS pulse? From my
> >>>>>> reading for this to be reliable the pulse length has to be at least
> >>>>>> = 1/Hz rate of the kernel (i.e. 100ms on a 100Hz system or 10ms on a
> >>>>>> 1000Hz system).
> >>>>
> >>>> I was off by an order of magnitude with the above. It should have
> >>>> read:
> >>>>
> >>>>
> >>>>
> >>>> ... 10ms on a 100Hz system or 1ms on a 1000Hz system ...
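
The corrected rule of thumb is just one kernel tick, 1/HZ. The sketch below
works it out in whole microseconds; the function names are hypothetical, and
the rule itself is the unconfirmed one from this thread, not something
LinuxPPS documents.

```python
def tick_us(hz):
    """Length of one kernel timer tick in microseconds."""
    return 1_000_000 // hz


def pulse_long_enough(pulse_us, hz):
    """Rule of thumb from the thread: the pulse should last at least one tick."""
    return pulse_us >= tick_us(hz)


for hz in (100, 1000):
    print(f"HZ={hz}: one tick is {tick_us(hz) / 1000} ms")

# The 25us pulse discussed above against a 1000Hz kernel, then the 100ms pulse.
print("25us pulse at HZ=1000:", pulse_long_enough(25, 1000))
print("100ms pulse at HZ=1000:", pulse_long_enough(100_000, 1000))
```
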
> >>>>
> >>>>>> But I don't know for sure that
> >>>>>> this is really true but in any case your pulse is 20us (0.02ms) and
> >>>>>> this is way faster. So this is at least suspect.
> >>>>>>
> >>>>>> Also
> >>>>>>
> >>>>>>> server 127.127.22.0 minpoll 4 maxpoll 4
> >>>>>>> fudge 127.127.22.0 flag3 1 flag2 0 time1 0.000 stratum 0
> >>>>>>
> >>>>>> You are telling ntp to use the assert edge (flag2 0) and since you
> >>>>>> are not seeing the assert when you run ppstest I think this is why
> >>>>>> ntp is failing since it never sees an assert. As a test try using the
> >>>>>> clear edge (flag2 1). If that works then you know that the problem
> >>>>>> is that the machine is not seeing the assert events.
> >>>>>>
> >>>>>> Hal
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> LinuxPPS mailing list
> >>>>>> LinuxPPS at ml.enneenne.com
> >>>>>> http://ml.enneenne.com/cgi-bin/mailman/listinfo/linuxpps
> >>>>>> Wiki: http://wiki.enneenne.com/index.php/LinuxPPS_support