[LinuxPPS] Strange offset behavior

Hal V. Engel hvengel at astound.net
Tue Jun 16 20:40:23 CEST 2009


On Tuesday 16 June 2009 10:46:02 am Andrew Hills wrote:
> Hi all,
>
> I'm observing some strange behavior of the clock offset. I set up two
> machines, nearly identical on the software side (same patched kernel
> version, same version of NTP), and monitored their offsets overnight by
> polling `ntpq -p` once per second. I graphed the results here (sorry for
> the PDF): http://people.umass.edu/ahills/ntpq_offset.pdf
>
> The top graph represents the entire data set; the bottom graph is the
> first twenty thousand samples or so (to display more detail).
>
> The green one, L13, is a machine that has been working fine for the past
> several weeks. The red one, L11, is one that I've just recently revived;
> I'm told that it worked before. However, the offset behavior seems very
> strange. Its offset jumps a lot at the beginning and exceeds the usual
> 15us limit I observe with the other machine by a factor of four. It
> takes several hours to finally synchronize with the PPS signal, and
> there's also that strange peak at about 5am GMT.
>
> Can anyone think of an explanation for this behavior? The GPS hardware
> has been confirmed to work in the expected manner on another machine.
>
> --Andrew Hills

Did you apply the convergence patch to these machines?  If not, it is not
unusual for ntp to take several hours to bring the local clock into tight sync
with the ref clock.  So the first 4 1/2 hours don't look that unusual to me,
assuming that you have not applied the convergence patch.
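
To give a feel for why the loop gain matters for convergence time, here is a
toy first-order model in Python.  It is only a sketch under stated
assumptions: it takes the correction applied each second to be
offset / 2^(SHIFT_PLL + time_constant), it ignores the frequency term of the
real kernel PLL, and the function name and its parameters are made up for
illustration rather than taken from the kernel or from ntp.

```python
def seconds_to_converge(initial_us, target_us, shift_pll, time_constant=0):
    """Toy model: count 1-second steps until |offset| <= target_us.

    Assumption (simplified, not the kernel's full loop): each second the
    clock is corrected by offset / 2**(shift_pll + time_constant).
    """
    if target_us <= 0:
        raise ValueError("target_us must be positive or the loop never ends")
    gain = 1.0 / (1 << (shift_pll + time_constant))
    offset = float(initial_us)
    seconds = 0
    while abs(offset) > target_us:
        offset -= offset * gain  # remove a fixed fraction of the remaining offset
        seconds += 1
    return seconds


if __name__ == "__main__":
    # Illustrative starting offset and target only; not measured values.
    for shift in (4, 2, 1):
        print("shift", shift, "->", seconds_to_converge(1000.0, 10.0, shift), "s")
```

Under this model a smaller shift means a larger per-second correction, so the
same starting offset shrinks below the target in fewer steps.  Real
convergence also depends on the frequency error and on ntpd's poll interval,
both of which the model leaves out.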

The negative offset that occurs around 5:00 looks like it is related to a
temperature fluctuation.  Why do I say that?  First, I am assuming that both
machines are in the same physical environment.  Notice that there is a much
smaller negative offset in the L13 part of the graph around the same time.  It
would appear that L11's oscillator has a much higher temperature coefficient
and is much less stable than L13's.

Once L11 stabilized at around 5:40 it kept the offset fairly small (the
largest was around 20us, again about what I would expect if the convergence
patch had not been applied) but it still had larger offsets than L13, which
never exceeded 10us during that same time frame.
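
If the once-per-second ntpq samples are still around, the comparison above
can be checked numerically instead of read off the graph.  A minimal Python
sketch, assuming the log has already been parsed into (time, offset) pairs;
the function name and the sample layout are hypothetical:

```python
def peak_abs_offset(samples, start, end):
    """Largest |offset| among (time, offset) pairs with start <= time < end.

    Returns None when no sample falls inside the window.
    """
    in_window = [abs(offset) for t, offset in samples if start <= t < end]
    return max(in_window) if in_window else None
```

Calling it with the same window for the L11 and the L13 samples gives the two
peak values being compared.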

So some questions.

Are you using the convergence patch?

If so what did you set SHIFT_PLL to?  On machines with less stable oscillators 
it may be advantageous to use values lower than 2.

Did you build ntp so that it would be getting nanosecond times from the 
system?

What counter is being used on these systems for timekeeping (e.g. tsc,
HPET...)?

What kernel version are you using?
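
The last two questions can be answered from the running system.  Here is a
small Python sketch that reads the clocksource files Linux exposes under
sysfs and prints the kernel release; the helper name is made up, and it
assumes sysfs is mounted at /sys:

```python
import platform
from pathlib import Path

# Standard sysfs location of the clocksource controls on Linux.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksource(base=CLOCKSOURCE_DIR):
    """Return (current, available) clocksource names.

    Returns (None, []) if the sysfs files cannot be read.
    """
    try:
        current = (base / "current_clocksource").read_text().strip()
        available = (base / "available_clocksource").read_text().split()
    except OSError:
        return None, []
    return current, available


if __name__ == "__main__":
    current, available = read_clocksource()
    print("kernel release:", platform.release())
    print("current clocksource:", current)
    print("available clocksources:", " ".join(available))
```

Running it on both L11 and L13 and comparing the output shows whether the two
machines really use the same counter and kernel.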

Hal




