[LinuxPPS] Re: strange readings...

Fri May 18 10:50:21 CEST 2007

Martin wrote:
> 
> On Fri, May 18, 2007 at 07:52:35AM +1000, James Boddington wrote:
>> CONFIG_NO_HZ=y
>> CONFIG_HZ_100=y
>> # CONFIG_HZ_250 is not set
>> # CONFIG_HZ_300 is not set
>> # CONFIG_HZ_1000 is not set
>> CONFIG_HZ=100
> 
> 
> You say you are using NO_HZ, but you also have HZ_100
> selected. Does the NO_HZ override the other?
> 

I had another look, using cat /proc/interrupts ; sleep 10 ; cat /proc/interrupts.
The timer counter changed by 1002 over 10s, so about 100 ticks per second.
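
The same measurement can be sketched in Python. This is only an illustration,
assuming the tick shows up on a /proc/interrupts line whose last field is
"timer" (as it does for IRQ 0 on these kernels); the function names are
hypothetical.

```python
def timer_count(interrupts_text, label="timer"):
    """Sum the per-CPU counts on the line whose last field is `label`.

    Assumption: the tick is reported on a line ending in "timer".
    Other kernels or architectures may report it elsewhere.
    """
    for line in interrupts_text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[-1] != label:
            continue
        total = 0
        # Fields after the "N:" tag are one counter per CPU, followed by
        # the controller name and the description.
        for field in fields[1:]:
            if not field.isdigit():
                break
            total += int(field)
        return total
    raise ValueError("no %r line found" % label)


def ticks_per_second(before_text, after_text, seconds):
    """Average timer interrupt rate between two /proc/interrupts snapshots."""
    return (timer_count(after_text) - timer_count(before_text)) / seconds
```

Reading /proc/interrupts into a string, sleeping 10 seconds, reading it again
and passing both strings with seconds=10 gives the rate; a counter change of
1002 comes out as 100.2 ticks per second.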

Time: tsc clocksource has been installed.
Clocksource tsc unstable (delta = 1009953617 ns)
Time: pit clocksource has been installed.

It looks like my ntp machine is HZ=100 after all.

My desktop is
CONFIG_NO_HZ=y
# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
# CONFIG_HZ_300 is not set
CONFIG_HZ_1000=y
CONFIG_HZ=1000

Using cat /proc/interrupts ; sleep 10; cat /proc/interrupts again gives me 4472 
timer interrupts over 10s. That averages 447 ticks per second. Dmesg also gives me

Time: acpi_pm clocksource has been installed.
Switched to NOHz mode on CPU #0

So I am assuming the desktop is running NO_HZ. I was guessing NO_HZ overrode 
the HZ value. At the moment I don't know why the ntp computer still seems to be 
HZ=100.

The desktop is an XP1700 with a mb from 2001 or so. My ntp machine is a dell gx1 
p2-350 with an old enough acpi implementation that linux won't do acpi.

>> 2.6.21 still can not follow the 1pps from the garmin as
>> well as 2.6.17 and earlier or freebsd can but it is still
>> doing better than 2.6.18 - 2.6.20.
> 
> Any clue why?

Wrongly or rightly I have been blaming the clock changes 2.6.17 -> 2.6.18. 
Convergence is slower with those kernels. There have been a few comments both 
here and on lkml about this, including one comment that the slower convergence 
was deliberate. It was around 2.6.20 that I changed to freebsd and immediately 
got far better results. I only changed back to linux when the linuxpps master 
branch was updated to 2.6.21.

The slower convergence was great on a dialup connection. I don't like it for 
chasing a pps.

With 2.6.18 to .20 and default minpoll/maxpoll I get upwards of +/-300us. 
Freebsd on the same hardware gives maybe +/-3us or better. I think 2.6.17 was 
something like +/-40us.

> 
> 
>> I use the SHM driver as the preferred peer. I read 16
>> offsets, drop the outliers and average the rest then use
>> the shm driver to feed the time to ntp. My hardware has a
>> nasty habit of generating large offsets. Had a 100ms and a
>> 129ms offset last night on the pps. The worst has been 1.3
>> seconds.  This happens under both linux and freebsd.
> 
> Just so I am following, you wrote something to collect the
> information from the linuxpps using the netlink interface
> and then use the shm driver to feed the time to ntp? 

Using the netlink interface is still on the great TODO list. I am cheating and 
reading /sys/class/pps/00/assert every second.
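
A minimal sketch of that kind of polling, assuming the sysfs file holds one
line of the form seconds.nanoseconds#sequence with a 9-digit nanosecond
field, as in later mainline PPS kernels. The 2007 LinuxPPS patch may format
it differently, and the helper names here are hypothetical.

```python
def parse_pps_assert(text):
    """Split 'sec.nsec#seq' into integers (the format is an assumption)."""
    stamp, _, seq = text.strip().partition("#")
    sec, _, nsec = stamp.partition(".")
    return int(sec), int(nsec), int(seq)


def offset_from_second_ns(nsec):
    """Distance of the timestamp from the nearest whole second, in ns.

    Positive means the pulse was stamped just after a second boundary,
    negative just before. The sign ntp expects is left to the caller.
    """
    half = 500_000_000
    return nsec if nsec < half else nsec - 1_000_000_000


def read_pps_assert(path="/sys/class/pps/00/assert"):
    """Read and parse the file once; call this from a once-a-second loop."""
    with open(path) as f:
        return parse_pps_assert(f.read())
```

The sequence number lets the caller notice a missed or repeated pulse between
two reads.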

> Is there any reason you do this instead of using the nmea
> driver? Is it so you can drop the outliers? Your output
> lists the nmea driver as well and the offset/jitter looks
> the same.

To drop the outliers.

If for whatever reason my daemon is not running to feed time to ntp via shm, 
the nmea driver is still configured. I also keep an eye on the nmea driver as 
a sanity check for my own code.

When everything is working well the offsets and jitter of nmea and shm will be 
the same. When I get a spike it will show up with the nmea driver and be 
filtered out by my driver.
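
The "read 16 offsets, drop the outliers and average the rest" step could look
something like the sketch below. The post does not say how outliers are
picked, so the median-based rule, the factor k and the floor value are all
assumptions for illustration, not the actual daemon.

```python
from statistics import mean, median


def filtered_offset(offsets, k=3.0, floor=1e-6):
    """Average a batch of offsets (seconds) after dropping outliers.

    Assumed rule: discard samples further than k median absolute
    deviations from the batch median. `floor` keeps the limit above zero
    when most samples are identical.
    """
    offsets = list(offsets)
    med = median(offsets)
    mad = median([abs(x - med) for x in offsets])
    limit = max(k * mad, floor)
    kept = [x for x in offsets if abs(x - med) <= limit]
    return mean(kept)
```

With 15 samples within a few microseconds of zero and one sample at
-0.129105839, the spike is dropped and the average stays in the microsecond
range.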

(root@ntp) awk -f peer.awk /var/log/ntp/peerstats.*
        ident     cnt     mean     rms      max     delay     dist     disp
==========================================================================
127.127.1.0     1651    0.000    0.000    0.000    0.000    0.000    0.000
127.127.20.0    6454   -0.060    2.458  129.046    0.000    0.000    0.000
127.127.28.3    6319   -0.001    0.008    0.014    0.000    0.000    0.000
130.102.2.123    105   -1.560    2.964   27.725   48.110  977.164   46.992
192.231.203.132  105    0.635    2.715    7.652   26.181  958.485   47.038
220.233.200.157   76    1.404    3.042    6.761   64.275  982.728   58.725
150.101.72.73      4   -0.235    0.165    0.285   56.133  967.309  408.133
218.214.125.154   98   19.894   65.754  411.300   54.092  970.363   33.615
150.101.192.68   100  -13.506   34.207  183.280   57.263  974.439   33.250

The awk script peer.awk is in the ntp source tar ball.

The offsets in the table are in milliseconds.

Out of 6454 samples the nmea driver has a mean offset of -60us and a max 
offset of 129ms.

Out of 6319 samples my shm driver has a mean offset of -1us and a max offset 
of 14us.
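
For anyone without the awk script to hand, the cnt, mean, rms and max columns
can be approximated with a short Python sketch. This is not a port of
peer.awk: it assumes the usual peerstats layout (day, second, address,
status, offset, delay, dispersion, ...), takes rms as the plain
root-mean-square of the offsets and max as the largest absolute offset, and
peer.awk may define either differently.

```python
import math
from collections import defaultdict


def summarize_peerstats(lines):
    """Per-peer count, mean, rms and max |offset| in milliseconds.

    Assumes field 3 is the peer address and field 5 the offset in
    seconds, as in ntpd's peerstats files.
    """
    per_peer = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip blank or truncated lines
        per_peer[fields[2]].append(float(fields[4]) * 1000.0)

    summary = {}
    for peer, offs in per_peer.items():
        n = len(offs)
        summary[peer] = {
            "cnt": n,
            "mean": sum(offs) / n,
            "rms": math.sqrt(sum(o * o for o in offs) / n),
            "max": max(abs(o) for o in offs),
        }
    return summary
```

Feeding it the lines of one or more peerstats files returns a dict keyed by
peer address, for example 127.127.28.3 for the shm driver.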

The spikes get filtered quite nicely. Last night I had spikes of -0.100262134, 
-0.050364635 and -0.129105839 seconds in a period of 5 seconds. I get the same 
spikes in freebsd so I would say it is not a linuxpps problem. I also get the 
same spikes with 2.4.33.3 + ppskit.

-- 
    James