[LinuxPPS] PPS stops working after a few seconds

Hal V. Engel hvengel at astound.net
Tue Feb 17 20:48:03 CET 2009


On Tuesday 17 February 2009 12:42:29 am Heiko Gerstung wrote:
> Hal V. Engel schrieb:
> > On Monday 16 February 2009 10:42:15 am clemens at dwf.com wrote:
> >>> Hi!
> >>>
> >>> I lost my recent work on LinuxPPS stuff due to a hard disk failure (I
> >>> know, I know, ....).
> >>>
> >>> Now I am back on track trying to get the LinuxPPS stuff working again.
> >>> I already built Folkert's ppsldisc version and tried to use it during my
> >>> system startup, but it looks like the PPS stuff hangs after a while. I
> >>> see that there is an entry in /sys/class/pps/pps0/assert but it does
> >>> not increase anymore. It gets stuck around #36 and never goes up
> >>> afterwards, no matter how often I kill ppsldisc and restart it.
> >>>
> >>> The strange thing is that when I do not start ppsldisc during my system
> >>> init phase, log in to the unit afterwards and manually kill ntpd, start
> >>> ppsldisc and then restart ntpd, it seems to work.
> >>>
> >>> My guess is that one of the other pieces of software running on my
> >>> machine is doing something with the serial port that sends the
> >>> PPS stuff into the bin.
> >>>
> >>> Any ideas on what I could do besides starting this stuff manually?
> >>
> >> Well, thank god there are two of us...
> >>
> >> I've been having the exact same problem now for two or three months.
> >> USUALLY, if I try starting by hand (and not by init.d) it works.
> >> Sometimes the init.d start will have hung things so badly that it can't
> >> be restarted and I have to reboot with the init.d start turned off.
> >>
> >> I've tried to put together a piece of example code to show the problem,
> >> but have failed there; your comment about perhaps something else
> >> playing with the serial port would explain that.  And as I've noted,
> >> this happens ONLY on one of my two machines running NTP, the other
> >> starts fine from the init.d script.
> >>
> >> Strange, and something we should understand.
> >
> > There have been other reports here about this sort of thing happening on
> > some/most/every first startup of ntp after a reboot.  I have seen this
> > same issue.  For a while on my machine the problem was intermittent.  But
> > after rebuilding my system it started happening consistently.  The "fix"
> > I used was to modify the init script to start, then stop, and then restart
> > ppsldisc and ntp. I use a wrapper init script that handles starting
> > ppsldisc and ntp.  This has worked but is a hack that is covering up the
> > real problem.
> >
> > When I read Heiko's email he mentioned that some other init process might
> > be causing the problem perhaps by messing with the serial port.  So I
> > tried some tests on my system to see if this might be the case.    This
> > does not appear to be the case since I have the same issue even if I
> > remove the ntp startup script from the init system and I start it
> > manually after the system is fully up and running and all other init
> > scripts/processes have finished running.
> >
> > I did some more testing to see if perhaps starting, stopping and
> > restarting just ppsldisc before starting ntpd would change how this
> > behaves.  It did not. So between ppsldisc and ntpd something happens the
> > first time things are started that causes a second try of the sequence to
> > work.
> >
> > It occurred to me that perhaps this was something that happened when the
> > refclock driver opened the serial port and I tried setting the baud rate
> > of the port to 9600 baud with setserial before starting ppsldisc and ntp.
> > This made things worse as my Oncore would no longer initialize even
> > though it appeared to be talking to the driver (i.e., I could see things
> > happening in the clockstats file but it hung on the initialize command).
> > Now it could still be something that happens when the refclock driver
> > opens the port but I have no idea what that might be.
> >
> > So, Reg, there are more than just two users who are seeing this issue and
> > if memory serves me this is not limited to Oncore users.  I "fixed" it by
> > always starting things two times in my wrapper init script which, by the
> > way, always works.  But it would be better if someone could figure out
> > what the underlying problem is and fix it.  I have attached my wrapper
> > init script if anyone is interested.
> >
> > Hal
>
> Hal and Clemens,
>
> could you check whether setserial might be causing this? I am currently
> checking if I am running setserial during the boot process and I seem to
> remember that some Linux distros run setserial in their init scripts as
> well. It would be interesting to switch things off one after the other in
> order to identify what is causing the problems (anyone remember the days
> when we did this with ISA/PCI cards in our old PCs?), but I guess that
> is much easier for me with my embedded system than it is with a
> full-featured Linux system.
>
>
> Regards,
>    Heiko

I had this set up in a udev rule like this:

KERNEL=="ttyS1", RUN+="/bin/setserial -v /dev/%k low_latency", SYMLINK+="oncore.serial.0"

There are no rc scripts on my system that run setserial and it is not
installed by default.  I had been using a version that had the LinuxPPS
patches that were required for an earlier version of LinuxPPS.  I installed an
unpatched version and tested with it and got the same result (i.e., I needed to
start things twice to get it working).  I also changed the udev rule to look
like this:

KERNEL=="ttyS1", SYMLINK+="oncore.serial.0"

So I was not running setserial at all and got the same result, so I don't
think the problem has anything to do with setserial.
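
For anyone who wants to check from a script whether the assert counter in
/sys/class/pps/pps0/assert has stopped increasing (the symptom described at
the top of this thread), here is a minimal Python sketch.  It assumes the file
contains a single line of the form "seconds.nanoseconds#sequence"; the
function names and the 3 second wait are only illustrative and are not part
of LinuxPPS, ppsldisc or ntpd.

```python
import time
from pathlib import Path

# Path mentioned earlier in this thread; adjust for a different PPS source.
ASSERT_PATH = Path("/sys/class/pps/pps0/assert")


def parse_assert(text):
    """Split 'seconds.nanoseconds#sequence' into (timestamp_str, sequence)."""
    timestamp, _, sequence = text.strip().rpartition("#")
    return timestamp, int(sequence)


def pps_is_advancing(path=ASSERT_PATH, wait=3.0):
    """Return True if the assert sequence number grows within `wait` seconds."""
    _, first = parse_assert(Path(path).read_text())
    time.sleep(wait)
    _, second = parse_assert(Path(path).read_text())
    return second > first
```

Calling pps_is_advancing() after the init scripts have run should return
False when the counter is stuck, which could be used to decide whether the
stop/restart workaround is needed at all.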

Hal




