[LinuxPPS] PPS stops working after a few seconds

Heiko Gerstung heiko.gerstung at meinberg.de
Tue Feb 17 09:42:29 CET 2009


Hal V. Engel wrote:
> On Monday 16 February 2009 10:42:15 am clemens at dwf.com wrote:
>   
>>> Hi!
>>>
>>> I lost my recent work on LinuxPPS stuff due to a hard disk failure (I
>>> know, I know, ....).
>>>
>>> Now I am back on track trying to get the LinuxPPS stuff working again.
>>> I already built Folkert's ppsldisc version and tried to use it during my
>>> system startup, but it looks like the PPS stuff hangs after a while. I
>>> see that there is an entry in /sys/class/pps/pps0/assert but it does not
>>> increase anymore. It gets stuck around #36 and never goes up afterwards,
>>> no matter how often I kill ppsldisc and restart it.
>>>
>>> The strange thing is that when I do not start ppsldisc during my system
>>> init phase, log in to the unit afterwards and manually kill ntpd, start
>>> ppsldisc and then restart ntpd, it seems to work.
>>>
>>> My guess is that one of the other pieces of software running on
>>> my machine is doing something with the serial port that sends the PPS
>>> stuff into the bin.
>>>
>>> Any ideas on what I could do besides starting this stuff manually?
>>>       
>> Well, thank god there are two of us...
>>
>> I've been having the exact same problem now for two or three months.
>> USUALLY, if I try starting by hand (and not by init.d) it works.  Sometimes
>> the init.d start will have hung things so badly that it can't be restarted
>> and I have to reboot with the init.d start turned off.
>>
>> I've tried to put together a piece of example code to show the problem,
>> but have failed there; your comment about perhaps something else playing
>> with the serial port would explain that.  And as I've noted, this happens
>> ONLY on one of my two machines running NTP, the other starts fine from
>> the init.d script.
>>
>> Strange, and something we should understand.
>>     
>
> There have been other reports here about this sort of thing happening on 
> some/most/every first startup of ntp after a reboot.  I have seen this same 
> issue.  For a while on my machine the problem was intermittent.  But after 
> rebuilding my system it started happening consistently.  The "fix" I used was 
> to modify the init script to start, then stop, and then restart ppsldisc and ntp.
> I use a wrapper init script that handles starting ppsldisc and ntp.  This has 
> worked but is a hack that is covering up the real problem.
>
> When I read Heiko's email he mentioned that some other init process might be 
> causing the problem perhaps by messing with the serial port.  So I tried some 
> tests on my system to see if this might be the case.    This does not appear 
> to be the case since I have the same issue even if I remove the ntp startup 
> script from the init system and I start it manually after the system is fully 
> up and running and all other init scripts/processes have finished running.
>
> I did some more testing to see if perhaps starting,  stopping and restarting 
> just ppsldisc before starting ntpd would change how this behaves.  It did not.
> So between ppsldisc and ntpd something happens the first time things are
> started that causes a second try of the sequence to work.  
>
> It occurred to me that perhaps this was something that happened when the 
> refclock driver opened the serial port and I tried setting the baud rate of 
> the port to 9600 baud with setserial before starting ppsldisc and ntp.  This 
> made things worse as my Oncore would no longer initialize even though it 
> appeared to be talking to the driver (i.e. I could see things happening in the
> clockstats file but it hung on the initialize command).   Now it could still
> be something that happens when the refclock driver opens the port but I have 
> no idea what that might be.
>
> So Reg, there are more than just two users who are seeing this issue and if
> memory serves me this is not limited to Oncore users.  I "fixed" it by always
> starting things two times in my wrapper init script, which by the way always
> works.  But it would be better if someone could figure out what the
> underlying problem is and fix it.  I have attached my wrapper init script if
> anyone is interested.
>
> Hal
>   
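
The symptom described above (the counter in /sys/class/pps/pps0/assert
stopping at some value such as #36) can be checked automatically instead of
by eye. The following is only a rough sketch, not something from the
LinuxPPS sources. It assumes the sysfs line has the form
"seconds.nanoseconds#sequence"; the function names and the 2 second
interval are made up for illustration.

```python
import time
from pathlib import Path

# Assumed location, taken from the report above; pps0 may differ per system.
PPS_ASSERT = Path("/sys/class/pps/pps0/assert")


def parse_pps_event(text):
    """Split a line like '1234567890.000012345#36' into (timestamp, sequence).

    The 'timestamp#sequence' layout is an assumption about the sysfs format.
    Raises ValueError if no sequence number is present.
    """
    stamp, _, seq = text.strip().partition("#")
    return stamp, int(seq)


def is_stalled(read_event, interval=2.0, sleep=time.sleep):
    """Return True if the sequence number did not advance over `interval` seconds."""
    _, before = parse_pps_event(read_event())
    sleep(interval)
    _, after = parse_pps_event(read_event())
    return after == before


def check_pps0():
    """Report whether pps0 is still counting assert events."""
    return "stalled" if is_stalled(PPS_ASSERT.read_text) else "counting"
```

Calling check_pps0() a minute or so after ppsldisc and ntpd have been
started from the init script would show whether the counter has frozen,
without having to log in and watch the file by hand.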

Hal and Clemens,

Could you check whether setserial might be causing this? I am currently
checking whether I run setserial during the boot process, and I seem to
remember that some Linux distros run setserial in their init scripts as
well. It would be interesting to switch things off one at a time in
order to identify what is causing the problem (anyone remember the days
when we did this with ISA/PCI cards in our old PCs?), but I guess that
is much easier for me with my embedded system than it is with a
full-featured Linux system.
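
To find out quickly whether any init script calls setserial, something
like the sketch below might help. It is only an illustration: the function
name is made up, the directory to scan (for example /etc/init.d) is an
assumption that depends on the distro, and the comment stripping is crude.

```python
import re
from pathlib import Path


def find_setserial_calls(init_dir):
    """List (file name, line number, line) for every non-comment setserial mention.

    Only regular files directly inside init_dir are scanned.
    """
    hits = []
    for path in sorted(Path(init_dir).iterdir()):
        if not path.is_file():
            continue
        try:
            lines = path.read_text(errors="replace").splitlines()
        except OSError:
            continue  # unreadable file, skip it
        for number, line in enumerate(lines, start=1):
            code = line.split("#", 1)[0]  # crude: drop everything after '#'
            if re.search(r"\bsetserial\b", code):
                hits.append((path.name, number, line.strip()))
    return hits
```

Running find_setserial_calls("/etc/init.d") and printing the result would
give a list of candidates to disable one at a time.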


Regards,
   Heiko



