[LinuxPPS] [PATCHv3 03/16] pps: fix race in PPS_FETCH handler

Mon Aug 9 12:29:41 CEST 2010

On Fri, 6 Aug 2010 09:30:57 -0700 (PDT)
tlhackque <tlhackque at yahoo.com> wrote:

> > > Yes, it will freeze the fds (if they don't use timeouts). But under 
> > > normal circumstances, i.e. when pps_event is called twice a second, 
> > > it will overflow only after ~68 years of uninterrupted operation. It's 
> > > the same kind of problem as an overflow of struct timespec, so I 
> > > didn't think it was actually a problem. Should I use u64 instead of 
> > > unsigned int, or add a runtime check somewhere?
> > 
> > If we're using 1PPS it's ~68 years, but someone is trying 5PPS now 
> > (it would overflow in ~13.6 years); what if someone tries, e.g., 100PPS? 
> > It's not the same as an overflow of struct timespec! I think it deserves 
> > some treatment.
> 
> I don't like this approach in any code.  There is no reason to write code that 
> isn't robust in the face of overflow.
> 
> Two alternatives:
>    - If all you care about is that there's a change, use a comparison for !=.
> 
>    - If you really need less-than, do a modulo compare. (There's reasonably 
>      efficient code for this; see any network stack's sequence number 
>      comparisons.)
> 
> 
> In either case, the width of the counter should be set by how many 
> unrecognized events you can have (maybe 2x that for a cheap modulo compare), 
> not by some length of time before the system hangs.  That will be much, much 
> less than 64 bits.
> 
> Every time someone thinks that their length of time is acceptable, it bites 
> someone else later.  Technology changes.  Or your code gets sent on an 
> interstellar mission that really is expected to run 120 years :-)
> 
> Seriously, I've seen these kinds of counters break in all kinds of embedded 
> systems, and there's no reason for it.  Should I tell the story about the
> mainframe that crashed reproducibly after about 6 months of uptime because 
> everyone knew that a 32-bit uptime counter used to manage timeouts would NEVER 
> overflow in a disk controller?  No controller ever went that long without being 
> reset...until it became the least reliable component in the system.   

Ok, you're absolutely right, thanks for the review!

-- 
  Alexander