[LinuxPPS] [PATCH 02/11] Streamline interrupt handling; get rid of a lock and idr_find()

Tue Feb 10 15:15:33 CET 2009

On Tue, Feb 10, 2009 at 05:42:07AM -0500, George Spelvin wrote:
> Rodolfo Giometti <giometti at enneenne.com> wrote:
> >>    (I could make it totally safe using a seq_lock-like mechanism
> >>    if necessary.)
> >
> > Better. :)
> 
> Fine.  As I said, you won't get a crash or anything, just a fucked-up
> timestamp.  Under circumstances (5s scheduling delays) when timekeeping
> isn't working all that well anyway.  You could get the same thing if
> some nut was doing strange things with settimeofday().

Locking is used to avoid fucked-up data! :)

> But okay, I'll complexify PPS_FETCH to close the hole.
> 
> > This still considers a particular implementation. You cannot provide a
> > core implementation of anything which delegates locking of its
> > internal data to the upper layers, if so your code is broken by
> > design!
> 
> If I may be so blunt, nonsense.
> 
> All code has a contract with the code calling it.
> In this case, the contract is, "don't call pps_unregister_source while any
> pps_event calls are in progress".  This is quite a reasonable contract,

No "reasonable contracts" are allowed here, you _must_ provide bugs
free code! If pps_unregister_source() should remove a PPS source from
the system, it must be do it under any circumstance.

> widely used in multiple bits of the kernel.  the unregister function
> frees the pointer, so you shouldn't have anything referencing that
> pointer still running.

This is exacly what current implementation is trying to do. :)

The problem here is that pps_event() can be called while
pps_unregister_source() executes and viceversa so your code must
prevent crashes due this fact.

> 
> Kind of like you shouldn't free_pages() until the DMA is complete.
> 
> I agree that using an index looked up in an IDR makes the code crash-proof
> (you can always detect invalidity of the index), but there are ways of
> doing that with pointers, too.

Good! Let me see how, I don't think your code is doing it.

> 
> More to the point, using an index looked up in an IDR does NOT
> make the code bug-free.  Consider the following hypothetical code
> that calls pps_unregister_source in parallel with pps_event:
> 
> 
> CPU0			CPU1			CPU2
> 						call pps_register_source
> 						allocate PPS struct
> call pps_event		
> 			call pps_unregister_source
> 			lock IDR
> 						lock IDR...(wait)
> lock IDR...(wait)
> 			idr_find(/dev/pps0)
> 			decrement usage to 0
> 			remove pps from IDR
> 			unlock IDR
> 			deallocate PPS
> 						... lock acquired
> 						Allocate ID 0
> 						unlock IDR
> ...lock acquired
> idr_find(/dev/pps0)
> Get unexpected PPS device!

No, it gets _exactly_ the right device, that is pps0. It's your
businness to inform CPU0 on which PPS source ID it must call
pss_event(). The core is doing _exactly_ what its documentation
states.

> pps->usage++
> unlock IDR
> Do things to wrong PPS device
> 
> Now, the current pps_register_source code actually delays initializing
> the pps struct until *after* idr_get_new, so the CPU0 pps_event()
> call can crash on an uninitialized structure, but even if that's fixed
> (which isn't too hard), you still have the pps_event() calling idr_find

Yes, you find a bug! But I can fix it simply initializing the struct
before calling idr_get_new(). See the following modifications:

diff --git a/drivers/pps/kapi.c b/drivers/pps/kapi.c
index 917b379..e0b75ed 100644
--- a/drivers/pps/kapi.c
+++ b/drivers/pps/kapi.c
@@ -158,6 +158,17 @@ int pps_register_source(struct pps_source_info
*info, int d
                goto pps_register_source_exit;
        }
 
+       /* These initializations must be done before calling
idr_get_new()
+        * in order to avoid reces into pps_event().
+        */
+       pps->params.api_version = PPS_API_VERS;
+       pps->params.mode = default_params;
+       pps->info = *info;
+
+       init_waitqueue_head(&pps->queue);
+       spin_lock_init(&pps->lock);
+       atomic_set(&pps->usage, 1);
+
        /* Get new ID for the new PPS source */
        if (idr_pre_get(&pps_idr, GFP_KERNEL) == 0) {
                err = -ENOMEM;
@@ -165,28 +176,29 @@ int pps_register_source(struct pps_source_info
*info, int 
        }
 
        spin_lock_irq(&pps_idr_lock);
-       err = idr_get_new(&pps_idr, pps, &id);
-       spin_unlock_irq(&pps_idr_lock);
 
-       if (err < 0)
+       /* Now really allocate the PPS source.
+        * After idr_get_new() calling the new source will be freely
available
+        * into the kernel.
+        */
+       err = idr_get_new(&pps_idr, pps, &id);
+       if (err < 0) {
+               spin_unlock_irq(&pps_idr_lock);
                goto kfree_pps;
+       }
 
        id = id & MAX_ID_MASK;
        if (id >= PPS_MAX_SOURCES) {
+               spin_unlock_irq(&pps_idr_lock);
+
                printk(KERN_ERR "pps: %s: too many PPS sources in the
		system\n",
                                        info->name);
                err = -EBUSY;
                goto free_idr;
        }
-
        pps->id = id;
-       pps->params.api_version = PPS_API_VERS;
-       pps->params.mode = default_params;
-       pps->info = *info;
 
-       init_waitqueue_head(&pps->queue);
-       spin_lock_init(&pps->lock);
-       atomic_set(&pps->usage, 1);
+       spin_unlock_irq(&pps_idr_lock);
 
        /* Create the char device */
        err = pps_register_cdev(pps);

> and getting the WRONG pps device.

This is NOT a core problem, it DOES exactly what is expected to do,
that is updating the PPS source 0's event data.

> 
> There is flatly NO WAY to fix that bug unless you can bound the time

Again: this is NOT a bug.

> interval AFTER pps_unregister_source during which pps_event is
> legal.  And that requires the cooperation of the caller.  And once
> you have a bound, what's simpler than a bound of 0?
> 
> Remember what torpedoed every last attempt to make loadable modules
> responsible for their own locking?  On a preemptible kernel, there is no
> bound on the time that may elapse between the call of a function and the
> execution of the first instruction in that function.  It can't increase
> its own reference count, because that's already too late!
> 						
> > Again, forgot the ldisc and any particular implementation. The
> > question is: does your locking scheme provide 100% race free code? You
> > cannot relay on which PPS client a user may implement. Users can be
> > evil! :)
> 
> You can't NOT rely on the PPS client.  The client could scribble on
> memory and crash the machine.  It could pass in timestmp pointers
> to hardware registers with read side effects.  It could do all kinds
> of strange things.
> 
> Even if you assume that all parameters are correct, and only the timing
> can be odd, the client could deallocate the PPS device, wait for the
> module to be unloaded, and then start making pps_event calls.  Kaboom.

The PPS core task is to provide a bug-free API, if the client destroys
the pps data it gets is not a core fault.

> 
> >> Um... okay, style question.  I'm given to adding comments as I'm
> >> puzzling out code, to save the next person the trouble.  In this case,
> >> the important side effect was buried in the middle of an expression,
> >> and I didn't see it the first time.
> >> 
> >> Admittedly, it's the only patch to pps_cdev_open() and not strictly
> >> related, but is it really that much clutter to add a comment here and
> >> there to a patch?  Additional patches are are form of clutter, too,
> >> and the alternative is to discourage explanatory comments.
> >
> > If you wish adding "explanatory comments", please provide a patch
> > titled "explanatory comments". If your patch fixes some bugs or adds
> > new features you may add whatever comments you wish but related with
> > the change you are doing.
> >
> > Apart those stuff, really do you consider "Bump refcount" an
> > "explanatory comment"? ;)
> 
> Yes, for the resons explained in the last sentence of the first paragraph
> above.  "In this case, the important side effect was buried in the middle
> of an expression, and I didn't see it the first time."  Admittedly, it's
> not very important.  It just came up while I was editing and I didn't
> bother to make it a separate commit.
> 
> > > > Please, solve (or explain) the locking issue during
> > > > pps_unregister_source() and pps_event(), other things are cosmetics!
> > > > :)
> > > 
> > > Hopefully, I've addressed that.  Would you like me to expand the
> >
> > I disagree. You explanation relays on ldisc implementation. Please,
> > check the code considering only the core and any possible case of
> > methods call.
> 
> I just demonstrated above that's flatly impossible.  Unless you can
> hold off reallocating a PPS ID until all other API calls have finished
> (or at least acquired the pps_idr_lock), you will have bugs.
> 
> Given that you have to impose *some* bounded time limit between
> pps_unregister_source and other calls, why not make that bound 0?
> 
> That at least makes the PPS code simple; other things make it more
> complicated, and I don't think they help the callers.
> 
> The closest you can get is RCU trickery.
> 
> 
> The pps-ldisc client relies on the guarantees that the ldisc layer
> makes.  Your code does that already.  Suppose that pps_tty_dcd_change
> could overlap with pps_tty_close.  Well, after pps_tty_close comes
> tty_kref_put, which calls release_one_tty, which calls free_tty_struct(tty);
> 
> 
> CPU 0			CPU 1
> 
> pps_tty_dcd_change(tty)
> 			pps_tty_close(tty)
> 			tty_kref_put(tty)
> 			 release_one_tty(tty)
> 			  free_tty_struct(tty)
> 			   kfree(tty)
> Fetch tty->id: SEGFAULT!
> 			
> (This doesn't happen because the code in tty_release_dev that calls
> all of this takes care to ensure that the ldisc is *fully* shut down
> before getting on to tty_kref_put().)

Ok, I don't know all possible solutions, what I know is that
__if_we_want_kernel_inclusion__ we have to provide a PPS core with a
well defined API who does _exactly_ what stated into its documentation
and not related with any particular client implementation.

Ciao,

Rodolfo

-- 

GNU/Linux Solutions                  e-mail: giometti at enneenne.com
Linux Device Driver                          giometti at linux.it
Embedded Systems                     phone:  +39 349 2432127
UNIX programming                     skype:  rodolfo.giometti