[LinuxPPS] [PATCH 02/11] Streamline interrupt handling; get rid of a lock and idr_find()

Rodolfo Giometti giometti at enneenne.com
Tue Feb 10 17:51:48 CET 2009


On Tue, Feb 10, 2009 at 11:03:29AM -0500, George Spelvin wrote:
> Rodolfo Giometti <giometti at enneenne.com> wrote:
> > No "reasonable contracts" are allowed here, you _must_ provide bugs
> > free code! If pps_unregister_source() should remove a PPS source from
> > the system, it must be do it under any circumstance.
> 
> Could you try to rephrase this?  The grammar is sufficiently confusing that
> I can't figure out what you're trying to say.
> "It must be do it" makes no sense to me.

Sorry, my english is not good. :)

See below. I hope I can explain it better.

> pps_unregister_source will unregister a source as long as it exists.
> Exists is defined as has been registered and has not yet been
> unregistered.
> 
> Once you have unregistered it, it no longer exists; you may no longer
> refer to it in any way.

This is not completely true... if you take a look at older LinuxPPS
code and exchanged e-mails between me and kernel gurus you can see
that in the past pps_event() was called inside the serial port IRQ
handler (using ldisc is just the latest news). This fact (which is
related to that special implementation) was discussed a lot and kernel
gurus asked to me to provide a core code which can get ride of these
particular cases. So it's normal that someone may call
pps_unregister_source() while pps_events() executes, and only the last
one who uses the resource can destroy it is usage drop to 0.

Hope I was clear now.

> 
> >> widely used in multiple bits of the kernel.  the unregister function
> >> frees the pointer, so you shouldn't have anything referencing that
> >> pointer still running.
> >
> > This is exacly what current implementation is trying to do. :)
> >
> > The problem here is that pps_event() can be called while
> > pps_unregister_source() executes and viceversa so your code must
> > prevent crashes due this fact.
> 
> That is not possible.  Period.

That's is _not_ true.

> 
> Example:
> - pps_tty_close() -> pps_unregister_source()
> - Module is unloaded (because the reference count is now 0)
> - pps_dcd_change() -> pps_event()

Again: this is the _current_ implementation but it cannot be the only
one! See the code _before_ ldisc support and you can understand
why. Since ldisc solution is not accepted yet we must provide a
LinuxPPS core that can support concourrency between pps_event() and
pps_register/unregister_source().

> 
> Just what the heck is a module supposed to do about that?
> 
> If you think this kind of abuse is legal, could yiu specify EXACTLY what
> kind of behavior is acceptable and what is not?  For example, here are
> possible ways to abuse the functions exported in <linux/pps.h>:
> 
> struct pps_device *pps_get_source(int source);
> 	- Source values that no longer exist?
> 	- Source values that never did exist?
> extern void pps_put_source(struct pps_device *pps);
> 	- pps pointers that have already been put
> 	- pps pointers that are completely made up
> 	- pps pointers that are unaligned
> 	- pps pointers that are NULL
> 	- pps pointers that are to I/O registers
> extern int pps_register_source(struct pps_source_info *info,
> 				int default_params);
> 	- info->name contains control characters
> 	- info->name not null-terminated
> 	- info->path not null-terminated
> 	- info->echo points to hyperspace
> 	- info->owner invalid
> 	- (info->mode & PPS_CAPTUREBOTH) == 0
> extern void pps_unregister_source(int source);
> 	- Called with a source still in active use
> 	- Called with a source already unregistered
> 	- Called with a completely imaginary source
> extern int pps_register_cdev(struct pps_device *pps);
> 	- Called multiple times on the same pps
> 	- Called after the source is already unregistered
> 	- Called with invalid pps pointers
> extern void pps_unregister_cdev(struct pps_device *pps);
> 	- Called on a pps that was never registered as a cdev
> 	- Called multiple times on the same pps
> 	- Called with invalid pps pointers
> extern void pps_event(int source, struct pps_ktime *ts, int event, void *data);
> 	- Called with source values that no longer exist
> 	- Called with source values that never did exist
> 	- Called with invalid pps_ktime pointers
> 	- Called with pps_ktime values that are non-canonical
> 	  (nsec negative or > 999999999)
> 	- Called with event == 0
> 
> Which of those are legal and which are not?
> 
> >> Kind of like you shouldn't free_pages() until the DMA is complete.
> >> 
> >> I agree that using an index looked up in an IDR makes the code crash-proof
> >> (you can always detect invalidity of the index), but there are ways of
> >> doing that with pointers, too.
> >
> > Good! Let me see how, I don't think your code is doing it.
> 
> No, it's not, because it's slow and annoying and not worth the
> bother.

I don't understand you... =:-o

> 
> >> More to the point, using an index looked up in an IDR does NOT
> >> make the code bug-free.  Consider the following hypothetical code
> >> that calls pps_unregister_source in parallel with pps_event:
> >> 
> >> 
> >> CPU0			CPU1			CPU2
> >> 						call pps_register_source
> >> 						allocate PPS struct
> >> call pps_event		
> >> 			call pps_unregister_source
> >> 			lock IDR
> >> 						lock IDR...(wait)
> >> lock IDR...(wait)
> >> 			idr_find(/dev/pps0)
> >> 			decrement usage to 0
> >> 			remove pps from IDR
> >> 			unlock IDR
> >> 			deallocate PPS
> >> 						... lock acquired
> >> 						Allocate ID 0
> >> 						unlock IDR
> >> ...lock acquired
> >> idr_find(/dev/pps0)
> >> Get unexpected PPS device!
> >> pps->usage++
> >> unlock IDR
> >> Do things to wrong PPS device
> >
> > No, it gets _exactly_ the right device, that is pps0. It's your
> > businness to inform CPU0 on which PPS source ID it must call
> > pss_event(). The core is doing _exactly_ what its documentation
> > states.
> 
> CPU 2 would disagree.  It registered a device, and never sent it any events,
> yet that device is reporting that events have arrived.
> 
> That sure looks like a bug to me.

Not to me, since this is currenlty fully functional.

> 
> If you're allowed to say "it's not my problem because it's documented",
> then why is this okay, and crashing due to a stale pointer not okay?

Because crashes are not recoverable but you can provide a good way to
suggest to pps_event() which source should be updated.

> We can document *that*, instead!

It's already documented.

> 
> Then, just like this, the unwanted behavior is not a bug in the
> PPS core, it's a bug in the code that called it.

Which bug are you talking about? If each clients use the exported
methods everything works well.

> 
> >> Now, the current pps_register_source code actually delays initializing
> >> the pps struct until *after* idr_get_new, so the CPU0 pps_event()
> >> call can crash on an uninitialized structure, but even if that's fixed
> >> (which isn't too hard), you still have the pps_event() calling idr_find
> >
> > Yes, you find a bug! But I can fix it simply initializing the struct
> > before calling idr_get_new(). See the following modifications:
> >
> > diff --git a/drivers/pps/kapi.c b/drivers/pps/kapi.c
> > index 917b379..e0b75ed 100644
> 
> Still buggy.  There is a brief window during which pps->id is not set
> correctly.  (See my patch 03/11 for how to close the window completely.)

Why? It's protected by pps_idr_lock, isn't it? =:-o

> 
> >> and getting the WRONG pps device.
> >
> > This is NOT a core problem, it DOES exactly what is expected to do,
> > that is updating the PPS source 0's event data.
> 
> You're defining it as someone else's prolem.  That doesn't fix the
> problem, which is that a PPS device is getting events that it doesn't
> want.

No, if the client correctly set up PPS IDs.

> 
> If the user of the core does that, buggy behavior (of the whole *system*)
> results.  The user of the core must not do that if they don't want to
> cause bugs.
> 
> >> The pps-ldisc client relies on the guarantees that the ldisc layer
> >> makes.  Your code does that already.  Suppose that pps_tty_dcd_change
> >> could overlap with pps_tty_close.  Well, after pps_tty_close comes
> >> tty_kref_put, which calls release_one_tty, which calls free_tty_struct(tty);
> >> 
> >> CPU 0			CPU 1
> >> 
> >> pps_tty_dcd_change(tty)
> >> 			pps_tty_close(tty)
> >> 			tty_kref_put(tty)
> >> 			 release_one_tty(tty)
> >> 			  free_tty_struct(tty)
> >> 			   kfree(tty)
> >> Fetch tty->id: SEGFAULT!
> >> 			
> >> (This doesn't happen because the code in tty_release_dev that calls
> >> all of this takes care to ensure that the ldisc is *fully* shut down
> >> before getting on to tty_kref_put().)
> >
> > Ok, I don't know all possible solutions, what I know is that
> > __if_we_want_kernel_inclusion__ we have to provide a PPS core with a
> > well defined API who does _exactly_ what stated into its documentation
> > and not related with any particular client implementation.
> 
> So document the fact that any reference to a PPS device after
> pps_unregister_device is a bug that will make demons fly out of your
> nose.
> ("Nasal demons" are from http://www.catb.org/jargon/html/N/nasal-demons.html)
> 
> It will be just like the thousand other acquire/release primitives inside
> the kernel.  I could start listing them, if you'd like:
> - kmalloc/kfree
> - fget/fput
> - get_unused_fd/put_unused_fd
> - get_page/put_page
> - device_create/device_destroy
> - tty_kref_get/tty_kref_put
> - tty_ldisc_ref/tty_ldisc_deref
> 
> I could go on for quite a while.  In every single case, you have to not
> have any operations im progress on the released object when you release
> it, or boom.

LinuxPPS's pps_register_source(), pps_unregister_source() and
pps_event() have different features due PPS nature, that is
pps_event() __can_be_called_while_the_others_execute__.

> 
> That's the way the kernel works.  That's the way most software works,
> in the absence of garbage collection.
> 
> Linux doesn't believe in slowing the system down to catch every
> conceivable kernel programmer error.

Currently LinuxPPS implementation needs the above requirement. After
kernel inclusion and _____after_____ serial _____and_____ parallel
client inclusion we can discuss about relaxing some requirements due
specific clients implementation.

If you can provide better/faster/cleaner code that has the above
requirement I'll be happy to accept it, otherwise I'm sorry but I have
to refuse it.

Ciao,

Rodolfo

-- 

GNU/Linux Solutions                  e-mail: giometti at enneenne.com
Linux Device Driver                          giometti at linux.it
Embedded Systems                     phone:  +39 349 2432127
UNIX programming                     skype:  rodolfo.giometti



More information about the LinuxPPS mailing list