On Tue, 28 Apr 2009, David Brownell wrote: > On Thursday 23 April 2009, Alan Stern wrote: > > Dave: > > > > A bug in gadgetfs has persisted for quite a while: nested spin_locks > > when used with net2280. > > Did you just happen to notice this when trying the latest > PTP code? :) As a matter of fact, I first noticed it quite a long time ago, long enough that I can't remember exactly when. But recent bug reports and in particular the PTP stuff jogged my memory. > > This is because inode.c calls usb_ep_queue() > > for ep0 while holding the device lock, and ep0_complete() tries to > > acquire the lock. Since net2280 will call the completion routine > > directly during submission if the request is small enough to fit > > entirely in a FIFO, we end up in deadlock. (Dummy_hcd may behave the > > same way; I haven't checked.) > > Yeah, locking there has always been pretty nasty. > > There were a surprising number of code paths that > could tromp on each other ... some of them may now > have vanished, since the ability to implement a > userspace gadget without ioctls has been removed. The degree of complication was sufficiently high that I never did fully grok that driver. Which is why I'm asking for help now... > Note that your subject is misleading. There's > only *one* spinlock ... :) You're right. The subject line should have read "Gadgetfs's use of spin_lock()". :-) > > The patch below attempts to fix the problem by dropping the lock around > > the calls to usb_ep_queue() for ep0, but I don't know whether doing > > this might lead to other problems. > > The comment in the "struct ep_data" definition is that > the spinlock must be held to access ep or req. > > Now, digging back into the foggier parts of my memory, > the really nasty cases there involved cleanup paths. > > As in, close file descriptor ==> gadgetfs_unbind(), > unplug cable from host ==> gadgetfs_disconnect(), and > both of those could happen concurrently with respect > to request completion or submission... yep, three > concurrent nastinesses to defend against. Luckily > it was easy enough to test each singly, and along > with any one of the others. "Close file descriptor" is the only unusual event; unplug, submission, and completion occur with every gadget driver. Does that make it any easier to think about? Probably not... > A better solution than that spinlock may be needed. > Maybe a parallel state machine. Or refactoring the ep0 completion handler, moving more of the work into the submitter thread. > > I suppose the places inside > > gadgetfs_setup() are probably okay, but what about the call inside > > ep0_read()? There's a possibility that the host might send a new > > SETUP packet while waiting for the userspace program to get around to > > reading the OUT data from the previous transfer. > > Yeah, ep0 in particular has additional nasty cases that > none of the other endpoints had. Like those. > > > > Can you look this over and figure out what more (if anything) needs to > > be done? > > It'll hurt, but ... I'll try. Sorry for dumping this in your lap... At least it's a short patch! :-) Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html