On Sat, Jul 31, 2010 at 06:24:33PM +0200, Christian Lamparter wrote: > (No maintainer for usb/core?) Yes, did you look in the MAINTAINERS file? Or use scripts/get_maintainer.pl? It should show you the proper name... > First a bit of background: > > I'm developing a driver for the AR9170 USB 802.11n Wireless device. > And if you are interested in this thing, the driver code + fw is available at: > http://www.kernel.org/pub/linux/kernel/people/chr/carl9170/carl9170-1.7.4-pre/ > > Now, back to my problem: > > The driver uses a sophisticated 3 anchor layout: > 1. rx_pool => a pool of prepared urbs, which can be > submitted to the device anytime. Why a pool? Is allocating new ones somehow too slow? If so, why? > 2. rx_work => storage for received urbs, which will be > processed in the rx tasklet. > > 3. rx_anchor => all active rx urbs (i.e.: those which have been > submitted to the device) > > This setup allows the rx urb callback, the rx processing tasklet > routine (and obviously: the initialization code) to restock > rx_anchor, whenever they are running... > > So far so good. I tested this approach and it worked as long > as I was testing it on a single-core machine... > But as soon as I moved to a SMP, I got: > * freezes or Oopses from usb core routines: > usb_poison_anchor_urbs, > usb_kill_anchored_urbs, > usb_get_from_anchor > > * CONFIG_DEBUG_LIST constantly complained > about mysterious link corruptions > > * unexplainable rx data corruptions / loss > > and a long time, I assumed it was caused by a bug in the driver, but > I couldn't find it, so I moved to the usb core and found this: > > 0. struct urb *usb_get_from_anchor(struct usb_anchor *anchor) > 1. { > 2. struct urb *victim; > 3. unsigned long flags; > 4. > 5. spin_lock_irqsave(&anchor->lock, flags); > 6. if (!list_empty(&anchor->urb_list)) { > 7. victim = list_entry(anchor->urb_list.next, struct urb, > 8. anchor_list); > 9. usb_get_urb(victim); > 10. spin_unlock_irqrestore(&anchor->lock, flags); > 11. usb_unanchor_urb(victim); > 12. } else { > 13. spin_unlock_irqrestore(&anchor->lock, flags); > 14. victim = NULL; > 15. } > 16. > 17. return victim; > 18.} > > Take a close look at line 10. and 11. > > We are releasing the anchor->lock, unanchor the urb and return > the "victim". But we never tested if usb_unanchor_urb actually > removed the urb from the anchor, or if "we've lost the race > to another thread --- see comment from usb_unanchor_urb" and > a diffrent usb_get_from_anchor in a elevated context > (or on a different CPU) went through the same loop and > stole the urb (and might have put it into a different anchor already)! > > I've attached a patch which does fix the problem (for me). > Let me know what you guys think, is this a genuine bug, > or if I'm doing this all wrong. > > Regards, > Chr > --- > diff --git a/drivers/usb/core/urb.c b/drivers/usb/core/urb.c > index 7c05555..93f8f4c 100644 > --- a/drivers/usb/core/urb.c > +++ b/drivers/usb/core/urb.c > @@ -137,6 +137,16 @@ void usb_anchor_urb(struct urb *urb, struct usb_anchor *anchor) > } > EXPORT_SYMBOL_GPL(usb_anchor_urb); > > +static void __usb_unanchor_urb(struct urb *urb, struct usb_anchor *anchor) > +{ > + urb->anchor = NULL; > + > + list_del(&urb->anchor_list); > + usb_put_urb(urb); > + if (list_empty(&anchor->urb_list)) > + wake_up(&anchor->wait); You are calling wake_up with a spinlock held, is that ok? thanks, greg k-h -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html