On Sat, 31 Jul 2010, Christian Lamparter wrote: > (No maintainer for usb/core?) The core is maintained by multiple people. This mailing list is the correct place for bug reports concerning usbcore. > Hello, > > First a bit of background: > > I'm developing a driver for the AR9170 USB 802.11n Wireless device. > And if you are interested in this thing, the driver code + fw is available at: > http://www.kernel.org/pub/linux/kernel/people/chr/carl9170/carl9170-1.7.4-pre/ > > Now, back to my problem: > > The driver uses a sophisticated 3 anchor layout: > 1. rx_pool => a pool of prepared urbs, which can be > submitted to the device anytime. > > 2. rx_work => storage for received urbs, which will be > processed in the rx tasklet. > > 3. rx_anchor => all active rx urbs (i.e.: those which have been > submitted to the device) > > This setup allows the rx urb callback, the rx processing tasklet > routine (and obviously: the initialization code) to restock > rx_anchor, whenever they are running... > > So far so good. I tested this approach and it worked as long > as I was testing it on a single-core machine... > But as soon as I moved to a SMP, I got: > * freezes or Oopses from usb core routines: > usb_poison_anchor_urbs, > usb_kill_anchored_urbs, > usb_get_from_anchor > > * CONFIG_DEBUG_LIST constantly complained > about mysterious link corruptions > > * unexplainable rx data corruptions / loss > > and a long time, I assumed it was caused by a bug in the driver, but > I couldn't find it, so I moved to the usb core and found this: > > 0. struct urb *usb_get_from_anchor(struct usb_anchor *anchor) > 1. { > 2. struct urb *victim; > 3. unsigned long flags; > 4. > 5. spin_lock_irqsave(&anchor->lock, flags); > 6. if (!list_empty(&anchor->urb_list)) { > 7. victim = list_entry(anchor->urb_list.next, struct urb, > 8. anchor_list); > 9. usb_get_urb(victim); > 10. spin_unlock_irqrestore(&anchor->lock, flags); > 11. usb_unanchor_urb(victim); > 12. } else { > 13. spin_unlock_irqrestore(&anchor->lock, flags); > 14. victim = NULL; > 15. } > 16. > 17. return victim; > 18.} > > Take a close look at line 10. and 11. > > We are releasing the anchor->lock, unanchor the urb and return > the "victim". But we never tested if usb_unanchor_urb actually > removed the urb from the anchor, or if "we've lost the race > to another thread --- see comment from usb_unanchor_urb" and > a diffrent usb_get_from_anchor in a elevated context > (or on a different CPU) went through the same loop and > stole the urb (and might have put it into a different anchor already)! > > I've attached a patch which does fix the problem (for me). > Let me know what you guys think, is this a genuine bug, > or if I'm doing this all wrong. Yes, this is definitely a bug. Your solution looks basically okay. A few comments are below. > Regards, > Chr > --- > diff --git a/drivers/usb/core/urb.c b/drivers/usb/core/urb.c > index 7c05555..93f8f4c 100644 > --- a/drivers/usb/core/urb.c > +++ b/drivers/usb/core/urb.c > @@ -137,6 +137,16 @@ void usb_anchor_urb(struct urb *urb, struct usb_anchor *anchor) > } > EXPORT_SYMBOL_GPL(usb_anchor_urb); > > +static void __usb_unanchor_urb(struct urb *urb, struct usb_anchor *anchor) > +{ > + urb->anchor = NULL; > + Unnecessary blank line. > + list_del(&urb->anchor_list); > + usb_put_urb(urb); > + if (list_empty(&anchor->urb_list)) > + wake_up(&anchor->wait); > +} Although it may be slightly less efficient to call usb_put_urb() and wake_up() while still holding the spinlock, I think the difference won't matter. > + > /** > * usb_unanchor_urb - unanchors an URB > * @urb: pointer to the urb to anchor > @@ -161,12 +171,9 @@ void usb_unanchor_urb(struct urb *urb) > spin_unlock_irqrestore(&anchor->lock, flags); > return; > } > - urb->anchor = NULL; > - list_del(&urb->anchor_list); > + > + __usb_unanchor_urb(urb, anchor); > spin_unlock_irqrestore(&anchor->lock, flags); > - usb_put_urb(urb); > - if (list_empty(&anchor->urb_list)) > - wake_up(&anchor->wait); > } > EXPORT_SYMBOL_GPL(usb_unanchor_urb); Here you can rearrange the logic so that it makes more sense: ... spin_lock_irqsave(&anchor->lock, flags) if (likely(anchor == urb->anchor)) __usb_unanchor_urb(urb); spin_unlock_irqrestore(&anchor->lock, flags); } > @@ -799,8 +806,8 @@ struct urb *usb_get_from_anchor(struct usb_anchor *anchor) > victim = list_entry(anchor->urb_list.next, struct urb, > anchor_list); > usb_get_urb(victim); > + __usb_unanchor_urb(victim, anchor); > spin_unlock_irqrestore(&anchor->lock, flags); > - usb_unanchor_urb(victim); > } else { > spin_unlock_irqrestore(&anchor->lock, flags); > victim = NULL; Here you can hoist the spin_unlock_irqrestore calls out of the "if" statement. > @@ -827,9 +834,9 @@ void usb_scuttle_anchored_urbs(struct usb_anchor *anchor) > victim = list_entry(anchor->urb_list.prev, struct urb, > anchor_list); > usb_get_urb(victim); > + __usb_unanchor_urb(victim, anchor); > spin_unlock_irqrestore(&anchor->lock, flags); > /* this may free the URB */ > - usb_unanchor_urb(victim); > usb_put_urb(victim); > spin_lock_irqsave(&anchor->lock, flags); > } This routine no longer needs the usb_{get,put}_urb calls or the internal spinlock manipulations. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html