On Fri, 13 Sep 2013, Bjorn Helgaas wrote: > Thanks. Maybe this is more relevant than I thought. I'd sure like to > copy your strategy rather than reinvent something. Well, I don't know if this will really end up being all that relevant for PCI, but you're welcome to steal any ideas you like. :-) > > Very briefly, usb_device_read() dereferences bus->root_hub: > > > > usb_lock_device(bus->root_hub); > > > > On the other hand, usb_remove_hcd() calls both > > > > usb_disconnect(&rhdev); > > > > and > > > > usb_put_dev(hcd->self.root_hub); > > > > (note that rhdev == hcd->self.root_hub == bus->root_hub), which > > unregisters and deallocates the root_hub structure. Something has to > > prevent usb_device_read() from accessing the structure after it has > > been unregistered and freed; that's where usb_bus_list_lock comes in. > > I hadn't paid attention to hcd->rh_registered before, but that seems > important. When iterating over usb_bus_list, usb_device_read() > ignores buses with hcd->rh_registered == 0. Right. In theory this could happen either because the root hub hasn't been registered yet or it has already been (or is about to be) unregistered. > usb_remove_hcd() clears hcd->rh_registered before deallocating the > root_hub with usb_put_dev(). I think the mutex_lock/unlock between > clearing rh_registered and deallocating root_hub will ensure that > usb_device_read() cannot see rh_registered == 1 and a deallocated > root_hub structure. Exactly. You've got it. > But that seems more like a memory barrier issue > than a locking issue, and I don't yet see how usb_disconnect() is > related. It's more than just a memory barrier. Suppose usb_device_read() sees that rh_registered is set. Then we don't want usb_remove_hcd() to deallocate the root hub until usb_device_read() is finished using it, and the way to prevent that is by making usb_device_read() hold usb_bus_list_lock somewhere in between setting rh_registered back to 0 and the final usb_device_put(). The fact that this "somewhere" surrounds a call to usb_disconnect() is more or less irrelevant. It may have been more important in earlier kernel versions (i.e., before hcd->rh_registered was invented), I don't remember. In theory, it would now be okay to move the usb_disconnect() call after the locked region, but it would look kind of strange to acquire the mutex and immediately release it without doing anything in between. I guess the next best thing would be to hold usb_bus_list_lock instead across these lines: spin_lock_irq (&hcd_root_hub_lock); hcd->rh_registered = 0; spin_unlock_irq (&hcd_root_hub_lock); That at least would make sense, and it would point out that hcd->rh_registered really is protected by both the mutex and the spinlock. > I guess I'm hung up because I don't know how managing the set of root > hubs is different from managing usb_bus_list. Maybe it has something > to do with the device_add() call in register_root_hub() and the > device_del() call in usb_disconnect(). We hold usb_bus_list_lock in > both cases, but neither does anything with usb_bus_list. Or maybe it > has something to do with the visibility of stores to > hcd->rh_registered? Basically yes. We need to prevent writes to hcd->rh_registered in both interrupt context (for example, in usb_hc_died()) and process context (usb_device_read()). Therefore it has to be write-protected by both a spinlock and a mutex: hcd_root_hub_lock and usb_bus_list_lock. The bus list, by contrast, is used only in process context. Therefore the mutex suffices to protect it. Does this all make sense? I realize it's rather obscure, and it doesn't help that the usages changed after the names were chosen. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html