On Fri, Sep 13, 2013 at 8:11 PM, Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote: > On Fri, 13 Sep 2013, Bjorn Helgaas wrote: > >> Thanks. Maybe this is more relevant than I thought. I'd sure like to >> copy your strategy rather than reinvent something. > > Well, I don't know if this will really end up being all that relevant > for PCI, but you're welcome to steal any ideas you like. :-) > >> > Very briefly, usb_device_read() dereferences bus->root_hub: >> > >> > usb_lock_device(bus->root_hub); >> > >> > On the other hand, usb_remove_hcd() calls both >> > >> > usb_disconnect(&rhdev); >> > >> > and >> > >> > usb_put_dev(hcd->self.root_hub); >> > >> > (note that rhdev == hcd->self.root_hub == bus->root_hub), which >> > unregisters and deallocates the root_hub structure. Something has to >> > prevent usb_device_read() from accessing the structure after it has >> > been unregistered and freed; that's where usb_bus_list_lock comes in. >> >> I hadn't paid attention to hcd->rh_registered before, but that seems >> important. When iterating over usb_bus_list, usb_device_read() >> ignores buses with hcd->rh_registered == 0. > > Right. In theory this could happen either because the root hub hasn't > been registered yet or it has already been (or is about to be) > unregistered. > >> usb_remove_hcd() clears hcd->rh_registered before deallocating the >> root_hub with usb_put_dev(). I think the mutex_lock/unlock between >> clearing rh_registered and deallocating root_hub will ensure that >> usb_device_read() cannot see rh_registered == 1 and a deallocated >> root_hub structure. > > Exactly. You've got it. > >> But that seems more like a memory barrier issue >> than a locking issue, and I don't yet see how usb_disconnect() is >> related. > > It's more than just a memory barrier. Suppose usb_device_read() sees > that rh_registered is set. Then we don't want usb_remove_hcd() to > deallocate the root hub until usb_device_read() is finished using it, > and the way to prevent that is by making usb_device_read() hold > usb_bus_list_lock somewhere in between setting rh_registered back to 0 > and the final usb_device_put(). This gets to the reason I'm interested -- we have PCI device removal issues, and I have been thinking of them largely as reference counting problems. I'd like to make the assertion that if you have a pci_bus or a pci_dev pointer, e.g., acquired by traversing a list similar to usb_bus_list, you can assume the whole structure is valid, including things it contains or points to. But here you have the additional rh_registered flag, which basically tells you whether bus->root_hub is valid. My naive question is why you don't wait to do the usb_device_put(hcd->self.root_hub) until after you remove the bus from usb_bus_list. If you could do that, usb_device_read() would still need the lock, like all other readers of usb_bus_list, but it wouldn't need to look at rh_registered. > The fact that this "somewhere" surrounds a call to usb_disconnect() is > more or less irrelevant. It may have been more important in earlier > kernel versions (i.e., before hcd->rh_registered was invented), I don't > remember. In theory, it would now be okay to move the usb_disconnect() > call after the locked region, but it would look kind of strange to > acquire the mutex and immediately release it without doing anything in > between. I guess the next best thing would be to hold > usb_bus_list_lock instead across these lines: > > spin_lock_irq (&hcd_root_hub_lock); > hcd->rh_registered = 0; > spin_unlock_irq (&hcd_root_hub_lock); > > That at least would make sense, and it would point out that > hcd->rh_registered really is protected by both the mutex and the > spinlock. That would make a lot more sense to me. The current code gives a strong hint that the lock is related to something done in usb_disconnect(), but that's really a pointer in the wrong direction. >> I guess I'm hung up because I don't know how managing the set of root >> hubs is different from managing usb_bus_list. Maybe it has something >> to do with the device_add() call in register_root_hub() and the >> device_del() call in usb_disconnect(). We hold usb_bus_list_lock in >> both cases, but neither does anything with usb_bus_list. Or maybe it >> has something to do with the visibility of stores to >> hcd->rh_registered? > > Basically yes. We need to prevent writes to hcd->rh_registered in both > interrupt context (for example, in usb_hc_died()) and process context > (usb_device_read()). Therefore it has to be write-protected by both a > spinlock and a mutex: hcd_root_hub_lock and usb_bus_list_lock. > > The bus list, by contrast, is used only in process context. Therefore > the mutex suffices to protect it. > > Does this all make sense? I realize it's rather obscure, and it > doesn't help that the usages changed after the names were chosen. It does make sense, and I appreciate your patience. Thanks! Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html