On Fri, Apr 19, 2019 at 01:52:38PM -0400, Alan Stern wrote: > The syzkaller fuzzer reported a bug in the USB hub driver which turned > out to be caused by a negative runtime-PM usage counter. This allowed > a hub to be runtime suspended at a time when the driver did not expect > it. The symptom is a WARNING issued because the hub's status URB is > submitted while it is already active: > > URB 0000000031fb463e submitted while active > WARNING: CPU: 0 PID: 2917 at drivers/usb/core/urb.c:363 > > The negative runtime-PM usage count was caused by an unfortunate > design decision made when runtime PM was first implemented for USB. > At that time, USB class drivers were allowed to unbind from their > interfaces without balancing the usage counter (i.e., leaving it with > a positive count). The core code would take care of setting the > counter back to 0 before allowing another driver to bind to the > interface. > > Later on when runtime PM was implemented for the entire kernel, the > opposite decision was made: Drivers were required to balance their > runtime-PM get and put calls. In order to maintain backward > compatibility, however, the USB subsystem adapted to the new > implementation by keeping an independent usage counter for each > interface and using it to automatically adjust the normal usage > counter back to 0 whenever a driver was unbound. > > This approach involves duplicating information, but what is worse, it > doesn't work properly in cases where a USB class driver delays > decrementing the usage counter until after the driver's disconnect() > routine has returned and the counter has been adjusted back to 0. > Doing so would cause the usage counter to become negative. There's > even a warning about this in the USB power management documentation! > > As it happens, this is exactly what the hub driver does. The > kick_hub_wq() routine increments the runtime-PM usage counter, and the > corresponding decrement is carried out by hub_event() in the context > of the hub_wq work-queue thread. This work routine may sometimes run > after the driver has been unbound from its interface, and when it does > it causes the usage counter to go negative. > > It is not possible for hub_disconnect() to wait for a pending > hub_event() call to finish, because hub_disconnect() is called with > the device lock held and hub_event() acquires that lock. The only > feasible fix is to reverse the original design decision: remove the > duplicate interface-specific usage counter and require USB drivers to > balance their runtime PM gets and puts. As far as I know, all > existing drivers currently do this. Nice work, that took a lot of debugging, thanks for resolving this. greg k-h