On Tue, 21 Jan 2014, Dan Williams wrote: > > > We want to: > > > > > > 1/ prevent khubd from running while reset is in progress > > > > This is the open question mentioned above, if you add in the > > requirement that we also want to prevent a reset from starting while > > khubd is running (except when khubd performs the reset itself). > > > > > 2/ prevent khubd from seeing an intermediate device state during > > > usb_port_resume > > > > khubd doesn't pay all that much attention to device states; mostly it > > is concerned with port statuses. We don't want khubd to see an > > intermediate port status during usb_port_resume. Basically that means > > we don't want khubd to run while usb_port_resume is in progress, and we > > don't want usb_port_resume to start while khubd is running unless khubd > > performs the resume itself. > > Yes, it does not pay all that much attention to device states, but it is > critical that it not read ->state while usb_port_resume() is changing > it. Specifically this hole in hub_port_connect_change(): It doesn't matter. We don't want khubd and usb_port_resume to run at the same time, for other reasons, and therefore we don't have to worry about khubd seeing an intermediate device state caused by usb_port_resume. It is true, however, that the region of usb_port_resume currently protected by hub->busy_bits isn't big enough. > > > 3/ prevent khubd from running in the interval between completion of > > > ubs_port_runtime_resume() completing and usb_port_resume() running. > > > > Hmmm. I don't remember what usb_port_runtime_resume does about a > > connected USB-2 child device. It can't simply do a runtime resume of > > the child. Maybe it should tell khubd to resume the child? Then khubd > > would never see the intermediate state; the next time it looked at the > > port, it would issue the runtime resume. (Provided that > > usb_port_runtime_resume didn't complete after khubd was already > > running.) > > Yes, that's what I had in the patch, more straightforward than the work > queue. Okay. > > > All these scenarios to point to problems with ->device_state collisions > > > not necessarily the port state. > > > > I don't think so, for the reason mentioned above. > > I'll rephrase, checking USB_STATE_SUSPENDED needs usb_port_resume() > synchronization. Which we need anyway, so the device_state aspect doesn't matter. (You may have overlooked the port-status tests in hub_handle_remote_wakeup.) > I propose we use usb_port pm runtime synchronization to block khubd when > we know the portstatus is not valid (powered off), Suppose, when you power the port back on, you find that the device is no longer plugged in. How do you prevent the port from being powered off again before khubd has a chance to finalize the device? I guess you would have to do a pm_runtime_get on the port whenever the port's bit in hub->change_bits gets set. > and we need a new > lock to synchronize with usb_device pm operations before checking the > device state. Forget about the device state. It's a distraction. > I think we agree that khubd needs to not look at the portstatus while > the port is down. pm runtime synchronization takes care of that and > prevents khubd from starting while the port is inactive. With that in > place we can now make requests in usb_port_runtime_resume() that are > later serviced in khubd when the port is marked active (pm runtime > enforces that ->runtime_resume() return 0 before the port is marked > active). How do you know, when you make the request, that khubd won't start running while the port is still down? If that happened then khubd would know to avoid looking at the port, of course, but the request would be used up and lost. With a lock, this problem doesn't arise. > This mechanism can be used for forcing at least one child > resume for each port resume, and rate-limiting port power requests. What's the connection with rate-limiting? Under what circumstances would you need to rate-limit the port power changes? > For the next case, fixing khubd vs device_state changes, we can't use > pm_runtime synchronization because khubd is expected to be able to > resume a suspended usb_device. So we need to prevent khubd from > evaluating the device_state (USB_STATE_SUSPENDED) while usb_device pm > operations are in flight. In the case of dpm_{suspend|resume} khubd is > frozen so there's no collision, in the case of rpm_{suspend|resume} we > need to take a lock. Agreed. > Let's wrap usb_runtime_{resume|suspend} in a new usb_device->state_lock > and then take that lock in khubd whenever it needs to check ->state or > portstatus values associated with suspend. Specifically: > > @@ -1765,10 +1774,14 @@ int usb_runtime_resume(struct device *dev) > struct usb_device *udev = to_usb_device(dev); > int status; > > - /* Runtime resume for a USB device means resuming both the device > - * and all its interfaces. > + /* Runtime resume for a USB device means resuming both the > + * device and all its interfaces. We lock to synchronize the > + * device state with khubd > */ > + mutex_lock(&udev->state_lock); > status = usb_resume_both(udev, PMSG_AUTO_RESUME); > + mutex_unlock(&udev->state_lock); > + > return status; > } > > The lock hangs off the usb_device rather than the the usb_port since it > is meant to prevent unintended disconnects. In principle, the new lock could go in either the device or the port. IMO it makes more sense to put the lock in the port, because it is meant to protect against port-status collisions. > Any portstatus vs khubd > collisions are handled by pm runtime synchronization since there is no > expectation that khubd resumes a usb_port. What?! I don't understand this at all. The synchronization you were talking about applies to usb_port_runtime_resume; it doesn't protect khubd from colliding with port-status changes in usb_port_resume. That's why we need the new lock. Also, I don't think we should take the lock in usb_runtime_resume. usb_port_resume seems more logical. Once the port is out of the suspended state and the status is all cleaned up, the protection is no longer needed. There's no reason to keep holding the lock while we invoke the resume routines in all the interface drivers. (For that matter, there's no reason to hold the lock at all when resuming a root hub.) It's important to establish the proper order of locking. In an earlier email, I pointed out that we will sometimes need to acquire the new lock while holding the device lock. This remains true. > Once khubd has made a determination that it wants to resume or reset a > device it will of course need to drop the state lock. Those actions > will implicitly take the state lock. Let's call it the status lock, not the state lock, because what it protects is the port status. > Here's the patch (to replace patch 9), now tested. There are of course > cleanups that can follow this, but holding off until we have agreement > on closing these races. You're doing it again! I just told you not to post patches while discussing overall strategy. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html