On Mon, 2019-06-24 at 13:48 -0400, Alan Stern wrote: > On Mon, 24 Jun 2019, Mayuresh Kulkarni wrote: > > > > > On Fri, 2019-06-21 at 15:28 -0400, Alan Stern wrote: > > > > > > On Fri, 21 Jun 2019, Mayuresh Kulkarni wrote: > > > > > > > > > > > > > > > Hi Alan, > > > > > > > > With the suggested modification (of having suspend/resume of > > > > usbfs > > > > at > > > > device level instead of interface level), looks like I am seeing > > > > a > > > > deadlock described as below - > > > > > > > > Pre-condition: USB device is connected but suspended before > > > > running > > > > test > > > > program. > > > > > > > > 1. The test program calls open(/dev/bus/usb/...). > > > > 2. This ends up calling usbdev_open(). > > > > 3. usbdev_open() takes the lock and > > > > calls usb_autoresume_device(). > > > > 4. usb_autoresume_device() calls pm_runtime_get_sync(). Due to > > > > _sync > > > > version, this call will return after calling the resume call- > > > > back of > > > > driver (please correct me if wrong). > > > > 5. This ends up calling generic_resume() which > > > > calls usbfs_notify_resume(). > > > > 6. Now usbfs_notify_resume() also wants the same lock that > > > > usbdev_open() > > > > in (3) has already taken. > > > What lock are you talking about? usbfs_notify_resume() doesn't > > > take > > > any locks. > > The lock I am talking about is used > > by usb_lock_device/usb_unlock_device. It is shared > > between usbdev_open() > > and usbfs_notify_resume(). > Here's the subroutine as it appears in the patch: > > void usbfs_notify_resume(struct usb_device *udev) > { > struct usb_dev_state *ps; > > list_for_each_entry(ps, &udev->filelist, list) { > WRITE_ONCE(ps->not_yet_resumed, 0); > wake_up_all(&ps->wait_for_resume); > } > } > > Please point out exactly where in this code you think > usbfs_notify_resume() tries to acquire the device lock. > Oops, sorry. Please accept my apology on this one. You are right, the original patch you send doesn't have the lock in usbfs_notify_suspend/resume. > (Although, to be fair, I think udev->filelist needs to be protected by > usbfs_mutex. I will add this in the next version of the patch.) OK. > > > > > I think for this approach, we would also need to > > call usb_autosuspend_device/usb_autoresume_device out of the > > usb_lock/unlock_device pair. > No, you're not allowed to do that. See the documentation at the top > of the source code for usb_autosuspend_device() and > usb_autoresume_device() in driver.c. > > > > > > > > > > > > > > My observation so far is - when the USB device is connected for > > > > first > > > > time, Android's USB manager server is able to open the device > > > > and > > > > reads > > > > its descriptors using usbfs. But the test is not able to. The > > > > change > > > > is > > > > auto-suspend in between device connect and run of test program. > > > > > > > > I am still analysing the situation here to see if pre-condition > > > > above > > > > really makes difference or not. So please take this update with > > > > pinch of > > > > salt. However, still I wanted send this update and get a quick > > > > review to > > > > ensure I am not wandering in weeds. > > > This doesn't sound like a real problem. > > > > > > However, I have been thinking about how to do all this in light of > > > Oliver's comments, and it seems like we should make some changes. > > > > > > First, let's change the ioctls' names. Instead of RESUME and > > > SUSPEND, > > > I'll call them FORBID_SUSPEND and ALLOW_SUSPEND. The way they > > > work > > > should be clear: ALLOW_SUSPEND will permit the device to be > > > suspended > > > but might not cause a suspend to happen > > > immediately. FORBID_SUSPEND > > > will cause an immediate resume if the device is currently > > > suspended > > > and > > > will prevent the device from being suspended in the future. The > > > new > > > names more accurately reflect what the ioctls actually do. > > > > > > Second, the WAIT_FOR_RESUME ioctl will wait only until a resume > > > has > > > occurred more recently than the most recent ALLOW_SUSPEND > > > ioctl. So > > > for example, if the program calls ALLOW_SUSPEND, and the device > > > suspends, and then the device resumes, and then the device > > > suspends > > > again, and then the program calls WAIT_FOR_RESUME, the ioctl will > > > return immediately even though the device is currently > > > suspended. > > > Thus you don't have to worry about missing a remote resume. This > > > also > > > means that when WAIT_FOR_RESUME returns, the program should call > > > FORBID_SUSPEND to ensure that the device is active and doesn't go > > > back > > > into suspend. > > > > > 1. For the above example, since resume can happen anytime, the user- > > space's suspend and resume notion would get out-of-sync with actual > > device state, right? > That's right. We can't notify userspace about every state transition > the device makes, so userspace may very well get out-of-sync with the > actual device state. > > > > > > > > > In practice, your program would use the ioctls as follows: > > > > > > When the device file is opened, the kernel will automatically > > > do the equivalent of FORBID_SUSPEND (to remain compatible > > > with the current behavior). > > > > > > When the program is ready for the device to suspend, it will > > > call the ALLOW_SUSPEND ioctl. But it won't cancel the > > > outstanding URBs; instead it will continue to interact > > > normally with the device, because the device might not be > > > suspended for some time. > > > > > I think it is reasonable to expect that user-space will call > > ALLOW_SUSPEND when there is no URB pending from its side. If that is > > not > > the case, how can it expect the device to suspend when it has some > > outstanding work pending? > There are two possible ways a userspace program can monitor the > device's state: > > 1. The program can leave an URB (typically on an interrupt > endpoint) running constantly, and the device can send its > response to the URB whenever something happens. > > 2. The program can poll the device by submitting an URB > periodically to see if anything has happened since the last > poll. > > In case 1, the program would leave the URB running even after doing > the > ALLOW_SUSPEND ioctl. That way the program will be aware of anything > that happens to the device before it suspends. When the device does > go > into suspend, the URB will be terminated. > > In case 2, the program would continue polling the device even after > doing the ALLOW_SUSPEND ioctl. When the device does go into suspend, > the polling URB will fail. > Right, so user space should do the following when it determines the device is idle from its point of view - 1. Call ALLOW_SUSPEND ioctl 2. Queue an URB and wait for its REAP. When the wait returns -EFAIL (or something similar), that is the indication that the device is no longer active (or suspended) 3. Call WAIT_FOR_RESUME ioctl 4. When WAIT_FOR_RESUME ioctl returns, it is guaranteed that device is active. 5. Call FORBID_SUSPEND ioctl and read the cause of resume. 6. Go to (1) when appropriate Have I summarized this approach correctly from user-space point of view? > If the program doesn't do either of these (that is, if it leaves the > device completely idle) then it won't know if the user presses a > button or does anything else to the device before the device is put > into suspend. > > > > > > > > > When the device does go into suspend, URBs will start failing > > > with an appropriate error code (EHOSTUNREACH, ESHUTDOWN, > > > EPROTO, or something similar). In this way the program will > > > realize the device is suspended. At that point the program > > > should call the WAIT_FOR_RESUME ioctl and stop trying to > > > communicate with the device. > > > > > > When WAIT_FOR_RESUME returns, the program should call the > > > FORBID_SUSPEND ioctl and resume normal communication with the > > > device. > > > > > Due to this condition, the example in (1) above will not cause a > > wait on > > current resume operation, since after WAIT_FOR_RESUME returns, the > > user- > > space will call FORBID_SUSPEND, which will bring device out of > > current > > suspend state. So, the wait effectively didn't happen, right? > Correct. > > > > > > > > > How does that sound? > > My concerns are mentioned above, hope they make sense. > > I will give this approach a shot with my test program and see how > > that > > goes. > > > > Just to re-iterate, the main issue we have now is - how to notify > > user- > > space of suspend, so that it can safely call wait-for-resume, right? > Basically there is no way to notify userspace when the device is > suspended. Instead, we have to assume that the userspace program > will > figure out what has happened all by itself when its URBs begin > failing. > Based on discussion so far and my understanding, how about below approach - 1. Have ALLOW_SUSPEND and WAIT_FOR_RESUME ioctls. As before, ALLOW_SUSPEND calls usb_autosuspend_device() while WAIT_FOR_RESUME waits for resume. 2. Have usbfs_notify_suspend() and usbfs_notify_resume() as per your previous patch (i.e.: system/resume callbacks at device level). 3. Extend usbdev_poll() to wait for udev->state == USB_STATE_SUSPENDED when events == POLLPRI. Return POLLPRI when state = USB_STATE_SUSPENDED. 4. As before, any ioctl != (ALLOW_SUSPEND or WAIT_FOR_RESUME) calls usb_autoresume_device(). The corresponding user-space calls would be - A. When determined device is idle, call ALLOW_SUSPEND ioctl. B. Call poll(device_fd, POLLPRI). When poll returns check revents == POLLPRI. C. Call WAIT_FOR_RESUME ioctl. D. When WAIT_FOR_RESUME ioctl returns, read resume reason. E. Go to (A). The constraint on (1) above is - ALLOW_SUSPEND should be called when user-space decides device is idle. I think this is not a hard constraint since poll() for suspend will return POLLPRI when device is suspended which is different from what it returns when ASYNC URB is completed. Few points I am unsure of are - 1. Is it OK for a driver to re-purpose POLLPRI for its own use or POLLPRI has a unique meaning system wide? 2. Is it safe to wait for udev->state to be of a particular value? If this approach could work, I can spend time on this one as well. Thanks, > If necessary, we could add an extra ioctl (IS_SUSPENDED) which would > return 0 or 1 depending on whether the device is active or > suspended. > But I don't think we will need this. > > Alan Stern >