Hi Wesley, On Thu, Dec 02, 2021 at 11:33:40PM -0800, Wesley Cheng wrote: > On 12/2/2021 6:49 AM, John Keeping wrote: > > On Wed, Dec 01, 2021 at 04:41:10PM +0000, John Keeping wrote: > >> On Wed, Dec 01, 2021 at 02:02:05AM -0800, Wesley Cheng wrote: > >>> During device disconnect or composition unbind, applications should be > >>> notified that the endpoints are no longer enabled, so that it can take > >>> the proper actions to handle its IO threads. Otherwise, they can be > >>> left waiting for endpoints until EPs are re-enabled. > >>> > >>> Signed-off-by: Wesley Cheng <quic_wcheng@xxxxxxxxxxx> > >>> --- > >>> drivers/usb/gadget/function/f_fs.c | 7 +++++-- > >>> 1 file changed, 5 insertions(+), 2 deletions(-) > >>> > >>> diff --git a/drivers/usb/gadget/function/f_fs.c b/drivers/usb/gadget/function/f_fs.c > >>> index 3c584da9118c..0b0747d96378 100644 > >>> --- a/drivers/usb/gadget/function/f_fs.c > >>> +++ b/drivers/usb/gadget/function/f_fs.c > >>> @@ -957,10 +957,12 @@ static ssize_t ffs_epfile_io(struct file *file, struct ffs_io_data *io_data) > >>> if (file->f_flags & O_NONBLOCK) > >>> return -EAGAIN; > >>> > >>> - ret = wait_event_interruptible( > >>> - epfile->ffs->wait, (ep = epfile->ep)); > >>> + ret = wait_event_interruptible(epfile->ffs->wait, > >>> + (ep = epfile->ep) || !epfile->ffs->func); > > > > I looked at this again, and doesn't this totally break the wait > > condition? > > > > epfile->ep is set to non-null in ffs_func_eps_enable() which is called > > from ffs_func_set_alt() just after ffs->func is set to non-null, and > > then those are also set back to null at the same time. > > > > So the condition boils down to a || !a and this never blocks. Or am I > > missing something? > > Thanks for the feedback. Hmm...yes, I get what you're saying. The > EPfiles and func is basically being set/cleared together, so the above > change wouldn't be any different than checking for ep != epfile->ep. > Let me see if there's another way we can address the issue this change > is trying to resolve. > > > > >>> if (ret) > >>> return -EINTR; > >>> + if (!epfile->ffs->func) > >>> + return -ENODEV; > >> > >> This seems strange - we are inside the case where the endpoint is not > >> initially enabled, if we're returning ENODEV here shouldn't that happen > >> in all cases? > >> > >> Beyond that, there is no locking for accessing ffs->func here; > >> modification happens in gadget callbacks so it's guarded by the gadget > >> core (the existing case in ffs_ep0_ioctl() looks suspicious as well). > >> > >> But I can't see why this change is necessary - there are event > >> notifications through ep0 when this happens, as can be seen in the hunk > >> below from the ffs_event_add(ffs, FUNCTIONFS_DISABLE) line. If > >> userspace cares about this, then it can read the events from ep0. > >> > In short, the change is basically trying to resolve an issue in an > application that has a separate thread handling the IO ops. When the > USB cable is disconnected, the application would expect for this IO > thread to be completed and exit gracefully, and restarting it on the > next connect. However, since we are stuck in the read() it can not > proceed further. Did you see the other recent thread on FunctionFS [1]? It seems that the separate I/O thread is not required and in fact there is a race here in general even using AIO via io_submit(). [1] https://lore.kernel.org/linux-usb/CAJkDvNk4G3WJuitZa+oPuNhkrCPNiwwG-zyNET+urWVWAda+cw@xxxxxxxxxxxxxx/ > I guess in these situations, we should utilize the O_NONBLOCK file > parameter? Yes, using O_NONBLOCK does avoid the problems. I'm not sure what the general solution is - right now I don't see any way to resolve the requirements to wait for the host to connect and handle disconnection without blocking here. The simplest thing would be to refuse epfile I/O when FunctionFS is not enabled, which neatly resolves the race in favour of returning an error. But it means that applications need to wait for a FUNCTIONFS_ENABLE event on ep0 before submitting any transfers on other endpoints, which is a change from the current behaviour. And there's no way to know how many applications rely on that. So I don't think that's an option, at least not without providing some way for userspace to opt in to the new behaviour.