Re: [PATCH] usb: gadget: f_fs: Wake up IO thread during disconnect

John Keeping <john@xxxxxxxxxxxx> · Fri, 3 Dec 2021 11:30:50 +0000

Hi Wesley,

On Thu, Dec 02, 2021 at 11:33:40PM -0800, Wesley Cheng wrote:
> On 12/2/2021 6:49 AM, John Keeping wrote:
> > On Wed, Dec 01, 2021 at 04:41:10PM +0000, John Keeping wrote:
> >> On Wed, Dec 01, 2021 at 02:02:05AM -0800, Wesley Cheng wrote:
> >>> During device disconnect or composition unbind, applications should be
> >>> notified that the endpoints are no longer enabled, so that it can take
> >>> the proper actions to handle its IO threads.  Otherwise, they can be
> >>> left waiting for endpoints until EPs are re-enabled.
> >>>
> >>> Signed-off-by: Wesley Cheng <quic_wcheng@xxxxxxxxxxx>
> >>> ---
> >>>  drivers/usb/gadget/function/f_fs.c | 7 +++++--
> >>>  1 file changed, 5 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/usb/gadget/function/f_fs.c b/drivers/usb/gadget/function/f_fs.c
> >>> index 3c584da9118c..0b0747d96378 100644
> >>> --- a/drivers/usb/gadget/function/f_fs.c
> >>> +++ b/drivers/usb/gadget/function/f_fs.c
> >>> @@ -957,10 +957,12 @@ static ssize_t ffs_epfile_io(struct file *file, struct ffs_io_data *io_data)
> >>>  		if (file->f_flags & O_NONBLOCK)
> >>>  			return -EAGAIN;
> >>>  
> >>> -		ret = wait_event_interruptible(
> >>> -				epfile->ffs->wait, (ep = epfile->ep));
> >>> +		ret = wait_event_interruptible(epfile->ffs->wait,
> >>> +				(ep = epfile->ep) || !epfile->ffs->func);
> > 
> > I looked at this again, and doesn't this totally break the wait
> > condition?
> > 
> > epfile->ep is set to non-null in ffs_func_eps_enable() which is called
> > from ffs_func_set_alt() just after ffs->func is set to non-null, and
> > then those are also set back to null at the same time.
> > 
> > So the condition boils down to a || !a and this never blocks.  Or am I
> > missing something?
> 
> Thanks for the feedback.  Hmm...yes, I get what you're saying.  The
> EPfiles and func is basically being set/cleared together, so the above
> change wouldn't be any different than checking for ep != epfile->ep.
> Let me see if there's another way we can address the issue this change
> is trying to resolve.
> 
> > 
> >>>  		if (ret)
> >>>  			return -EINTR;
> >>> +		if (!epfile->ffs->func)
> >>> +			return -ENODEV;
> >>
> >> This seems strange - we are inside the case where the endpoint is not
> >> initially enabled, if we're returning ENODEV here shouldn't that happen
> >> in all cases?
> >>
> >> Beyond that, there is no locking for accessing ffs->func here;
> >> modification happens in gadget callbacks so it's guarded by the gadget
> >> core (the existing case in ffs_ep0_ioctl() looks suspicious as well).
> >>
> >> But I can't see why this change is necessary - there are event
> >> notifications through ep0 when this happens, as can be seen in the hunk
> >> below from the ffs_event_add(ffs, FUNCTIONFS_DISABLE) line.  If
> >> userspace cares about this, then it can read the events from ep0.
> >>
> In short, the change is basically trying to resolve an issue in an
> application that has a separate thread handling the IO ops.  When the
> USB cable is disconnected, the application would expect for this IO
> thread to be completed and exit gracefully, and restarting it on the
> next connect.  However, since we are stuck in the read() it can not
> proceed further.

Did you see the other recent thread on FunctionFS [1]?  It seems that
the separate I/O thread is not required and in fact there is a race here
in general even using AIO via io_submit().

[1] https://lore.kernel.org/linux-usb/CAJkDvNk4G3WJuitZa+oPuNhkrCPNiwwG-zyNET+urWVWAda+cw@xxxxxxxxxxxxxx/

> I guess in these situations, we should utilize the O_NONBLOCK file
> parameter?

Yes, using O_NONBLOCK does avoid the problems.

I'm not sure what the general solution is - right now I don't see any
way to resolve the requirements to wait for the host to connect and
handle disconnection without blocking here.

The simplest thing would be to refuse epfile I/O when FunctionFS is not
enabled, which neatly resolves the race in favour of returning an error.
But it means that applications need to wait for a FUNCTIONFS_ENABLE
event on ep0 before submitting any transfers on other endpoints, which
is a change from the current behaviour.  And there's no way to know how
many applications rely on that.

So I don't think that's an option, at least not without providing some
way for userspace to opt in to the new behaviour.