Re: [PATCH v2] xhci: dbc: fix handling ClearFeature Halt requests

Łukasz Bartosik <ukaszb@xxxxxxxxxxxx> · Tue, 30 Jul 2024 02:17:15 +0200

On Mon, Jul 29, 2024 at 4:11 PM Mathias Nyman
<mathias.nyman@xxxxxxxxxxxxxxx> wrote:
>
> Hi
>
> On 25.7.2024 10.48, Łukasz Bartosik wrote:
> > DbC driver does not handle ClearFeature Halt requests correctly
> > which in turn may lead to dropping packets on the receive path.
>
> Nice catch.
> Looks like a halted endpoint is treated almost as a disconnect.
>
> >
> > Below is a trace capture where due to incorrect handling of
> > ClearFeature Halt packet gets dropped on the receive path.
> >
> > /sys/kernel/debug/tracing # cat trace
> > 1) kworker/10:3-514   [010] d..1.  2925.601843: xhci_dbc_handle_event:
> >       EVENT: TRB 000000019588c0e0 status 'Stall Error' len 0 slot 1 ep 2
> >       type 'Transfer Event' flags e:C
> >
> > 2) kworker/10:3-514   [010] d..1.  2925.613285: xhci_dbc_handle_event:
> >       EVENT: TRB 000000019588bc80 status 'Stall Error' len 1024 slot 1
> >       ep 3 type 'Transfer Event' flags e:C
> >
> > 3) kworker/10:3-514   [010] d..1.  2925.619024: xhci_dbc_handle_transfer:
> >       BULK: Buffer 0000000244b49800 length 1024 TD size 0 intr 0 type
> >       'Normal' flags b:i:I:c:s:i:e:C
> >
> > 4) kworker/10:3-514   [010] d..1.  2925.619025: xhci_dbc_giveback_request:
> >       bulk-in: req 00000000a70b5ad2 length 0/1024 ==> -6
> >
> > 5) kworker/10:3-514   [010] dNs2.  2925.623820: xhci_dbc_gadget_ep_queue:
> >       BULK: Buffer 0000000244b49800 length 1024 TD size 0 intr 0 type
> >       'Normal' flags b:i:I:c:s:i:e:c
> >
> > 6) kworker/10:3-514   [010] dNs1.  2925.623823: xhci_dbc_queue_request:
> >       bulk-in: req 00000000a70b5ad2 length 0/1024 ==> -115
> >
> > 7) kworker/10:3-514   [010] d..1.  2925.629865: xhci_dbc_handle_event:
> >       EVENT: TRB 000000019588bc80 status 'Short Packet' len 1000 slot 1
> >       ep 3 type 'Transfer Event' flags e:C
> >
> > 8) kworker/10:3-514   [010] d..1.  2925.635540: xhci_dbc_handle_event:
> >       EVENT: TRB 000000019588bc90 status 'Short Packet' len 763 slot 1
> >       ep 3 type 'Transfer Event' flags e:C
> >
> > 9) kworker/10:3-514   [010] d..1.  2925.635540: xhci_dbc_handle_transfer:
> >       BULK: Buffer 0000000244b49c00 length 1024 TD size 0 intr 0 type
> >       'Normal' flags b:i:I:c:s:i:e:C
> >
> > 10) kworker/10:3-514  [010] d..1.  2925.635541: xhci_dbc_giveback_request:
> >       bulk-in: req 00000000b4ec77d7 length 261/1024 ==> 0
> >
> > 11) kworker/10:3-514  [010] dNs2.  2925.635561: xhci_dbc_gadget_ep_queue:
> >       BULK: Buffer 0000000244b49c00 length 1024 TD size 0 intr 0 type
> >       'Normal' flags b:i:I:c:s:i:e:c
> >
> > 12) kworker/10:3-514  [010] dNs1.  2925.635562: xhci_dbc_queue_request:
> >       bulk-in: req 00000000b4ec77d7 length 0/1024 ==> -115
> >
> > Lines 1 and 2 are Endpoints OUT and IN responses to receiving ClearFeature
> > Halt requests.
>
> Stall errors (Line 1 and 2) events are probably issued already when endpoint halts,
> before the host responds with a ClearFeature(Halt) request.
>

I observed these stall errors are the result of calling
libusb_clear_halt() function which results
in sending ClearFeature(Halt) requests to a device. I should have
mentioned it in the commit
message because this is a vital piece of information.

> >
> > Line 7 notifies of reception of 24 bytes packet.
> >
> > Line 8 notifies of reception of 261 bytes packet
> >
> > In Lines [9, 12] 261 bytes packet is being processed.
> >
> > However 24 bytes packet gets dropped. The kernel log includes entry which
> > is an indication of a packet drop:
> > [  127.651845] xhci_hcd 0000:00:0d.0: no matched request
> >
> > This fix adds correct handling of ClearFeature Halt requests
> > by restarting an endpoint which received the request.
> >
> > Fixes: dfba2174dc42 ("usb: xhci: Add DbC support in xHCI driver")
> > Signed-off-by: Łukasz Bartosik <ukaszb@xxxxxxxxxxxx>
> > ---
> > Changes v2->v1:
> > - Documented both xhci_dbc_flush_single_request and
> > xhci_dbc_flush_endpoint_requests functions.
> > ---
> >   drivers/usb/host/xhci-dbgcap.c | 48 +++++++++++++++++++++++++++-------
> >   drivers/usb/host/xhci-dbgtty.c |  5 ++++
> >   2 files changed, 43 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/usb/host/xhci-dbgcap.c b/drivers/usb/host/xhci-dbgcap.c
> > index 872d9cddbcef..36ec3242e096 100644
> > --- a/drivers/usb/host/xhci-dbgcap.c
> > +++ b/drivers/usb/host/xhci-dbgcap.c
> > @@ -173,7 +173,17 @@ static void xhci_dbc_giveback(struct dbc_request *req, int status)
> >       spin_lock(&dbc->lock);
> >   }
> >
> > -static void xhci_dbc_flush_single_request(struct dbc_request *req)
> > +/**
> > + * xhci_dbc_flush_single_request - flushes single request
> > + * @req: request to flush
> > + * @status: 0 or -ERESTART - after the request is flushed it will be queued
> > + *          back to the endpoint
> > + *
> > + *          -ESHUTDOWN - after the request is flushed it won't be queued back
> > + *          to the endpoint and if it was last endpoint's request the endpoint
> > + *          will stay shut
> > + */
>
> Hmm, I need to dig into this.
>
> I don't think we should push this problem up to the request completion handler.
> Maybe we are flushing requests that should not be flushed?
>

Section 7.6.4.3 "Halted DbC Endpoints" in Intel's xHCI specification
says the endpoint can
be halted by HW in case of error. Also it can be halted by software
through HIT or HOT flags for DbC.
I wonder how to recover properly from the Halted state caused by HW
error ? Does it make sense to continue with
the requests or just restart the endpoint (flush all requests) as this
patch does ?

> Do you have an easy way to reproduce the stall error case?
>

Yes I can easily reproduce the case with the stall errors.
Would you like me to run any specific scenarios ?

Thanks,
Lukasz

> Thanks
> -Mathias