Re: [PATCH 2/2] dwc3: gadget: fix tracking of used sgs in request

Felipe Balbi <balbi@xxxxxxxxxx> · Fri, 23 Apr 2021 14:15:33 +0300

Hi,

Michael Grzeschik <mgr@xxxxxxxxxxxxxx> writes:
>>>Michael Grzeschik <m.grzeschik@xxxxxxxxxxxxxx> writes:
>>>>> The variable pending_sgs was used to keep track of handled
>>>>> sgs in one request. But instead queued_sgs is being decremented
>>>>
>>>>no, it wasn't. If the total number of entries in the scatter list is 'x'
>>>>and we have transferred (completed) 'y' entries, then pending_sgs should
>>>>be (x - y).
>>>>
>>>>> on every handled sg. This patch fixes the usage of the variable
>>>>> to use queued_sgs instead as intended.
>>>>>
>>>>> Signed-off-by: Michael Grzeschik <m.grzeschik@xxxxxxxxxxxxxx>
>>>>> ---
>>>>>  drivers/usb/dwc3/gadget.c | 8 ++++----
>>>>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>>>>> index 118b5bcc565d6..2d7d861b13b31 100644
>>>>> --- a/drivers/usb/dwc3/gadget.c
>>>>> +++ b/drivers/usb/dwc3/gadget.c
>>>>> @@ -2856,7 +2856,7 @@ static int dwc3_gadget_ep_reclaim_trb_sg(struct dwc3_ep *dep,
>>>>>  	struct dwc3_trb *trb = &dep->trb_pool[dep->trb_dequeue];
>>>>>  	struct scatterlist *sg = req->sg;
>>>>>  	struct scatterlist *s;
>>>>> -	unsigned int pending = req->num_pending_sgs;
>>>>> +	unsigned int pending = req->num_queued_sgs;
>>>>>  	unsigned int i;
>>>>>  	int ret = 0;
>>>>>
>>>>> @@ -2864,7 +2864,7 @@ static int dwc3_gadget_ep_reclaim_trb_sg(struct dwc3_ep *dep,
>>>>>  		trb = &dep->trb_pool[dep->trb_dequeue];
>>>>>
>>>>>  		req->sg = sg_next(s);
>>>>> -		req->num_pending_sgs--;
>>>>> +		req->num_queued_sgs--;
>>>>
>>>>no, this is wrong. queued shouldn't be modified as it comes straight
>>>>from the gadget driver. This is the number of entries in the request
>>>>that the gadget driver gave us. We don't want to modify it.
>>>
>>> Right, but pending_sgs than has two use cases. One to track the mapped
>>> sgs that got not queued. And one here to to track the "queued sgs" that
>>> got dequeued.
>>
>>no, we have num_mapped_sgs for the number of mapped sgs. It's just that
>>right before kicking the transfer the number of pending sgs and the
>>number of mapped sgs is the same.
>>
>>>>>
>>>>>  		ret = dwc3_gadget_ep_reclaim_completed_trb(dep, req,
>>>>>  				trb, event, status, true);
>>>>> @@ -2887,7 +2887,7 @@ static int dwc3_gadget_ep_reclaim_trb_linear(struct dwc3_ep *dep,
>>>>>
>>>>>  static bool dwc3_gadget_ep_request_completed(struct dwc3_request *req)
>>>>>  {
>>>>> -	return req->num_pending_sgs == 0;
>>>>> +	return req->num_queued_sgs == 0;
>>>>
>>>>nope, request is, indeed, completed when there no more pending entries
>>>>to be consumed.
>>>>
>>>>What sort of problem are you dealing with? Got any way of reproducing
>>>>it? Got some trace output showing the issue?
>>>
>>> I digged a little bit deeper to fully understand the issue here. What
>>> I found is that in dwc3_prepare_trbs will make two assumptions on
>>> num_pending_sgs.
>>>
>>> When the real purpose of the variable is to track the dequeued trbs.
>>
>>its purpose is *not* to track the dequeued trbs.
>>
>>> Than we have to fix the started list handling in the dwc3_prepare_trbs.
>>>
>>> The comment in the function says:
>>>
>>>          /*
>>>           * We can get in a situation where there's a request in the started list
>>>           * but there weren't enough TRBs to fully kick it in the first time
>>>           * around, so it has been waiting for more TRBs to be freed up.
>>>           *
>>>           * In that case, we should check if we have a request with pending_sgs
>>>           * in the started list and prepare TRBs for that request first,
>>>           * otherwise we will prepare TRBs completely out of order and that will
>>>           * break things.
>>>           */
>>>          list_for_each_entry(req, &dep->started_list, list) {
>>> 		if (req->num_pending_sgs > 0) {
>>> 		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>
>>> This condition seems to be made on a wrong assumption, thinking the
>>> num_pending_sgs was decremented after dwc3_prepare_one_trb was called on parts
>>> of that requests sgs but not all.
>>
>>say num_mapped_sgs = 500. The driver has 255 TRBs for use (+1 link
>>trb). Clearly not all SGs will fit in one go. Driver will set
>>num_pending_sgs to 500, because that's the number of sgs that need to be
>>transferred.
>>
>>For each completion we will decrement num_pending_sgs. Assuming all 255
>>were free and used up, num_pending_sgs will be 245 at this point. Driver
>>*must* try to kick these 245 TRBs.
>
> Lets assume we have a request with length 1024 to 3072 like a multiple
> of maxpacket * mul it is used on in uvc_video.
>
> So with a scatterlist creating only a 2 byte header and 1022 to 3070
> bytes payload we get only two scatter entries in this request. On
> special cases like page boundaries we will add one extra sg, but this
> does not change much. So this is nothing big as your 500 example. Now
> lets assume we queue 64 of these up we end up with maximum fo 192 trbs
> being added to the pending list.
>
> Lets say every one of these will be queued with ep_queue comming from
> uvc_video_pump. The kick_transfer from these ep_queues will result in an
> dwc3_prepare_trbs. *But* after the first one being preprared from the
> pending list to the started list, it will have an req->num_preprared_sgs != 0.

there's no such thing as req->num_prepared_sgs, do you mean
num_pending_sgs here? If you do, that's per-request. Gadget driver can
allocate and queue as many requests as it
likes. req_{n}->num_pending_sgs has nothing to do with
req_{n+1}->num_pending_sgs.

> Therefor after the first one got queued into the starting list and it
> did not complete yet, every next ep_queue->kick_transfer will always bail out
> in dwc3_prepare_sgs, checking the one started request with num_preprared
> != 0, without moving other requests from pending to the starting list.

no it won't. They're testing a different req pointer altogether. Or they
should. If you're queueing the same exact req pointer before it has been
completed, shame on you :-p

> This will result in an stalled driver, that will never prepare any
> pending requests into the started list.

it shouldn't. Please collect tracepoint output, this is probably the 5th
time I've asked for tracepoint output of the problem.

>>> But the completion path can not also depend on that variable to be decremented
>>> after parts of that sgs get handled. Therefor I came up with this second patch.
>>>
>>> What do you think, would be the right way to solve this?
>>
>>unless you can show that a problem really exists, I don't think we
>>should do anything. Got a minimal reproduction method somewhere?
>>Tracepoint of the problem?
>
> Look into the attached 20210423_uvc_video_dwc3_gadget_trace. You will
> see my scenario I want to fix.

where exactly in the file is the problem? I took a snippet to look at:

>  irq/51-dwc3-239     [000] d..1    56.766818: dwc3_event: event (d08510ca): ep2in: Transfer Not Ready [0000d085] (Not Active)
> kworker/u5:1-74      [000] d..2    56.807742: dwc3_ep_queue: ep2in: req 00000000823b0b62 length 0/3072 zsI ==> -115
> kworker/u5:1-74      [000] d..2    56.807759: dwc3_prepare_trb: ep2in: trb 000000007ebd2707 (E1:D0) buf 000000000c623058 size 1x 2 ctrl 00000465:00000000 (HlCs:Sc:isoc-first)
> kworker/u5:1-74      [000] d..2    56.807764: dwc3_prepare_trb: ep2in: trb 0000000093bb2bc4 (E2:D0) buf 000000000e200000 size 1x 3070 ctrl 00000c71:00000000 (Hlcs:SC:isoc)
> kworker/u5:1-74      [000] d..2    56.807777: dwc3_gadget_ep_cmd: ep2in: cmd 'Start Transfer' [d1d00406] params 00000000 0a6ab000 00000000 --> status: Successful

So, Isoc endpoints are a bit "special". We always wait for XferNotReady
before starting them. This is what you see here.

> kworker/u5:1-74      [000] d..2    56.807805: dwc3_ep_queue: ep2in: req 000000006d3f8d0a length 0/3072 zsi ==> -115

When the following request is queued, TRANSFER_STARTED is set, but
PENDING_REQUEST shouldn't be set which would mean
__dwc3_gadget_kick_transfer() be called which should result in a call to
Update Transfer, unless I'm missing something here. It has been quite a
while since I looked at this driver deeply.

> kworker/u5:1-74      [000] d..2    56.807831: dwc3_ep_queue: ep2in: req 00000000f5394c5d length 0/3072 zsi ==> -115
> kworker/u5:1-74      [000] d..2    56.807857: dwc3_ep_queue: ep2in: req 00000000ef95b55c length 0/3072 zsi ==> -115
> kworker/u5:1-74      [000] d..2    56.807883: dwc3_ep_queue: ep2in: req 00000000d36cf0cc length 0/3072 zsi ==> -115
> kworker/u5:1-74      [000] d..2    56.807908: dwc3_ep_queue: ep2in: req 00000000c5f1bd4b length 0/3072 zsi ==> -115
> kworker/u5:1-74      [000] d..2    56.807934: dwc3_ep_queue: ep2in: req 00000000b7180b2d length 0/3072 zsi ==> -115
> kworker/u5:1-74      [000] d..2    56.807960: dwc3_ep_queue: ep2in: req 000000005d629ded length 0/3072 zsi ==> -115
> kworker/u5:1-74      [000] d..2    56.807985: dwc3_ep_queue: ep2in: req 0000000062e02cc0 length 0/3072 zsi ==> -115
> kworker/u5:1-74      [000] d..2    56.808011: dwc3_ep_queue: ep2in: req 000000001d2ec1c0 length 0/3072 zsi ==> -115
> kworker/u5:1-74      [000] d..2    56.808038: dwc3_ep_queue: ep2in: req 0000000018ba3d29 length 0/3072 zsi ==> -115
> kworker/u5:1-74      [000] d..2    56.808063: dwc3_ep_queue: ep2in: req 000000000be47fc7 length 0/3072 zsi ==> -115
> kworker/u5:1-74      [000] d..2    56.808089: dwc3_ep_queue: ep2in: req 00000000caba09e2 length 0/3072 zsi ==> -115
> kworker/u5:1-74      [000] d..2    56.808115: dwc3_ep_queue: ep2in: req 00000000cb0191c4 length 0/3072 zsi ==> -115
> kworker/u5:1-74      [000] d..2    56.808140: dwc3_ep_queue: ep2in: req 000000005255f606 length 0/3072 zsi ==> -115
>  irq/51-dwc3-239     [000] d..1    56.808240: dwc3_event: event (d1d0408a): ep2in: Transfer In Progress [0000d1d0] (sIm)

eventually we get Transfer In Progress for the first TRB...

>  irq/51-dwc3-239     [000] d..1    56.808246: dwc3_complete_trb: ep2in: trb 000000007ebd2707 (E2:D1) buf 000000000c623058 size 1x 0 ctrl 34740464:0000d1d0 (hlCs:Sc:isoc-first)
>  irq/51-dwc3-239     [000] d..1    56.808250: dwc3_complete_trb: ep2in: trb 0000000093bb2bc4 (E2:D2) buf 000000000e200000 size 1x 0 ctrl 34740c70:0000d1d0 (hlcs:SC:isoc)
>  irq/51-dwc3-239     [000] d..1    56.808256: dwc3_gadget_giveback: ep2in: req 00000000823b0b62 length 3072/3072 zsI ==> 0

complete and give it back.

>  irq/51-dwc3-239     [000] d..1    56.808305: dwc3_prepare_trb: ep2in: trb 000000004af35733 (E3:D2) buf 000000000c623088 size 1x 2 ctrl 00000465:00000000 (HlCs:Sc:isoc-first)
>  irq/51-dwc3-239     [000] d..1    56.808309: dwc3_prepare_trb: ep2in: trb 00000000a11671a8 (E4:D2) buf 000000000e200bfe size 1x 3070 ctrl 00000471:00000000 (Hlcs:Sc:isoc)

prepare the next

>  irq/51-dwc3-239     [000] d..1    56.808320: dwc3_gadget_ep_cmd: ep2in: cmd 'Update Transfer' [50407] params 00000000 00000000 00000000 --> status: Successful

kick it by telling the HW about it. I don't see any failures here,
although there may be situations where we could add more data to the
controller's cache early on.

I remember dealing with a bunch of Bus Expiry errors in the past and the
problem was because we couldn't maintain proper alignment with the
microframe number because HW only gives us 14 out of the 16 bits of the
microframe number, the only workaround was to waiting for
XferInProgress/XferNotReady to get that number.

PS: if you collect more traces in the future, please don't enable
ftrace. Keep current_tracer set to nop, ftrace is just adding garbage to
the output in this case :-)

-- 
balbi
Attachment:
signature.asc

Description: PGP signature