Re: Mass Storage Gadget Kthread

Felipe Balbi <balbi@xxxxxx> · Fri, 9 Oct 2015 15:57:47 -0500

Hi,

Peter Chen <peter.chen@xxxxxxxxxxxxx> writes:
> On Fri, Oct 02, 2015 at 11:05:28AM -0500, Felipe Balbi wrote:
>> Hi Alan,
>> 
>> here's a question which I hope you can help me understand :-)
>> 
>> Why do we have that kthread for the mass storage gadgets ? I noticed a very
>> interesting behavior.
>> 
>> Basically, the MSC class works in a loop like so:
>> 
>> CBW
>> Data Transfer
>> CSW
>> 
>> In our implementation, what we do is:
>> 
>> CBW
>> wake_up_process()
>> Data Transfer
>> wake_up_process()
>> CSW
>> wake_up_process()
>> 
>> Now here's the interesting bit. Every time we wake_up_process(), we basically
>> don't do anything until MSC's kthread gets finally scheduled and has a chance of
>> doing its job. This means that the host keeps sending us tokens but the UDC
>> doesn't have any request queued to start a transfer. This happens especially with
>> IN endpoints, not so much in the OUT direction. In figure one [1] we can see that
>> the host issues over 7 POLLs before the UDC has finally started a usb_request;
>> sometimes this goes on for even longer (see image [3]).
>> 
>> On figure two we can see that on this particular session, I had as much as 15%
>> of the bandwidth wasted on POLLs. With this current setup I'm at 34MB/sec and with
>> the added 15% that would get really close to 40MB/sec.
>> 
>> So the question is, why do we have to wait for that kthread to get scheduled ?
>> Why couldn't we skip it completely ? Is there really anything left in there that
>> couldn't be done from within usb_request->complete() itself ?
>> 
>> I'll spend some time on that today and really dig that thing up, but if you know
>> the answer off the top of your head, I'd be happy to hear.
>> 
>
> To get the best performance, you should try to see as few IN-NAKs
> or NYET-PINGs as possible within a uFrame; the software should
> always queue enough requests, and the hardware FIFO should always
> be ready before the host sends the token.
>
> From my tests (usbtest & g_loopback), in the best case,
> the chipidea hardware can get 10 transactions for OUT and 11
> transactions for IN within a uFrame, and chipidea leaves a little more
> space before end of SoF for the last transaction, so other hardware
> may get 11 transactions for OUT and 12 transactions for IN within
> a uFrame (I see it on an Intel platform), so from what I see, the best
> performance on Linux for bulk is 44MB for OUT and 48MB for IN.

that's all clear :-) The point is that I _do_ get a ton of PINGs because
the gadget doesn't queue requests fast enough and I'm, currently,
blaming the latency of the kthread() itself, though I haven't been
successful at proving that statement thus far.
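
For reference, here is a rough sketch of the arithmetic behind the numbers
quoted above. It assumes 512-byte high-speed bulk packets and 8000 uFrames
per second, neither of which is spelled out in this thread, and the helper
names are made up for illustration. It is only a back-of-the-envelope check,
not a measurement.

```python
# Back-of-the-envelope USB 2.0 high-speed bulk throughput.
# Assumptions (not stated in the thread): 512-byte max packet size for
# high-speed bulk endpoints and 8000 microframes per second (8 per 1 ms frame).

BULK_MAX_PACKET = 512    # bytes carried by one full bulk transaction
UFRAMES_PER_SEC = 8000   # one microframe every 125 us


def bulk_bytes_per_sec(transactions_per_uframe: int) -> int:
    """Payload bytes per second if every transaction carries a full packet."""
    return transactions_per_uframe * BULK_MAX_PACKET * UFRAMES_PER_SEC


def recovered_rate(measured_mb_s: float, wasted_fraction: float) -> float:
    """Rate if the wasted share were simply added back on top (34 + 15%)."""
    return measured_mb_s * (1.0 + wasted_fraction)


if __name__ == "__main__":
    for n in (10, 11, 12):
        rate = bulk_bytes_per_sec(n)
        print(f"{n} transactions/uFrame: "
              f"{rate / 1e6:.2f} MB/s ({rate / 2**20:.2f} MiB/s)")
    print(f"34 MB/s plus 15%: {recovered_rate(34, 0.15):.1f} MB/s")
```

Under those assumptions, 11 and 12 transactions per uFrame land in the same
range as the 44MB and 48MB figures quoted above; the exact value depends on
whether MB is taken as decimal or binary.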

-- 
balbi