Re: Mass Storage Gadget Kthread

Peter Chen <peter.chen@xxxxxxxxxxxxx> · Fri, 9 Oct 2015 12:51:49 +0800

On Fri, Oct 02, 2015 at 11:05:28AM -0500, Felipe Balbi wrote:
> Hi Alan,
> 
> here's a question which I hope you can help me understand :-)
> 
> Why do we have that kthread for the mass storage gadgets ? I noticed a very
> interesting behavior.
> 
> Basically, the MSC class works in a loop like so:
> 
> CBW
> Data Transfer
> CSW
> 
> In our implementation, what we do is:
> 
> CBW
> wake_up_process()
> Data Transfer
> wake_up_process()
> CSW
> wake_up_process()
> 
> Now here's the interesting bit. Every time we wake_up_process(), we basically
> don't do anything until MSC's kthread gets finally scheduled and has a chance of
> doing its job. This means that the host keeps sending us tokens but the UDC
> doesn't have any request queued to start a transfer. This happens especially with
> IN endpoints, not so much in the OUT direction. In figure one [1] we can see that
> the host issues over 7 POLLs before the UDC has finally started a usb_request;
> sometimes this goes on for even longer (see image [3]).
> 
> On figure two we can see that on this particular session, I had as much as 15%
> of the bandwidth wasted on POLLs. With this current setup I'm at 34MB/sec and with
> the added 15% that would get really close to 40MB/sec.
> 
> So the question is, why do we have to wait for that kthread to get scheduled ?
> Why couldn't we skip it completely ? Is there really anything left in there that
> couldn't be done from within usb_request->complete() itself ?
> 
> I'll spend some time on that today and really dig that thing up, but if you know
> the answer off the top of your head, I'd be happy to hear.
> 

To get the best performance, you want to see as few IN-NAKs or
NYET-PINGs as possible within each uFrame. The software should
always keep enough requests queued, and the hardware FIFO should
always be ready before the host sends the token.
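
As a rough illustration of why queue depth matters, here is a toy
model in Python. It is only a sketch under my own assumptions, not how
any real UDC or the mass storage function behaves: simulate(),
queue_depth and refill_delay are hypothetical names, and refill_delay
simply stands in for the latency before a completed request is
replaced (for example, waiting for a kthread to be scheduled).

```python
def simulate(slots, queue_depth, refill_delay):
    """Toy model: count data transactions and NAKs over `slots` host tokens.

    queue_depth:  requests queued before the host starts polling (assumed).
    refill_delay: token slots that pass before a completed request is
                  replaced by a new one (assumed stand-in for software latency).
    """
    queued = queue_depth
    pending = []  # slot numbers at which replacement requests become usable
    data = naks = 0
    for t in range(slots):
        # Replacement requests whose delay has elapsed join the queue.
        queued += sum(1 for ready in pending if ready <= t)
        pending = [ready for ready in pending if ready > t]
        if queued:
            # A request is ready: the token turns into a data transaction,
            # and a replacement is scheduled after the refill delay.
            queued -= 1
            data += 1
            pending.append(t + 1 + refill_delay)
        else:
            # Nothing queued: the device answers the token with a NAK.
            naks += 1
    return data, naks


if __name__ == "__main__":
    for depth in (1, 2):
        data, naks = simulate(slots=100, queue_depth=depth, refill_delay=1)
        print(f"queue_depth={depth}: {data} data transactions, {naks} NAKs")
```

In this model, one queued request with a one-slot refill delay loses
every other token to a NAK, while two queued requests hide the delay
completely.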

From my tests (usbtest & g_loopback), in the best case the chipidea
hardware gets 10 transactions for OUT and 11 transactions for IN per
uFrame. Chipidea leaves a little more room before the SoF for the
last transaction, so other hardware may get 11 transactions for OUT
and 12 transactions for IN per uFrame (I see this on an Intel
platform). So from what I see, the best bulk performance on Linux is
about 44MB/s for OUT and 48MB/s for IN.
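
The arithmetic behind those figures can be sketched as below. This
assumes high-speed bulk endpoints with 512-byte packets and 8000
uFrames per second; the function name is just for illustration. The
44MB and 48MB figures look consistent with 11 and 12 transactions per
uFrame if each transaction slot (512 * 8000 bytes/s) is rounded to
4MB/s.

```python
UFRAMES_PER_SECOND = 8000  # high-speed USB: 8 microframes per 1 ms frame
BULK_MAX_PACKET = 512      # high-speed bulk max packet size, in bytes


def bulk_bytes_per_second(transactions_per_uframe):
    """Payload bytes per second if every transaction carries a full packet."""
    return transactions_per_uframe * BULK_MAX_PACKET * UFRAMES_PER_SECOND


if __name__ == "__main__":
    for n in (10, 11, 12):
        bps = bulk_bytes_per_second(n)
        print(f"{n} transactions/uFrame: {bps} bytes/s "
              f"({bps / 1e6:.2f} MB/s, {bps / 2**20:.2f} MiB/s)")
```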

-- 

Best Regards,
Peter Chen