On Tue, Oct 04, 2016 at 03:23:12PM +0300, Felipe Balbi wrote: > > Hi, > > Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx> writes: > > On Mon, Oct 03, 2016 at 07:08:43PM +0300, Felipe Balbi wrote: > >> > >> Hi, > >> > >> ville.syrjala@xxxxxxxxxxxxxxx writes: > >> > From: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx> > >> > > >> > This reverts commit 55a0237f8f47957163125e20ee9260538c5c341c. > >> > > >> > commit 55a0237f8f47 ("usb: dwc3: gadget: use allocated/queued reqs for > >> > LST bit") causes my BYT FFRD8 with g_ether to behave poorly. ssh/scp > >> > is very sluggish and can even stall entirely. Revert cures it. > >> > > >> > Cc: Felipe Balbi <felipe.balbi@xxxxxxxxxxxxxxx> > >> > Cc: stable@xxxxxxxxxxxxxxx > >> > Signed-off-by: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx> > >> > >> Reverting that commit is not the best idea. Let's start with the usual: > >> > >> - Kernel version > >> - dmesg on both sides (host and device) > >> - dwc3 tracepoints: > >> > >> # mkdir -p /t > >> # mount -t tracefs none /t > >> # echo 8192 > /t/buffer_size_kb > >> # echo 1 > /t/events/dwc3/enable > >> # echo 0 > /t/events/dwc3/dwc3_readl/enable > >> # echo 0 > /t/events/dwc3/dwc3_writel/enable > >> > >> This should be enough to tell me what's really going on. > > > > I've attached the (compressed due to size) dmesgs/traces from the device. > > For my test I just did 'for i in `seq 1 5` ; dmesg ; done' in an ssh > > session, and I did get the lag in the bad run, so presumably the trace > > should have caught it. Both were on 4.8 (+ the revert for the good run). > > > > Host side dmesg just showes me: > > usb 3-3.4: new high-speed USB device number 25 using xhci_hcd > > cdc_ether 3-3.4:1.0 usb0: register 'cdc_ether' at usb-0000:00:14.0-3.4, CDC Ethernet Device, ... > > when I boot the device. Nothing extra appears when things lag around. > > I'm running 4.5.something on the host. > > Looking at your logs what I can see is that "revert.trace" doesn't > respect gadget driver's IRQ throttling, which means you get an interrupt > for each and every request, while "bad.trace" shows dwc3 respecting > "no_interrupt" flag passed in by g_ether. > > The only explanation I have for this behavior when we start respecting > gadget driver's flags, is that there's a possible race in u_ether.c > where it can go for a long time without an interrupt due to that > throttling. > > This only happens with IN endpoints (the ones which are throttled) and > the behavior can be seen in log snippet below: > > 27782: irq/17-dwc3-2524 [000] d... 666.891969: tx_complete: req ffff88016da1a6c0 length 1514/1514 flags ZsI tx_qlen 4 qmult 5 > 27802: ksoftirqd/0-3 [000] .Ns1 666.892047: eth_start_xmit: req ffff88016da1a6c0 length 0/1514 flags Zsi tx_qlen 5 qmult 5 > 27808: ksoftirqd/0-3 [000] .Ns2 666.892062: eth_start_xmit: req ffff88016da1a300 length 0/1514 flags ZsI tx_qlen 6 qmult 5 > 27814: ksoftirqd/0-3 [000] .Ns2 666.892069: eth_start_xmit: req ffff88016da1ab40 length 0/1514 flags Zsi tx_qlen 7 qmult 5 > 27820: ksoftirqd/0-3 [000] .Ns2 666.892075: eth_start_xmit: req ffff88016da1acc0 length 0/1514 flags Zsi tx_qlen 8 qmult 5 > 27826: ksoftirqd/0-3 [000] .Ns2 666.892082: eth_start_xmit: req ffff88016da1ae40 length 0/1514 flags Zsi tx_qlen 9 qmult 5 > 27832: ksoftirqd/0-3 [000] .Ns2 666.892088: eth_start_xmit: req ffff88016da1a780 length 0/1514 flags Zsi tx_qlen 10 qmult 5 > 27884: irq/17-dwc3-2524 [000] d... 666.892213: tx_complete: req ffff88016da1a840 length 1514/1514 flags Zsi tx_qlen 9 qmult 5 > 27889: irq/17-dwc3-2524 [000] d... 666.892215: tx_complete: req ffff88016da1a600 length 1514/1514 flags Zsi tx_qlen 8 qmult 5 > 27897: sshd-2684 [002] ..s1 666.892223: eth_start_xmit: req ffff88016da1a840 length 0/1514 flags Zsi tx_qlen 9 qmult 5 > 27898: irq/17-dwc3-2524 [000] d... 666.892223: tx_complete: req ffff88016da1a180 length 1514/1514 flags Zsi tx_qlen 8 qmult 5 > 27903: irq/17-dwc3-2524 [000] d... 666.892225: tx_complete: req ffff88016da1a240 length 1514/1514 flags Zsi tx_qlen 7 qmult 5 > 27910: sshd-2684 [002] ..s1 666.892230: eth_start_xmit: req ffff88016da1a180 length 0/1514 flags Zsi tx_qlen 8 qmult 5 > 27913: irq/17-dwc3-2524 [000] d... 666.892231: tx_complete: req ffff88016da1a6c0 length 1514/1514 flags Zsi tx_qlen 7 qmult 5 > 27920: sshd-2684 [002] ..s1 666.892236: eth_start_xmit: req ffff88016da1a6c0 length 0/1514 flags Zsi tx_qlen 8 qmult 5 > 27923: irq/17-dwc3-2524 [000] d... 666.892238: tx_complete: req ffff88016da1a300 length 1514/1514 flags ZsI tx_qlen 7 qmult 5 > 27928: sshd-2684 [002] ..s1 666.892242: eth_start_xmit: req ffff88016da1a240 length 0/1514 flags Zsi tx_qlen 8 qmult 5 > 27944: sshd-2684 [002] ..s1 666.892253: eth_start_xmit: req ffff88016da1a300 length 0/1514 flags Zsi tx_qlen 9 qmult 5 > 27953: sshd-2684 [002] ..s1 666.892262: eth_start_xmit: req ffff88016da1a600 length 0/1514 flags Zsi tx_qlen 10 qmult 5 > > Note here that we reached tx_qlen = 10 without any interrupts being requested. > > I patched u_ether with below to capture log above: > > diff --git a/drivers/usb/gadget/function/u_ether.c b/drivers/usb/gadget/function/u_ether.c > index 5f562c1ec795..e34c26ffcdf4 100644 > --- a/drivers/usb/gadget/function/u_ether.c > +++ b/drivers/usb/gadget/function/u_ether.c > @@ -465,6 +465,15 @@ static void tx_complete(struct usb_ep *ep, struct usb_request *req) > dev_kfree_skb_any(skb); > > atomic_dec(&dev->tx_qlen); > + > + trace_printk("req %p length %d/%d flags %c%c%c tx_qlen %d qmult %d\n", > + req, req->actual, req->length, > + req->zero ? 'Z' : 'z', > + req->short_not_ok ? 'S' : 's', > + req->no_interrupt ? 'i' : 'I', > + atomic_read(&dev->tx_qlen), > + dev->qmult); > + > if (netif_carrier_ok(dev->net)) > netif_wake_queue(dev->net); > } > @@ -600,6 +609,12 @@ static netdev_tx_t eth_start_xmit(struct sk_buff *skb, > case 0: > netif_trans_update(net); > atomic_inc(&dev->tx_qlen); > + trace_printk("req %p length %d/%d flags %c%c%c tx_qlen %d qmult %d\n", > + req, req->actual, req->length, req->zero ? 'Z' : 'z', > + req->short_not_ok ? 'S' : 's', > + req->no_interrupt ? 'i' : 'I', > + atomic_read(&dev->tx_qlen), > + dev->qmult); > } > > if (retval) { > > The only way I can think of to avoid this problem is to make sure we > don't starve our request queue with something like: > > diff --git a/drivers/usb/gadget/function/u_ether.c b/drivers/usb/gadget/function/u_ether.c > index 5f562c1ec795..ddfa866c2cba 100644 > --- a/drivers/usb/gadget/function/u_ether.c > +++ b/drivers/usb/gadget/function/u_ether.c > @@ -587,8 +596,9 @@ static netdev_tx_t eth_start_xmit(struct sk_buff *skb, > > /* throttle high/super speed IRQ rate back slightly */ > if (gadget_is_dualspeed(dev->gadget)) > - req->no_interrupt = (dev->gadget->speed == USB_SPEED_HIGH || > - dev->gadget->speed == USB_SPEED_SUPER) > + req->no_interrupt = (((dev->gadget->speed == USB_SPEED_HIGH || > + dev->gadget->speed == USB_SPEED_SUPER)) && > + !list_empty(&dev->tx_reqs)) > ? ((atomic_read(&dev->tx_qlen) % dev->qmult) != 0) > : 0; > > I have been testing this for about half an hour and I can't see delays > or link getting stuck. Care to test ? If it works for your setup, I'll > send this with Cc: stable and your Reported-by and Tested-by tags. That doesn't appear to cure the problem unfortunately. I've attached another trace, with your u_ether trace_printks included. -- Ville Syrjälä Intel OTC
Attachment:
u_ether.trace.gz
Description: Binary data