On 12/16/2014 12:02 PM, Peter Chen wrote:
> On Tue, Dec 16, 2014 at 10:50:59AM +0530, Sanchayan Maity wrote:
>> On 12/16/2014 06:16 AM, Peter Chen wrote:
>>> On Mon, Dec 15, 2014 at 02:59:31PM +0530, Sanchayan Maity wrote:
>>>> Hello,
>>>>
>>>> On 12/15/2014 07:42 AM, Peter Chen wrote:
>>>>> On Fri, Dec 12, 2014 at 06:55:36PM +0530, Sanchayan Maity wrote:
>>>>>> Hello,
>>>>>>
>>>>>> On 12/12/2014 07:21 AM, Peter Chen wrote:
>>>>>>> On Thu, Dec 11, 2014 at 08:34:45AM -0600, Felipe Balbi wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On Thu, Dec 11, 2014 at 04:08:43PM +0530, Sanchayan Maity wrote:
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> I am working on a Freescale Cortex-A5 Vybrid Processor. The chip core
>>>>>>>>> is clocked at 500MHz and the USB IP core for this is by Chip-idea. I
>>>>>>>>> am running a 3.18-rc5 kernel on it and trying to use the USB gadget
>>>>>>>>> functionality. To be more specific the CDC ECM class. Currently, I
>>>>>>>>> cannot use this properly. If I use just "ping" to check, it works
>>>>>>>>> fine, but, after running iperf, even one transaction doesn't complete
>>>>>>>>> or completes rarely. Checking the CDC Ether interface with Wireshark
>>>>>>>>> shows, TCP Dup Ack messages and checking the USB bus with Wireshark,
>>>>>>>>> shows packets with USB Protocol Error -71 at one point and after that
>>>>>>>>> packets with USB connection Reset -104 error. If it's of any
>>>>>>>>> significance, I have Arch Linux with the 3.18 kernel running on my
>>>>>>>>> laptop with which the Vybrid connects. On the host side, the only
>>>>>>>>> error dmesg shows is "kevent 12 may have been dropped". I guess this
>>>>>>>>> is connected to the "TCP Previous Segment not captured" and "TCP Dup
>>>>>>>>> ACK" messages.
>>>>>>>>>
>>>>>>>>> My script for the gadget configuration is as below:
>>>>>>>>>
>>>>>>>>> /bin/mount none /mnt -t configfs
>>>>>>>>> /bin/mkdir /mnt/usb_gadget/g1
>>>>>>>>> cd /mnt/usb_gadget/g1
>>>>>>>>> /bin/mkdir configs/c.1
>>>>>>>>> /bin/mkdir functions/ecm.0
>>>>>>>>> /bin/mkdir strings/0x409
>>>>>>>>> /bin/mkdir configs/c.1/strings/0x409
>>>>>>>>> echo 0xa4a2 > idProduct
>>>>>>>>> echo 0x0525 > idVendor
>>>>>>>>> echo Freescale123 > strings/0x409/serialnumber
>>>>>>>>> echo Freescale > strings/0x409/manufacturer
>>>>>>>>> echo "USB Serial Gadget" > strings/0x409/product
>>>>>>>>> echo "Conf 1" > configs/c.1/strings/0x409/configuration
>>>>>>>>> echo 200 > configs/c.1/MaxPower
>>>>>>>>> ln -s functions/ecm.0 configs/c.1
>>>>>>>>> echo ci_hdrc.0 > UDC
>>>>>>>>> /sbin/ifconfig usb0 up
>>>>>>>>> /sbin/ifconfig usb0 192.168.1.10
>>>>>>>>>
>>>>>>>>> I have debug prints in the udc.c and u_ether.c using pr_debug and
>>>>>>>>
>>>>>>>> just a little hint, use any of the dev_*() macros next time, they'll
>>>>>>>> print the device name which helps figuring out which UDC you're using.
>>>>>>>>
>>>>>>>> Based on ci_hdrc.0 above, I suppose it's chipidea and Peter Chen
>>>>>>>> maintains that one, it really helps adding maintainers to Cc list.
>>>>>>>>
>>>>>>>>> enable them when required using dynamic debug.
>>>>>>>>> Without running iperf, using ping gives me a sequence of prints as below:
>>>>>>>>>
>>>>>>>>> [ 277.434409] In eth_start_xmit
>>>>>>>>> [ 277.434517] In UDC irq
>>>>>>>>> [ 277.434553] In usb_gadget_giveback_request
>>>>>>>>> [ 277.434567] In tx_complete
>>>>>>>>> [ 277.435443] In UDC irq
>>>>>>>>> [ 277.435477] In usb_gadget_giveback_request
>>>>>>>>> [ 277.435491] In rx_complete
>>>>>>>>> [ 277.435517] In rx_submit
>>>>>>>>> [ 277.435601] In eth_start_xmit
>>>>>>>>> [ 277.436441] In UDC irq
>>>>>>>>> [ 277.436465] In usb_gadget_giveback_request
>>>>>>>>> [ 277.436478] In rx_complete
>>>>>>>>> [ 277.436493] In rx_submit
>>>>>>>>> [ 277.436520] In usb_gadget_giveback_request
>>>>>>>>> [ 277.436533] In tx_complete
>>>>>>>>> [ 278.434865] In eth_start_xmit
>>>>>>>>> [ 278.434959] In UDC irq
>>>>>>>>> [ 278.434993] In usb_gadget_giveback_request
>>>>>>>>> [ 278.435006] In tx_complete
>>>>>>>>> [ 278.435881] In UDC irq
>>>>>>>>> [ 278.435910] In usb_gadget_giveback_request
>>>>>>>>> [ 278.435923] In rx_complete
>>>>>>>>> [ 278.435946] In rx_submit
>>>>>>>>>
>>>>>>>>> After running iperf without debug prints and then enabling them before
>>>>>>>>> using ping gives me a sequence of prints as below:
>>>>>>>>> [ 81.989827] In UDC irq
>>>>>>>>> [ 81.989871] In usb_gadget_giveback_request
>>>>>>>>> [ 81.989886] In rx_complete
>>>>>>>>> [ 81.989905] In rx_submit
>>>>>>>>> [ 82.989892] In UDC irq
>>>>>>>>> [ 82.989951] In usb_gadget_giveback_request
>>>>>>>>> [ 82.989967] In rx_complete
>>>>>>>>> [ 82.989992] In rx_submit
>>>>>>>>> [ 83.990064] In UDC irq
>>>>>>>>> [ 83.990126] In usb_gadget_giveback_request
>>>>>>>>> [ 83.990142] In rx_complete
>>>>>>>>> [ 83.990167] In rx_submit
>>>>>>>>> [ 84.990007] In UDC irq
>>>>>>>>> [ 84.990049] In usb_gadget_giveback_request
>>>>>>>>> [ 84.990064] In rx_complete
>>>>>>>>> [ 84.990083] In rx_submit
>>>>>>>>> [ 85.990085] In UDC irq
>>>>>>>>> [ 85.990147] In usb_gadget_giveback_request
>>>>>>>>> [ 85.990163] In rx_complete
>>>>>>>>> [ 85.990188] In rx_submit
>>>>>>>>>
>>>>>>>>> If I force a full speed configuration for this USB client port, I get
>>>>>>>>> a slightly more reliable operation where iperf can run for maybe half
>>>>>>>>> an hour or so or almost an hour before it falls through. Putting in a
>>>>>>>>> delay of 100-150 microseconds in eth_start_xmit also improves it like
>>>>>>>>> full speed, but, still not reliable. If I run iperf with debug prints
>>>>>>>>> enabled, this gives similar results to full speed config. After the
>>>>>>>>> failure of iperf test, even ping doesn't work. Bringing down this usb0
>>>>>>>>> interface and then up again makes ping work again. I do realize that
>>>>>>>>> putting debug prints or delays like this is not the right thing to do,
>>>>>>>>> especially in ISR, but, just trying to debug. This is my first time
>>>>>>>>> digging in the USB stack.
>>>>>>>>>
>>>>>>>>> Based on the above, it seems there might be a subtle bug or race
>>>>>>>>> condition somewhere in the execution call chain which I have not been
>>>>>>>>> able to trace yet. Can someone give me some pointers on how I can dig
>>>>>>>>> and debug further?
>>>>>>>>
>>>>>>>
>>>>>>> I just tried latest usb-next with i.mx6 platform, it works ok with
>>>>>>> 10 mins iperf bi-direction test.
>>>>>>
>>>>>> We did think that it is probably an issue seen with Vybrids only.
>>>>>>
>>>>>
>>>>> - Check Vybrid errata to see if any missing in code
>>>>
>>>> I had not checked the Vybrid errata. There are two errata and I think one
>>>> of them might be relevant to the issue.
>>>>
>>>> e6857: Adding dTD to Primed Endpoint may not be recognized
>>>>
>>>
>>> The implementation of this errata (In fact, it should not be an erratum,
>>> it is the software operation required which is applied to all chipidea
>>> controllers) is already in the code.
>>>
>>>> It is interesting to see that it seems to be related to what you mention in the
>>>> third point.
>>>> Honestly, not being very knowledgeable on the USB specifications
>>>> and protocol, I need to read up on what it exactly implies and I have got hold
>>>> of the USB 2.0 spec, but, some search on USB Prime Endpoint revealed what
>>>> might be a similar issue here below.
>>>> https://community.freescale.com/thread/336166.
>>>
>>> The postpone freeing last dtd implementation has already been included in
>>> the current code.
>>
>> Ok. I also had a much deeper look and it is so.
>>
>>>
>>>>
>>>>> - Seems your TX has some problems, any trace files using bus analyzer
>>>>> can confirm it?
>>>>
>>>> I would think so. I had already tried checking the USB and network packets with
>>>> Wireshark. After a few transactions, the USB packet carries a status of protocol
>>>> error with -71 and eventually the connection reset status -104 appears in the
>>>> USB packet. The network traces show TCP Dup Ack errors.
>>>
>>> I mean the trace file captured by hardware usb bus analyzer, but it
>>> doesn't matter if you don't have one, it seems to relate to the dtd list
>>> according to your test.
>>
>> Ok. Actually we don't have a hardware USB bus analyzer with us.
>>>
>>>>
>>>>> - Try to run g_ether: modprobe g_ether qmult=1, it will use only
>>>>> one request for transfer, to see if it is dtd list problem.
>>>>
>>>> This is quite interesting. I just tried recompiling the kernel, as it is much
>>>> easier for me at the moment. The qmult value, if not specified, takes on the
>>>> QMULT_DEFAULT value which is 5. I changed this to 1 and I notice no change with
>>>> the iperf tests. On a hunch, I changed DEFAULT_QLEN to 1, instead of 2 which is
>>>> for double buffering. Though this does not give me the 115Mbits/sec speed I would
>>>> have liked to see had it been working normally, the iperf tests ran reliably for three
>>>> hours, which was not the case before with even full speed not working for more than
>>>> an hour.
>>>> This is probably not the solution, and seems to show me a full speed like
>>>> behaviour (though this would also be expected I guess without the double buffering) as
>>>> the average speeds are 11.6 Mbits/sec, fairly close to the 8.67Mbits/sec for full speed
>>>> tests. I guess this is related to the dTD list problem as you mention and in the errata.
>>>> But, then that Freescale thread gives the patch link below, which has been taken care of
>>>> it seems so I am not sure at all.
>>>
>>> So it can work reliably with single request for both tx/rx
>>> (QMULT_DEFAULT = 1 and DEFAULT_QLEN = 1), right?
>>
>> Yes, right. I ran iperf for more than 3 hours with QMULT_DEFAULT and DEFAULT_QLEN both
>> having the value of 1. The tests ran without a hitch and the connection was reliable so
>> I am assuming, this circumvents the issue.
>>
>> May I ask exactly what is the implication of this result? Am I understanding correctly
>> that the above setting restricts the device transfer descriptor list to have only one
>> transfer descriptor at any moment for the queue head area to process? Any more and it
>> doesn't work. From what I can see based on Richard Stulen's explanation in that thread
>> and referring the Vybrid Reference manual to some extent, maybe due to Vybrid working
>> at a relatively slow clock speed, there might be that time gap between the controller
>> writing back the status and the controller rereading the dTD?
>>
>>>
>
>
> Since you did not meet error for single dtd, it means the Vybrid can
> handle one packet data (512 Byte) well, and we disable stream mode for
> it, so it is not buffer overrun problem. Even for slow bus, there is
> no difference between single dtd and dtd list.

Ok.

>
> How about memory usage for your system? If the memory for the next entry
> in the dtd list is invalid, it will cause the problem.

top doesn't show any abnormal memory consumption while iperf runs in the
background.

>
>>>>
>>>> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/patch/drivers/usb/chipidea/core.c?id=2e270412968d961ecde347343ffa67dfe39f6c95
>>>
>>> This patch has already been in the kernel you are running.
>>> As far as I know, the Vybrid uses the same IP with i.mx6's, I will check
>>> it with my colleague.
>>
>> The Vybrid's IP is almost the same as i.MX6's.
>>
>>>
>>> The next thing you can do may be a little hard for you, you need to dump
>>> dtd and register like Richard at Freescale community suggested.
>>>
>> I will look into it and try. I also had a look earlier at Felipe's suggestion
>> of setting up tracepoints and trace buffers, but, not in much depth. Will try
>> setting up this debug.
>>
>
> One easy way to trace this problem, do you have any tools to dump
> physical address, eg, /unit_tests/memtool, you can dump register and
> dQH and dtd status when the error occurs.
>
> dQH's address is stored at $BASE+0x158, and the space from it is all
> dQH, each dQH occupies 64 Bytes, for dQH/dtd structure, you can refer
> to the RM.
>

I have devmem2 with which I can dump the physical memory addresses in
question. So, I will try debugging with that for a start and, if that is
not clear enough, then set up a trace.

-Regards,
Sanchayan.
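
For the dQH/dtd dump discussed above, a minimal shell sketch of the address
arithmetic and status decoding might look like the following. The helper
names dqh_addr and dtd_token are hypothetical. The 64-byte dQH size and the
list pointer at $BASE+0x158 come from this thread; the ordering ep0 OUT,
ep0 IN, ep1 OUT, ... and the token bit positions (taken from the TD_*
definitions in drivers/usb/chipidea/udc.h) are assumptions that should be
checked against the Vybrid RM.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, layout assumptions are noted inline.

# Print the address of the dQH for endpoint $2, direction $3
# (0 = OUT/RX, 1 = IN/TX), given the list base $1 read from $BASE+0x158.
# Assumes 64-byte dQHs laid out as ep0 OUT, ep0 IN, ep1 OUT, ...
dqh_addr() {
    printf '0x%08x\n' $(( $1 + ($2 * 2 + $3) * 64 ))
}

# Decode one dTD token word ($1). Assumed bit layout, as in udc.h:
# bits 30:16 remaining bytes, bit 15 IOC, bit 7 active, bit 6 halted,
# bit 5 data buffer error, bit 3 transaction error.
dtd_token() {
    t=$(( $1 ))
    printf 'bytes_left=%d ioc=%d active=%d halted=%d buf_err=%d xact_err=%d\n' \
        $(( (t >> 16) & 0x7fff )) $(( (t >> 15) & 1 )) \
        $(( (t >> 7) & 1 )) $(( (t >> 6) & 1 )) \
        $(( (t >> 5) & 1 )) $(( (t >> 3) & 1 ))
}
```

In use, the list base would first be read as a 32-bit word from $BASE+0x158
with devmem2 (devmem2 <address> w), then passed to dqh_addr to find the dQH
of the endpoint that stalled. The overlay token is expected at offset 0x0c
of each dQH, again going by struct ci_hw_qh in udc.h, and that word can be
fed to dtd_token to see whether the dtd is still active, halted or flagged
with an error.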