On 12/18/2014 11:39 AM, Peter Chen wrote: > On Thu, Dec 18, 2014 at 12:25:41PM +0530, Sanchayan Maity wrote: >> On 12/18/2014 10:24 AM, Peter Chen wrote: >>> On Wed, Dec 17, 2014 at 10:04:35AM +0530, Sanchayan Maity wrote: >>>> On 12/17/2014 05:46 AM, Peter Chen wrote: >>>>> On Tue, Dec 16, 2014 at 04:15:08PM +0530, Sanchayan Maity wrote: >>>>>> On 12/16/2014 02:15 PM, Peter Chen wrote: >>>>>>> On Tue, Dec 16, 2014 at 10:50:59AM +0530, Sanchayan Maity wrote: >>>>>>>> On 12/16/2014 06:16 AM, Peter Chen wrote: >>>>>>>>> On Mon, Dec 15, 2014 at 02:59:31PM +0530, Sanchayan Maity wrote: >>>>>>>>>> Hello, >>>>>>>>>> >>>>>>>>>> On 12/15/2014 07:42 AM, Peter Chen wrote: >>>>>>>>>>> On Fri, Dec 12, 2014 at 06:55:36PM +0530, Sanchayan Maity wrote: >>>>>>>>>>>> Hello, >>>>>>>>>>>> >>>>>>>>>>>> On 12/12/2014 07:21 AM, Peter Chen wrote: >>>>>>>>>>>>> On Thu, Dec 11, 2014 at 08:34:45AM -0600, Felipe Balbi wrote: >>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Thu, Dec 11, 2014 at 04:08:43PM +0530, Sanchayan Maity wrote: >>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I am working on a Freescale Cortex-A5 Vybrid Processor. The chip core >>>>>>>>>>>>>>> is clocked at 500MHz and the USB IP core for this is by Chip-idea. I >>>>>>>>>>>>>>> am running a 3.18-rc5 kernel on it and trying to use the USB gadget >>>>>>>>>>>>>>> functionality. To be more specific the CDC ECM class. Currently, I >>>>>>>>>>>>>>> cannot use this properly. If I use just "ping" to check, it works >>>>>>>>>>>>>>> fine, but, after running iperf, even one transaction doesn't complete >>>>>>>>>>>>>>> or completes rarely. Checking the CDC Ether interface with Wireshark >>>>>>>>>>>>>>> shows, TCP Dup Ack messages and checking the USB bus with Wireshark, >>>>>>>>>>>>>>> shows packets with USB Protocol Error -71 at one point and after that >>>>>>>>>>>>>>> packets with USB connection Reset -104 error. If it's of any >>>>>>>>>>>>>>> significance, I have Arch Linux with the 3.18 kernel running on my >>>>>>>>>>>>>>> laptop with which the Vybrid connects. On the host side, the only >>>>>>>>>>>>>>> error dmesg shows is "kevent 12 may have been dropped". I guess this >>>>>>>>>>>>>>> is connected to the "TCP Previous Segment not captured" and "TCP Dup >>>>>>>>>>>>>>> ACK" messages. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> My script for the gadget configuration is as below: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> /bin/mount none /mnt -t configfs >>>>>>>>>>>>>>> /bin/mkdir /mnt/usb_gadget/g1 >>>>>>>>>>>>>>> cd /mnt/usb_gadget/g1 >>>>>>>>>>>>>>> /bin/mkdir configs/c.1 >>>>>>>>>>>>>>> /bin/mkdir functions/ecm.0 >>>>>>>>>>>>>>> /bin/mkdir strings/0x409 >>>>>>>>>>>>>>> /bin/mkdir configs/c.1/strings/0x409 >>>>>>>>>>>>>>> echo 0xa4a2 > idProduct >>>>>>>>>>>>>>> echo 0x0525 > idVendor >>>>>>>>>>>>>>> echo Freescale123 > strings/0x409/serialnumber >>>>>>>>>>>>>>> echo Freescale > strings/0x409/manufacturer >>>>>>>>>>>>>>> echo "USB Serial Gadget" > strings/0x409/product >>>>>>>>>>>>>>> echo "Conf 1" > configs/c.1/strings/0x409/configuration >>>>>>>>>>>>>>> echo 200 > configs/c.1/MaxPower >>>>>>>>>>>>>>> ln -s functions/ecm.0 configs/c.1 >>>>>>>>>>>>>>> echo ci_hdrc.0 > UDC >>>>>>>>>>>>>>> /sbin/ifconfig usb0 up >>>>>>>>>>>>>>> /sbin/ifconfig usb0 192.168.1.10 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I have debug prints in the udc.c and u_ether.c using pr_debug and >>>>>>>>>>>>>> >>>>>>>>>>>>>> just a little hint, use any of the dev_*() macros next time, they'll >>>>>>>>>>>>>> print the device name which helps figuring out which UDC you're using. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Based on ci_hdrc.0 above, I suppose it's chipidea and Peter Chen >>>>>>>>>>>>>> maintains that one, it really helps adding maintainers to Cc list. >>>>>>>>>>>>>> >>>>>>>>>>>>>>> enable them when required using dynamic debug. Without running iperf, >>>>>>>>>>>>>>> using ping gives me a sequence of prints as below: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> [ 277.434409] In eth_start_xmit >>>>>>>>>>>>>>> [ 277.434517] In UDC irq >>>>>>>>>>>>>>> [ 277.434553] In usb_gadget_giveback_request >>>>>>>>>>>>>>> [ 277.434567] In tx_complete >>>>>>>>>>>>>>> [ 277.435443] In UDC irq >>>>>>>>>>>>>>> [ 277.435477] In usb_gadget_giveback_request >>>>>>>>>>>>>>> [ 277.435491] In rx_complete >>>>>>>>>>>>>>> [ 277.435517] In rx_submit >>>>>>>>>>>>>>> [ 277.435601] In eth_start_xmit >>>>>>>>>>>>>>> [ 277.436441] In UDC irq >>>>>>>>>>>>>>> [ 277.436465] In usb_gadget_giveback_request >>>>>>>>>>>>>>> [ 277.436478] In rx_complete >>>>>>>>>>>>>>> [ 277.436493] In rx_submit >>>>>>>>>>>>>>> [ 277.436520] In usb_gadget_giveback_request >>>>>>>>>>>>>>> [ 277.436533] In tx_complete >>>>>>>>>>>>>>> [ 278.434865] In eth_start_xmit >>>>>>>>>>>>>>> [ 278.434959] In UDC irq >>>>>>>>>>>>>>> [ 278.434993] In usb_gadget_giveback_request >>>>>>>>>>>>>>> [ 278.435006] In tx_complete >>>>>>>>>>>>>>> [ 278.435881] In UDC irq >>>>>>>>>>>>>>> [ 278.435910] In usb_gadget_giveback_request >>>>>>>>>>>>>>> [ 278.435923] In rx_complete >>>>>>>>>>>>>>> [ 278.435946] In rx_submit >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> After running iperf without debug prints and then enabling before >>>>>>>>>>>>>>> using ping gives me a sequence of prints as below >>>>>>>>>>>>>>> [ 81.989827] In UDC irq >>>>>>>>>>>>>>> [ 81.989871] In usb_gadget_giveback_request >>>>>>>>>>>>>>> [ 81.989886] In rx_complete >>>>>>>>>>>>>>> [ 81.989905] In rx_submit >>>>>>>>>>>>>>> [ 82.989892] In UDC irq >>>>>>>>>>>>>>> [ 82.989951] In usb_gadget_giveback_request >>>>>>>>>>>>>>> [ 82.989967] In rx_complete >>>>>>>>>>>>>>> [ 82.989992] In rx_submit >>>>>>>>>>>>>>> [ 83.990064] In UDC irq >>>>>>>>>>>>>>> [ 83.990126] In usb_gadget_giveback_request >>>>>>>>>>>>>>> [ 83.990142] In rx_complete >>>>>>>>>>>>>>> [ 83.990167] In rx_submit >>>>>>>>>>>>>>> [ 84.990007] In UDC irq >>>>>>>>>>>>>>> [ 84.990049] In usb_gadget_giveback_request >>>>>>>>>>>>>>> [ 84.990064] In rx_complete >>>>>>>>>>>>>>> [ 84.990083] In rx_submit >>>>>>>>>>>>>>> [ 85.990085] In UDC irq >>>>>>>>>>>>>>> [ 85.990147] In usb_gadget_giveback_request >>>>>>>>>>>>>>> [ 85.990163] In rx_complete >>>>>>>>>>>>>>> [ 85.990188] In rx_submit >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> If I force a full speed configuration for this USB client port, I get >>>>>>>>>>>>>>> a slightly more reliable operation where iperf can run for may be half >>>>>>>>>>>>>>> an hour or so or almost an hour before it falls through. Putting in a >>>>>>>>>>>>>>> delay of 100-150 microseconds in eth_start_xmit also improves it like >>>>>>>>>>>>>>> full speed, but, still not reliable. If I run iperf with debug prints >>>>>>>>>>>>>>> enable, this gives similar results to full speed config. After the >>>>>>>>>>>>>>> failure of iperf test, even ping doesn't work. Bringing down this usb0 >>>>>>>>>>>>>>> interface and then up again makes ping work again. I do realize that >>>>>>>>>>>>>>> putting debug prints or delays like this is not the right thing to do, >>>>>>>>>>>>>>> especially in ISR, but, just trying to debug. This is my first time >>>>>>>>>>>>>>> digging in the USB stack. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Based on the above, it seems there might a subtle bug or race >>>>>>>>>>>>>>> condition somewhere in the execution call chain which I have not been >>>>>>>>>>>>>>> able to trace yet. Can someone give me some pointers on how I can dig >>>>>>>>>>>>>>> and debug further?. >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> I just tried latest usb-next with i.mx6 platform, it works ok with >>>>>>>>>>>>> 10 mins iperf bi-direction test. >>>>>>>>>>>> >>>>>>>>>>>> We did think that it is probably an issue seen with Vybrids only. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> - Check Vybrid errata to see if any missing in code >>>>>>>>>> >>>>>>>>>> I had not checked the Vybrid errata. There are two erratas and I think one >>>>>>>>>> of them might be relevant to the issue. >>>>>>>>>> >>>>>>>>>> e6857: Adding dTD to Primed Endpoint may not be recognized >>>>>>>>>> >>>>>>>>> >>>>>>> >>>>>>> Sorry, I made a mistake, it is a new errata, and does not be included in >>>>>>> the code. All imx project uses 2.0a or 2.50a which does not need this >>>>>>> errata, and Vybrid uses 2.40a core which needs this errata, I will do a >>>>>>> patch for this soon, but before that, would you read your ID register >>>>>>> ($BASE + 0x0) for me? I would like to confirm if your REVISION value >>>>>>> is 0100b. >>>>>>> >>>>>> >>>>>> As per the reference manual and also the devmem2 readout of the USB ID register >>>>>> the value is 0xE481FA05. This gives a Revision number 0x81 for the controller >>>>>> core and 0x05 for the ID viz. configuration number. >>>>>> >>>>> >>>>> Thanks, Revision Number is bit 21 - bit 24, it is 0100b if your ID value >>>>> is 0xE481FA05. >>>>> >>>> >>>> I was wondering how you got the numbers exactly. The reference manual shows >>>> bit 16 - bit 23 as the revision of the controller core, bit 0 - bit 5 as the >>>> ID and bit 24 - bit 31 are reserved. Just for my own information is there a >>>> different interpretation as well? >>>> >>>> Just to be clear so this confirms that the Vybrid has 2.40 core which needs the >>>> errata and which is presently not implemented in software, which results in the >>>> issue we are seeing? Otherwise I will just carry on my testing and also try and >>>> see if I can implement the errata fix. >>>> >>>> Thanks for your inputs. >>>> >>>> -Regards, >>>> Sanchayan. >>>> >>> >>> Hi Sanchayan, >>> >>> Please try below patch to see if it can fix your problem: >>> >>> From 602a4db2b18f4451dda5b9365f127eae88ff68fb Mon Sep 17 00:00:00 2001 >>> From: Peter Chen <peter.chen@xxxxxxxxxxxxx> >>> Date: Thu, 18 Dec 2014 12:47:36 +0800 >>> Subject: [PATCH 1/1] for 2.40a errata >>> >>> Signed-off-by: Peter Chen <peter.chen@xxxxxxxxxxxxx> >>> --- >>> drivers/usb/chipidea/udc.c | 28 +++++++++++++++++++++++++++- >>> 1 file changed, 27 insertions(+), 1 deletion(-) >>> >>> diff --git a/drivers/usb/chipidea/udc.c b/drivers/usb/chipidea/udc.c >>> index c0754be..62b4625 100644 >>> --- a/drivers/usb/chipidea/udc.c >>> +++ b/drivers/usb/chipidea/udc.c >>> @@ -201,7 +201,7 @@ static int hw_ep_prime(struct ci_hdrc *ci, int num, int dir, int is_ctrl) >>> if (is_ctrl && dir == RX && hw_read(ci, OP_ENDPTSETUPSTAT, BIT(num))) >>> return -EAGAIN; >>> >>> - /* status shoult be tested according with manual but it doesn't work */ >>> + /* status should be tested according with manual but it doesn't work */ >>> return 0; >>> } >>> >>> @@ -522,6 +522,18 @@ static void free_pending_td(struct ci_hw_ep *hwep) >>> kfree(pending); >>> } >>> >>> +static int reprime_dtd(struct ci_hdrc *ci, struct ci_hw_ep *hwep, struct td_node *node) >>> +{ >>> + hwep->qh.ptr->td.next = cpu_to_le32(node->ptr->next); >>> + hwep->qh.ptr->td.token &= >>> + cpu_to_le32(~(TD_STATUS_HALTED|TD_STATUS_ACTIVE)); >>> + >>> + wmb(); /* synchronize before ep prime */ >>> + >>> + return hw_ep_prime(ci, hwep->num, hwep->dir, >>> + hwep->type == USB_ENDPOINT_XFER_CONTROL); >>> +} >>> + >>> /** >>> * _hardware_dequeue: handles a request at hardware level >>> * @gadget: gadget >>> @@ -535,6 +547,7 @@ static int _hardware_dequeue(struct ci_hw_ep *hwep, struct ci_hw_req *hwreq) >>> struct td_node *node, *tmpnode; >>> unsigned remaining_length; >>> unsigned actual = hwreq->req.length; >>> + struct ci_hdrc *ci = hwep->ci; >>> >>> if (hwreq->req.status != -EALREADY) >>> return -EINVAL; >>> @@ -582,6 +595,19 @@ static int _hardware_dequeue(struct ci_hw_ep *hwep, struct ci_hw_req *hwreq) >>> list_del_init(&node->td); >>> } >>> >>> +/* if (ci->rev == CI_REVISION_24) */ >>> + while ((hwep->qh.ptr->curr == node->dma)) { >>> + if (node->ptr->next != TD_TERMINATE) { >>> + int n = hw_ep_bit(hwep->num, hwep->dir); >>> + >>> + /* Only do re-prime when both ENDPTPRIME bit and ENDPTSTAT bit are 0 */ >>> + if (!hw_read(ci, OP_ENDPTPRIME, BIT(n)) && >>> + !hw_read(ci, OP_ENDPTSTAT, BIT(n))) >>> + reprime_dtd(ci, hwep, node); >>> + } >>> + udelay(1); >>> + } >>> + >>> usb_gadget_unmap_request(&hwep->ci->gadget, &hwreq->req, hwep->dir); >>> >>> hwreq->req.actual += actual; >>> >> >> Applied and tested the patch. Still the same condition. > > How about change condition like: > >>> + if (!hw_read(ci, OP_ENDPTSTAT, BIT(n))) >>> + reprime_dtd(ci, hwep, node); > No, this doesn't work either :(. > If you still meet issue, you may help me indicate your "condition" > detail. Can you tell me the condition details you are looking for? Transfer descriptor list logs? > One clue, Marvell SoC may has this problem too, and they > submit a patch below (In fact, I "steal" most of its idea). > > https://lkml.org/lkml/2014/2/25/43 > I was looking at that patchset when I got your patches. Due to very limited knowledge on USB as of now, it would have taken me quite a while, to figure what and where to make the changes. Thanks for your patches :). I will use them and the patchset by Marvell guys as a reference and try/test a few things out which I can, by applying them as and where relevant. If anything works out or some new results come up, will update. -Regards, Sanchayan. -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html