Re: problems with usb 3.0 on clevo p150hm (NEC Corporation uPD720200)

Andiry Xu <andiry.xu@xxxxxxx> · Wed, 16 Nov 2011 15:45:43 +0800

On 11/16/2011 03:14 PM, Julian Sikorski wrote:
> On 16.11.2011 07:43, Andiry Xu wrote:
>> On 11/16/2011 02:28 AM, Julian Sikorski wrote:
>>> On 15.11.2011 12:55, Julian Sikorski wrote:
>>>> On 15.11.2011 09:59, Andiry Xu wrote:
>>>>> On 11/15/2011 04:36 PM, Julian Sikorski wrote:
>>>>>> On 15.11.2011 07:42, Andiry Xu wrote:
>>>>>>> Please keep usb mail list CCed.
>>>>>>
>>>>>> I thought I was, but I guess something went wrong with gmane. Did you
>>>>>> get my other messages, by the way?
>>>>>>
>>>>>>>
>>>>>>> On 11/14/2011 08:40 PM, Julian Sikorski wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 14.11.2011 10:27, Julian Sikorski wrote:
>>>>>>>>> On 2011-11-14 10:24, Andiry Xu wrote:
>>>>>>>>>> On 11/12/2011 02:20 PM, Julian Sikorski wrote:
>>>>>>>>>>> On 11.11.2011 21:11, Julian Sikorski wrote:
>>>>>>>>>>>> On 31.08.2011 22:25, Julian Sikorski wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I originally reported this problem here:
>>>>>>>>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=35212
>>>>>>>>>>>>> Summing up, the external hard drive produces an I/O error after
>>>>>>>>>>>>> about 30 minutes of being connected. After such an event, the
>>>>>>>>>>>>> system does not notice when the device is disconnected and
>>>>>>>>>>>>> reconnected. I am attaching the relevant portion of
>>>>>>>>>>>>> /var/log/messages.
>>>>>>>>>>>>> The problem happens on an up-to-date Fedora 15 x86_64 (running
>>>>>>>>>>>>> kernel 2.6.40.3-0.fc15.x86_64) on a Clevo P150HM laptop with a
>>>>>>>>>>>>> LaCie Rugged USB 3.0 hard disk.
>>>>>>>>>>>>> Please let me know if I can provide more information.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Julian
>>>>>>>>>>>> I have recently upgraded to Fedora 16, and I am now running kernel
>>>>>>>>>>>> 3.1.0-7.fc16.x86_64. Unfortunately, the problem is far from gone.
>>>>>>>>>>>> It goes as follows:
>>>>>>>>>>>> - you plug the drive into one of USB3 ports
>>>>>>>>>>>> - everything works fine
>>>>>>>>>>>> - suspend and resume (not sure if this is necessary)
>>>>>>>>>>
>>>>>>>>>> Have you figured out if this suspend/resume step is necessary?
>>>>>>>>>
>>>>>>>>> Not yet. I will try later today (but bear with me given the hour needed
>>>>>>>>> to trigger the problem).
>>>>>>>>>
>>>>>>>>
>>>>>>>> Due to the long time needed to reproduce the problem, please accept
>>>>>>>> this partial report.
>>>>>>>> I have updated the kernel to 3.1.1-1.fc16.x86_64 (which showed up in
>>>>>>>> Fedora repositories earlier today). I then rebooted and am now running
>>>>>>>> with the drive connected for more than an hour downloading something off
>>>>>>>> bittorrent. Here is the fragment of /var/log/messages from the drive
>>>>>>>> connection until now. Keep in mind that these "stalled endpoint"
>>>>>>>> messages show up every 30 minutes (12:26, 12:56 and 13:26). Maybe
>>>>>>>> they trigger the error only if the PC was suspended beforehand?
>>>>>>>>
>>>>>>>
>>>>>>> The stalled endpoint message is normal. A reset endpoint command
>>>>>>> should bring the endpoint back to a normal state.
>>>>>>>
>>>>>>> Do you connect other full speed devices to USB3 ports?
>>>>>>
>>>>>> Normally not since this machine has 2 USB3 ports and 3 USB2 ports. I can
>>>>>> try and see if the problem can also be reproduced if you would like me to.
>>>>>>
>>>>>
>>>>> You don't need to try that. The device first connected as a full-speed
>>>>> device and failed to initialize, and was then recognised as a SuperSpeed
>>>>> device. Sounds like a device issue.
>>>>>
>>>>> Anyway, try the attached patch, do a suspend/resume and see if the
>>>>> problem still occurs.
>>>>>
>>>>> Thanks,
>>>>> Andiry
>>>>>
>>>>>
>>>>>
>>>> With your patch, I was able to do the following:
>>>> 10:40    plugged in.
>>>> 10:42    suspend/resume
>>>> 11:38    disconnect/reconnect
>>>> 12:40    tried to unmount, says device busy, worked slightly later
>>>> 12:44    disconnect/reconnect
>>>> The drive is still alive and kicking, so it seems like the patch might
>>>> be working. Let's not get ahead of ourselves though, I'll keep an eye on
>>>> it for the rest of the day. I am attaching /var/log/messages of the
>>>> session in case there might be something interesting in it. Thanks again
>>>> for looking into this.
>>>>
>>>> Regards,
>>>> Julian
>>>>
>>> I think we are looking good. I was trying various combinations of
>>> suspending, resuming, disconnecting and reconnecting and I haven't
>>> managed to break it so far. The only hiccup was that after one
>>> suspend/resume with the drive disconnected, the port which had been in
>>> use before did not work. The other port did, and another suspend/resume
>>> restored things. /var/log/messages of the whole session is attached.
>>>
>>
>> OK, so your host may need a reset-on-resume quirk, though I wonder why
>> it works for a period of time after resume and then breaks.
>>
>> Please provide the PCI vendor and device ID by posting the output of
>> 'lspci -n'.
>>
>> Thanks.
>> Andiry
>>
>>
> 
> Here you go:
> 
> $ lspci -n
> 00:00.0 0600: 8086:0104 (rev 09)
> 00:01.0 0604: 8086:0101 (rev 09)
> 00:16.0 0780: 8086:1c3a (rev 04)
> 00:1a.0 0c03: 8086:1c2d (rev 05)
> 00:1b.0 0403: 8086:1c20 (rev 05)
> 00:1c.0 0604: 8086:1c10 (rev b5)
> 00:1c.1 0604: 8086:1c12 (rev b5)
> 00:1c.2 0604: 8086:1c14 (rev b5)
> 00:1c.3 0604: 8086:1c16 (rev b5)
> 00:1d.0 0c03: 8086:1c26 (rev 05)
> 00:1f.0 0601: 8086:1c49 (rev 05)
> 00:1f.2 0106: 8086:1c03 (rev 05)
> 00:1f.3 0c05: 8086:1c22 (rev 05)
> 01:00.0 0300: 10de:0e31 (rev a1)
> 01:00.1 0403: 10de:0beb (rev a1)
> 02:00.0 0c03: 1033:0194 (rev 03)
> 03:00.0 0200: 197b:0250 (rev 05)
> 03:00.1 0880: 197b:2392 (rev 90)
> 03:00.2 0805: 197b:2391 (rev 90)
> 03:00.3 0880: 197b:2393 (rev 90)
> 04:00.0 0280: 8086:0091 (rev 34)
> 05:00.0 0c00: 197b:2380
> 
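
For reference, the entry of interest in the listing above is 02:00.0,
class 0c03 (USB controller), vendor 1033 (NEC), device 0194. A minimal
sketch of picking that line out programmatically follows. The helper
names parse_lspci_line() and is_nec_usb_controller() are made up for
this illustration and are not part of pciutils or the kernel; the sketch
assumes the "slot class: vendor:device" layout shown above.

```c
#include <stdio.h>

#define CLASS_USB_CONTROLLER 0x0c03u /* PCI class 0c, subclass 03 */
#define VENDOR_NEC           0x1033u

/* Hypothetical helper: parse one line of `lspci -n` output, e.g.
 * "02:00.0 0c03: 1033:0194 (rev 03)". Returns 1 on success, 0 otherwise. */
static int parse_lspci_line(const char *line, unsigned *class_code,
                            unsigned *vendor, unsigned *device)
{
    /* %*s skips the slot address; the three fields are read as hex. */
    return sscanf(line, "%*s %4x: %4x:%4x", class_code, vendor, device) == 3;
}

/* Hypothetical helper: true if the line describes a NEC USB controller. */
static int is_nec_usb_controller(const char *line)
{
    unsigned class_code, vendor, device;

    if (!parse_lspci_line(line, &class_code, &vendor, &device))
        return 0;
    return class_code == CLASS_USB_CONTROLLER && vendor == VENDOR_NEC;
}
```

Feeding each line of the listing to is_nec_usb_controller() should
select only the 02:00.0 entry; the two Intel 0c03 entries are the USB2
controllers and do not match the vendor check.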

Thanks. Please remove the last patch and apply this one, and run the
test again.

Thanks,
Andiry
From cf29644fa22c5652cb95752aa09879e47c4e58e8 Mon Sep 17 00:00:00 2001
From: Andiry Xu <andiry.xu@xxxxxxx>
Date: Wed, 16 Nov 2011 15:40:42 +0800
Subject: [PATCH] xHCI: reset-on-resume quirk for NEC uPD720200

Julian Sikorski reports that the NEC uPD720200 does not work reliably
after suspend and resume. Re-initialize the host in xhci_resume().

Reported-by: Julian Sikorski <belegdol@xxxxxxxxx>
Signed-off-by: Andiry Xu <andiry.xu@xxxxxxx>
---
 drivers/usb/host/xhci-pci.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
index 9f51f88..f0ef354 100644
--- a/drivers/usb/host/xhci-pci.c
+++ b/drivers/usb/host/xhci-pci.c
@@ -32,6 +32,8 @@
 #define PCI_VENDOR_ID_ETRON		0x1b6f
 #define PCI_DEVICE_ID_ASROCK_P67	0x7023
 
+#define PCI_DEVICE_ID_NEC_uPD720200	0x0194
+
 static const char hcd_name[] = "xhci_hcd";
 
 /* called after powerup, by probe or system-pm "wakeup" */
@@ -73,8 +75,11 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)
 				pdev->revision);
 	}
 
-	if (pdev->vendor == PCI_VENDOR_ID_NEC)
+	if (pdev->vendor == PCI_VENDOR_ID_NEC) {
 		xhci->quirks |= XHCI_NEC_HOST;
+		if (pdev->device == PCI_DEVICE_ID_NEC_uPD720200)
+			xhci->quirks |= XHCI_RESET_ON_RESUME;
+	}
 
 	if (pdev->vendor == PCI_VENDOR_ID_AMD && xhci->hci_version == 0x96)
 		xhci->quirks |= XHCI_AMD_0x96_HOST;
-- 
1.7.4.1
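
The quirk selection in the patch can be tried out in userspace with a
small stand-alone sketch. This is only an illustration of the nested
vendor/device check, not kernel code: struct fake_pci_dev and
nec_quirks() are made-up names, and the two quirk bit values are
placeholders chosen for this example (the real flags are defined in
xhci.h). The vendor and device IDs are the ones from the lspci output
and the patch.

```c
#include <stdint.h>

#define PCI_VENDOR_ID_NEC            0x1033
#define PCI_DEVICE_ID_NEC_uPD720200  0x0194

/* Placeholder bit values for illustration only; see xhci.h for the
 * definitions the driver actually uses. */
#define XHCI_NEC_HOST         (1u << 0)
#define XHCI_RESET_ON_RESUME  (1u << 1)

/* Hypothetical stand-in for the two struct pci_dev fields the patch reads. */
struct fake_pci_dev {
    uint16_t vendor;
    uint16_t device;
};

/* Mirrors the patched branch: every NEC host gets XHCI_NEC_HOST, and
 * only the uPD720200 additionally gets XHCI_RESET_ON_RESUME. */
static unsigned int nec_quirks(const struct fake_pci_dev *pdev)
{
    unsigned int quirks = 0;

    if (pdev->vendor == PCI_VENDOR_ID_NEC) {
        quirks |= XHCI_NEC_HOST;
        if (pdev->device == PCI_DEVICE_ID_NEC_uPD720200)
            quirks |= XHCI_RESET_ON_RESUME;
    }
    return quirks;
}
```

Keeping the device check inside the vendor check means other NEC
controllers keep their existing behaviour, and a non-NEC device that
happens to use ID 0x0194 is not affected.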