Re: the dreaded "needs XHCI_TRUST_TX_LENGTH quirk" returns

Matthew Hall <mhall@xxxxxxxxxxxxxxx> · Thu, 26 Jul 2012 23:36:44 -0700

On Thu, Jul 26, 2012 at 11:27:47PM -0700, Sarah Sharp wrote:
> Yeah, I was kind of banging my head on the desk as well.  I'm really not
> sure why it wasn't causing a general protection fault, so it's possible
> I could be wrong in my analysis of what the bug might do.

;-)

> > I did test this latest patch, and things are still haywire, with the same
> > error reported by FUSE exfat. However, there's no trace of "error" (case
> > insensitive) in the kernel log except for this message from the debug
> > logic, repeated again and again:
> > 
> > Jul 25 01:36:08 themhallbox kernel: [  601.092877] xhci_hcd 0000:03:00.0: HC error bitmask = 0x0
> 
> That message is harmless, since there are no error bits set in the
> bitmask.  Were you using a case-insensitive search?  I ask because
> the first part of the message was "ERROR".  If you were using a
> case-insensitive search, that means there are no xHCI problems, at least.
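Sarah's test can be applied mechanically: the line only matters when the value after "HC error bitmask =" has at least one bit set. The C sketch below uses hypothetical helper names that are not part of xhci_hcd. It extracts that value from a kernel log line and lists which bit positions are set. It deliberately does not interpret the individual bits, because their meanings are defined in the xhci_hcd driver source and are not given in this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: extracts the value printed after
 * "HC error bitmask = " in an xhci_hcd debug line.
 * Returns 1 on success, 0 if the line does not carry that message. */
int parse_error_bitmask(const char *line, unsigned int *out)
{
    static const char marker[] = "HC error bitmask = ";
    const char *p;

    assert(line != NULL && out != NULL);
    p = strstr(line, marker);
    if (p == NULL)
        return 0;
    /* %x accepts an optional "0x" prefix, so "0x4" parses as 4. */
    return sscanf(p + strlen(marker), "%x", out) == 1;
}

/* Hypothetical helper: stores the positions of the set bits of `mask`
 * (lowest first) in `bits`, up to `max_bits` entries, and returns how
 * many were stored. A return of 0 means no error bits are set. */
int set_bit_positions(unsigned int mask, int *bits, int max_bits)
{
    int n = 0;

    for (int i = 0; mask != 0 && n < max_bits; i++, mask >>= 1)
        if (mask & 1u)
            bits[n++] = i;
    return n;
}
```

For the 0x0 line quoted above, the sketch reports no set bits. For the 0x4 lines that appear later in this thread, it reports bit 2 as set.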

I'm pretty sure I did, but it never hurts to have a second set of eyes on the
kern log I posted in my second mail. In the part covering the time range in
question, we get some more entertaining errors: the bitmask changes from 0x0
to 0x4 from 22:54:29 onward, and there are block-layer I/O errors on sdi.

$ zfgrep -i error dmesg-usb-3-port-memory-patch-plugin.txt.gz 
Jul 26 22:53:28 themhallbox kernel: [31254.692933] xhci_hcd 0000:03:00.0: HC error bitmask = 0x0
Jul 26 22:54:29 themhallbox kernel: [31314.680519] xhci_hcd 0000:03:00.0: HC error bitmask = 0x4
Jul 26 22:55:29 themhallbox kernel: [31374.668080] xhci_hcd 0000:03:00.0: HC error bitmask = 0x4
Jul 26 22:56:29 themhallbox kernel: [31434.655636] xhci_hcd 0000:03:00.0: HC error bitmask = 0x4
Jul 26 22:57:29 themhallbox kernel: [31494.643194] xhci_hcd 0000:03:00.0: HC error bitmask = 0x4
Jul 26 22:58:29 themhallbox kernel: [31554.630756] xhci_hcd 0000:03:00.0: HC error bitmask = 0x4
Jul 26 22:59:29 themhallbox kernel: [31614.618308] xhci_hcd 0000:03:00.0: HC error bitmask = 0x4
Jul 26 23:00:30 themhallbox kernel: [31674.605863] xhci_hcd 0000:03:00.0: HC error bitmask = 0x4
Jul 26 23:01:30 themhallbox kernel: [31734.593415] xhci_hcd 0000:03:00.0: HC error bitmask = 0x4
Jul 26 23:01:52 themhallbox kernel: [31756.388465] end_request: I/O error, dev sdi, sector 32768
Jul 26 23:01:52 themhallbox kernel: [31756.388469] Buffer I/O error on device sdi1, logical block 0
Jul 26 23:01:52 themhallbox kernel: [31756.388472] lost page write due to I/O error on sdi1
Jul 26 23:02:30 themhallbox kernel: [31794.580971] xhci_hcd 0000:03:00.0: HC error bitmask = 0x4
Jul 26 23:03:30 themhallbox kernel: [31854.568518] xhci_hcd 0000:03:00.0: HC error bitmask = 0x4
Jul 26 23:04:30 themhallbox kernel: [31914.556076] xhci_hcd 0000:03:00.0: HC error bitmask = 0x4
Jul 26 23:05:17 themhallbox kernel: [31960.910771] end_request: I/O error, dev sdi, sector 32768
Jul 26 23:05:17 themhallbox kernel: [31960.910775] Buffer I/O error on device sdi1, logical block 0
Jul 26 23:05:17 themhallbox kernel: [31960.910778] lost page write due to I/O error on sdi1
Jul 26 23:05:30 themhallbox kernel: [31974.543623] xhci_hcd 0000:03:00.0: HC error bitmask = 0x4
Jul 26 23:06:31 themhallbox kernel: [32034.531186] xhci_hcd 0000:03:00.0: HC error bitmask = 0x4
Jul 26 23:07:31 themhallbox kernel: [32094.518739] xhci_hcd 0000:03:00.0: HC error bitmask = 0x4
Jul 26 23:08:31 themhallbox kernel: [32154.506303] xhci_hcd 0000:03:00.0: HC error bitmask = 0x4
Jul 26 23:09:31 themhallbox kernel: [32214.493865] xhci_hcd 0000:03:00.0: HC error bitmask = 0x4
Jul 26 23:10:31 themhallbox kernel: [32274.481423] xhci_hcd 0000:03:00.0: HC error bitmask = 0x4
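zfgrep is zgrep -F, so the command above runs a fixed-string search for "error" over the decompressed log, and -i makes it match any case. The C sketch below performs the same filtering, assuming the log has already been decompressed (for example with zcat) and piped in. The names contains_ci and grep_ci are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Case-insensitive fixed-string search, in the spirit of grep -F -i.
 * Returns 1 if `needle` occurs anywhere in `haystack`, else 0. */
int contains_ci(const char *haystack, const char *needle)
{
    size_t n;

    assert(haystack != NULL && needle != NULL);
    n = strlen(needle);
    if (n == 0)
        return 1;
    for (; *haystack != '\0'; haystack++) {
        size_t i = 0;
        while (i < n && haystack[i] != '\0' &&
               tolower((unsigned char)haystack[i]) ==
               tolower((unsigned char)needle[i]))
            i++;
        if (i == n)
            return 1;
    }
    return 0;
}

/* Copies each line of `in` that contains `needle` (any case) to `out`,
 * or only counts matches if `out` is NULL. Returns the match count.
 * Lines longer than the buffer are read in pieces, which is acceptable
 * for kern.log-style input. */
unsigned long grep_ci(FILE *in, FILE *out, const char *needle)
{
    char buf[4096];
    unsigned long matches = 0;

    while (fgets(buf, sizeof buf, in) != NULL) {
        if (contains_ci(buf, needle)) {
            matches++;
            if (out != NULL)
                fputs(buf, out);
        }
    }
    return matches;
}
```

One way to use it is to add a small main() that calls grep_ci(stdin, stdout, "error"), then run zcat dmesg-usb-3-port-memory-patch-plugin.txt.gz through it.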

> > But we are definitely still getting some kind of reactor meltdown on an
> > exfat write, probably the superblock update, judging by my earlier code
> > inspection...
> > 
> > Jul 26 23:01:52 themhallbox kernel: [31756.388465] end_request: I/O error, dev sdi, sector 32768
> > Jul 26 23:01:52 themhallbox kernel: [31756.388469] Buffer I/O error on device sdi1, logical block 0
> > Jul 26 23:01:52 themhallbox kernel: [31756.388472] lost page write due to I/O error on sdi1
> 
> Man, I hope my code hasn't eaten your disk.  Is there any chance you
> could replace the drive in the enclosure and create a new file system to
> test?

This part is tricky, because I only have two of these SDXC memory cards, and
right now I don't have a reliable way of putting a fresh exfat filesystem back
onto one to be sure I get a clean run.

My only Windows box is a Windows XP VirtualBox VM, because Linux has been my
main OS since 1996 and by far my primary one since 2005. I'll try to convince
XP to put a new exfat filesystem on one of the cards through one of the USB 2.0
ports and see how far I get.
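The end_request lines quoted above place the failed write at sector 32768 of sdi. The "Buffer I/O error" line with the same timestamp names logical block 0 of sdi1. Taken together, they suggest that sdi1 starts at sector 32768. That inference should be checked against the partition table, for example by reading /sys/block/sdi/sdi1/start. The block layer reports end_request positions in 512-byte units, so the failure is 16 MiB into the card. If the partition really starts there, the write hit the very beginning of the filesystem, which fits the superblock-update guess. The C sketch below shows the arithmetic; the helper names and the partition start are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* end_request reports request positions in 512-byte sectors. */
#define KERNEL_SECTOR_SIZE 512u

/* Byte offset on the whole disk (e.g. sdi) of a sector number
 * printed by end_request. */
uint64_t sector_to_byte_offset(uint64_t sector)
{
    return sector * KERNEL_SECTOR_SIZE;
}

/* Byte offset inside a partition (e.g. sdi1) of a failing disk sector.
 * `part_start` is the partition's first sector, which is an assumption
 * to be confirmed from the partition table. Returns -1 if the sector
 * lies before the partition. */
int64_t offset_in_partition(uint64_t sector, uint64_t part_start)
{
    if (sector < part_start)
        return -1;
    return (int64_t)((sector - part_start) * KERNEL_SECTOR_SIZE);
}
```

With part_start set to 32768, sector 32768 maps to byte offset 0 inside sdi1, which matches "logical block 0".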

> > Also last time I was having a hard time capturing this, but I snagged it this 
> > time:
> > 
> > mhall@themhallbox:~$ sudo mount /dev/sdi1 /mnt
> > FUSE exfat 0.9.7
> > ERROR: fsync failed.
> 
> Well, let me see the dmesg with CONFIG_USB_DEBUG and
> CONFIG_USB_XHCI_HCD_DEBUGGING turned on, and I'll see if this is caused
> by an xHCI error, or a filesystem error.

I attached the forgotten kern log in the follow-on mail.
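Sarah asked for a dmesg from a kernel built with CONFIG_USB_DEBUG and CONFIG_USB_XHCI_HCD_DEBUGGING. In a kernel .config, an enabled bool option appears as NAME=y, and a disabled one appears as "# NAME is not set". The C sketch below checks a .config for those lines before you rebuild. The function names are hypothetical, and the sketch assumes each .config line fits in 512 bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if `line` (one line of a kernel .config) sets option `name`
 * to y. "# NAME is not set", "NAME=m" and longer names that merely
 * share the prefix all return 0. */
int line_enables(const char *line, const char *name)
{
    size_t n;

    assert(line != NULL && name != NULL);
    n = strlen(name);
    /* Short-circuiting keeps every index within the string. */
    return strncmp(line, name, n) == 0 && line[n] == '=' &&
           line[n + 1] == 'y' &&
           (line[n + 2] == '\0' || line[n + 2] == '\n');
}

/* Scans an open .config stream from its current position and returns
 * 1 as soon as a line sets `name` to y, or 0 at end of file. */
int config_enables(FILE *config, const char *name)
{
    char buf[512];

    while (fgets(buf, sizeof buf, config) != NULL)
        if (line_enables(buf, name))
            return 1;
    return 0;
}
```

Because config_enables reads from the stream's current position, open the build tree's .config with fopen once for each option name, or rewind it between calls.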

> Sarah Sharp

Regards,
Matthew.