Re: the dreaded "needs XHCI_TRUST_TX_LENGTH quirk" returns

Sarah Sharp <sarah.a.sharp@xxxxxxxxxxxxxxx> · Thu, 26 Jul 2012 23:27:47 -0700

On Thu, Jul 26, 2012 at 11:09:13PM -0700, Matthew Hall wrote:
> On Thu, Jul 26, 2012 at 01:22:15PM -0700, Sarah Sharp wrote:
> > Ok, I think I have the root cause of this message, and it's a nasty
> > little bug.  Can you apply the attached patch (instead of the previous
> > debug patch) and test?  I think your disk will work once it's applied,
> > but please double check that you don't see any more of that ERROR
> > message.
> 
> I agree, that's indeed a horrifying bug. I must say that my own brain had a 
> core dump when you wrote, "this function would quite happily read all of 
> system memory before wrapping around to the right pointer value." ;-)

Yeah, I was kind of banging my head on the desk as well.  I'm really not
sure why it wasn't causing a general protection fault, so it's possible
I could be wrong in my analysis of what the bug might do.

> I did test this latest patch, and things are still haywire with the same error 
> according to the fuse exfat, however there's no trace of "error" (case 
> insensitive) except for some messages which say this again and again from the 
> debug logic:
> 
> Jul 25 01:36:08 themhallbox kernel: [  601.092877] xhci_hcd 0000:03:00.0: HC error bitmask = 0x0

That message is harmless, since there are no error bits set in the
bitmask.  Were you using a case-insensitive search?  I ask because the
first part of the message was "ERROR".  If you were using a
case-insensitive search, that means there are no xHCI problems, at
least.
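
For what it's worth, here is a rough sketch of the kind of log check I
mean, assuming the kernel log lines have been saved to a list of
strings.  The helper name suspicious_lines and the regular expression
are made up for illustration; they are not part of the xHCI driver or
of any existing tool.  It matches "error" in any case and skips the
bitmask message when no bits are set:

```python
import re

# Hypothetical pattern for the debug message quoted above, e.g.
#   "xhci_hcd 0000:03:00.0: HC error bitmask = 0x0"
BITMASK_RE = re.compile(r"HC error bitmask = (0x[0-9a-fA-F]+)")


def suspicious_lines(lines):
    """Return lines that mention 'error' in any case, skipping the
    bitmask message when the bitmask is zero (no error bits set)."""
    hits = []
    for line in lines:
        # Case-insensitive match, so both "ERROR" and "I/O error" count.
        if "error" not in line.lower():
            continue
        match = BITMASK_RE.search(line)
        if match and int(match.group(1), 16) == 0:
            continue  # zero bitmask: harmless, nothing to report
        hits.append(line)
    return hits
```

Run over the messages in this thread, a check like this would drop the
"HC error bitmask = 0x0" lines but still report the end_request and
Buffer I/O error lines, which are the ones worth chasing.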

> But we are definitely still getting some kind of reactor meltdown on some kind 
> of exfat write, probably the superblock update according to my prior code 
> inspection...
> 
> Jul 26 23:01:52 themhallbox kernel: [31756.388465] end_request: I/O error, dev sdi, sector 32768
> Jul 26 23:01:52 themhallbox kernel: [31756.388469] Buffer I/O error on device sdi1, logical block 0
> Jul 26 23:01:52 themhallbox kernel: [31756.388472] lost page write due to I/O error on sdi1

Man, I hope my code hasn't eaten your disk.  Is there any chance you
could replace the drive in the enclosure and create a new file system to
test?

> Also last time I was having a hard time capturing this, but I snagged it this 
> time:
> 
> mhall@themhallbox:~$ sudo mount /dev/sdi1 /mnt
> FUSE exfat 0.9.7
> ERROR: fsync failed.

Well, let me see the dmesg with CONFIG_USB_DEBUG and
CONFIG_USB_XHCI_HCD_DEBUGGING turned on, and I'll see if this is caused
by an xHCI error, or a filesystem error.

Sarah Sharp