On Thu, Jul 26, 2012 at 11:09:13PM -0700, Matthew Hall wrote:
> On Thu, Jul 26, 2012 at 01:22:15PM -0700, Sarah Sharp wrote:
> > Ok, I think I have the root cause of this message, and it's a nasty
> > little bug. Can you apply the attached patch (instead of the previous
> > debug patch) and test? I think your disk will work once it's applied,
> > but please double check that you don't see any more of that ERROR
> > message.
>
> I agree, that's indeed a horrifying bug. I must say that my own brain had a
> core dump when you wrote, "this function would quite happily read all of
> system memory before wrapping around to the right pointer value." ;-)

Yeah, I was kind of banging my head on the desk as well. I'm really not
sure why it wasn't causing a general protection fault, so it's possible
I could be wrong in my analysis of what the bug might do.

> I did test this latest patch, and things are still haywire with the same error
> according to the fuse exfat, however there's no trace of "error" (case
> insensitive) except for some messages which say this again and again from the
> debug logic:
>
> Jul 25 01:36:08 themhallbox kernel: [ 601.092877] xhci_hcd 0000:03:00.0: HC error bitmask = 0x0

That message is harmless, since there are no error bits set in the
bitmask. Were you using a case insensitive search? Because the first
part of the message was "ERROR". If you were using a case insensitive
search, that means there are no xHCI problems, at least.

> But we are definitely still getting some kind of reactor meltdown on some kind
> of exfat write, probably the superblock update according to my prior code
> inspection...
>
> Jul 26 23:01:52 themhallbox kernel: [31756.388465] end_request: I/O error, dev sdi, sector 32768
> Jul 26 23:01:52 themhallbox kernel: [31756.388469] Buffer I/O error on device sdi1, logical block 0
> Jul 26 23:01:52 themhallbox kernel: [31756.388472] lost page write due to I/O error on sdi1

Man, I hope my code hasn't eaten your disk.
Is there any chance you could replace the drive in the enclosure and
create a new file system to test?

> Also last time I was having a hard time capturing this, but I snagged it this
> time:
>
> mhall@themhallbox:~$ sudo mount /dev/sdi1 /mnt
> FUSE exfat 0.9.7
> ERROR: fsync failed.

Well, let me see the dmesg with CONFIG_USB_DEBUG and
CONFIG_USB_XHCI_HCD_DEBUGGING turned on, and I'll see if this is caused
by an xHCI error, or a filesystem error.

Sarah Sharp
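
For anyone triaging a saved dmesg along the lines discussed above, here is a minimal Python sketch, not part of the original thread. It separates the harmless "HC error bitmask = 0x0" lines from other lines that mention "error" in any case. The helper name `classify` and the regular expression are hypothetical, derived only from the message format quoted above, and treating a nonzero bitmask as a real xHCI problem is an assumption, not a statement about the driver.

```python
import re

# Hypothetical pattern, based only on the log line quoted in this thread.
BITMASK_RE = re.compile(r"HC error bitmask = (0x[0-9a-fA-F]+)")


def classify(line):
    """Classify one kernel log line.

    Returns "harmless" for a zero HC error bitmask, "xhci-error" for a
    nonzero one (assumption: nonzero means error bits are set),
    "other-error" for any other line containing "error" in any case,
    and None for everything else.
    """
    match = BITMASK_RE.search(line)
    if match:
        return "harmless" if int(match.group(1), 16) == 0 else "xhci-error"
    if "error" in line.lower():  # case-insensitive, so "ERROR" matches too
        return "other-error"
    return None


def summarize(lines):
    """Count how many lines fall into each non-None category."""
    counts = {}
    for line in lines:
        kind = classify(line)
        if kind is not None:
            counts[kind] = counts.get(kind, 0) + 1
    return counts
```

Feeding it the lines of a captured log, for example `summarize(open("dmesg.txt"))` with a hypothetical file name, gives a quick count per category. The block-layer "I/O error" lines quoted above would land in "other-error", which matches the open question of whether the failure is on the xHCI side or the filesystem side.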