On 03/29/2012 07:11 AM, Sarah Sharp wrote: > On Tue, Mar 27, 2012 at 07:29:23PM +0200, Martin Mokrejs wrote: >> Hi, >> I already mentioned on this list I am having some issues with my new 2.7TiB hard drive >> which I bought at about a month ago. I crashed ext3 fs two days ago on it when I filled >> up the disk with the data. Yesterday I learned Seagate released a new firmware for it. >> I flashed it from CC46 (I think) to CC4C. I created an ext4 filesystem now, just in case. >> I started copying fo teh data and two hours later I see on the console: >> >> vostro kernel: journal commit I/O error >> >> Aside from the fact I do not understand why is it coming from "JBD2" it seems it caused >> the drive to be unmounted. Is that a part of ext4 driver? >> >> Anyway, it looks the whole story started here: >> >> Mar 27 18:10:23 vostro kernel: xhci_hcd 0000:0b:00.0: WARN: transfer error on endpoint > > Which kernel version are you running? These useless warning messages > were removed in 3.3, ISTR. They're harmless. > This one is a critical error (USB transaction error) and is not removed. >> Mar 27 18:10:23 vostro kernel: usb 4-1: reset SuperSpeed USB device number 12 using xhci_hcd >> Mar 27 18:10:23 vostro kernel: xhci_hcd 0000:0b:00.0: WARN: short transfer on control ep >> Mar 27 18:10:23 vostro kernel: xhci_hcd 0000:0b:00.0: xHCI xhci_drop_endpoint called with disabled ep ffff8800073ff540 >> Mar 27 18:10:23 vostro kernel: xhci_hcd 0000:0b:00.0: xHCI xhci_drop_endpoint called with disabled ep ffff8800073ff580 >> Mar 27 18:10:23 vostro kernel: usb 4-1: Successful Endpoint Configure command >> Mar 27 18:10:54 vostro kernel: usb 4-1: reset SuperSpeed USB device number 12 using xhci_hcd > > The usb-storage driver decided to reset the device, for whatever reason. > You can enable CONFIG_USB_STORAGE_DEBUG to find out which command caused > it to be reset. > >> Mar 27 18:10:54 vostro kernel: xhci_hcd 0000:0b:00.0: WARN: short transfer on control ep >> Mar 27 18:10:54 vostro kernel: xhci_hcd 0000:0b:00.0: xHCI xhci_drop_endpoint called with disabled ep ffff8800073ff540 >> Mar 27 18:10:54 vostro kernel: xhci_hcd 0000:0b:00.0: xHCI xhci_drop_endpoint called with disabled ep ffff8800073ff580 >> Mar 27 18:10:54 vostro kernel: usb 4-1: Successful Endpoint Configure command >> Mar 27 18:11:04 vostro kernel: usb 4-1: reset SuperSpeed USB device number 12 using xhci_hcd > > And then it gets reset again, 10 seconds later. > >> Mar 27 18:11:04 vostro kernel: xhci_hcd 0000:0b:00.0: WARN: short transfer on control ep >> Mar 27 18:11:04 vostro kernel: xhci_hcd 0000:0b:00.0: xHCI xhci_drop_endpoint called with disabled ep ffff8800073ff540 >> Mar 27 18:11:04 vostro kernel: xhci_hcd 0000:0b:00.0: xHCI xhci_drop_endpoint called with disabled ep ffff8800073ff580 >> Mar 27 18:11:04 vostro kernel: usb 4-1: Successful Endpoint Configure command >> Mar 27 18:11:04 vostro kernel: hub 4-0:1.0: state 7 ports 2 chg 0000 evt 0002 >> Mar 27 18:11:04 vostro kernel: hub 4-0:1.0: warm reset port 1 > > I wonder why you needed to have a warm reset for the port? > I think it's because the link state is SS_INACTIVE after reset. Thanks, Andiry >> Mar 27 18:11:04 vostro kernel: hub 4-0:1.0: port 1 not warm reset yet, waiting 50ms >> Mar 27 18:11:04 vostro kernel: hub 4-0:1.0: port 1 not warm reset yet, waiting 200ms >> Mar 27 18:11:04 vostro kernel: hub 4-0:1.0: port 1 not warm reset yet, waiting 200ms > > Mmm, the warm port reset timed out, which isn't very good. Perhaps your > TI host is quirky in that respect? > >> Mar 27 18:11:05 vostro kernel: hub 4-0:1.0: state 7 ports 2 chg 0000 evt 0002 >> Mar 27 18:11:06 vostro kernel: hub 4-0:1.0: state 7 ports 2 chg 0000 evt 0002 >> Mar 27 18:11:06 vostro kernel: hub 4-0:1.0: port 1, status 02a0, change 0001, 5.0 Gb/s >> Mar 27 18:11:06 vostro kernel: usb 4-1: USB disconnect, device number 12 > > And then the device disconnected. Sounds like flaky hardware. Maybe if > you figure out what SCSI command is failing for the device, we can apply > a storage device quirk, and we could avoid having to reset the device. > >> Mar 27 18:11:06 vostro kernel: usb 4-1: unregistering device >> Mar 27 18:11:06 vostro kernel: usb 4-1: unregistering interface 4-1:1.0 >> Mar 27 18:11:09 vostro kernel: sd 17:0:0:0: Device offlined - not ready after error recovery >> Mar 27 18:11:09 vostro kernel: ice sdc1, logical block 204754273 >> Mar 27 18:11:09 vostro kernel: Buffer I/O error on device sdc1, logical block 204754274 >> Mar 27 18:11:09 vostro kernel: Buffer I/O error on device sdc1, logical block 204754275 >> Mar 27 18:11:09 vostro kernel: EXT4-fs warning (device sdc1): ext4_end_bio:244: I/O error writing to inode 51118127 (offset 30695424 size 126976 starting block 204754532) >> Mar 27 18:11:09 vostro kernel: Buffer I/O error on device sdc1, logical block 204754276 >> Mar 27 18:11:09 vostro kernel: Buffer I/O error on device sdc1, logical block 204754277 >> >> >> Would anybody please comment whether the logs from xHCI side look sane? > > It helps if you Cc me (I'm listed in MAINTAINERS as the xHCI driver > maintainer). Otherwise the linux-usb mail ends up in another folder and > can get ignored for days if I get busy. :) > >> I have attached the >> complete log. From that it looks that mkfs.ext4 caushed the 3.2.12 kernel to complain >> with "mount: page allocation failure: order:5, mode:0xd0" but that is another story. >> Now I wonder how can I prove the disk is at fault and not the USWB3.0-to-SATAII >> bridge or linux kernel. The disk is rather hot (compared to another drive of exactly same >> model number) I bought 2 days ago (and which had the newer firmware, and therefore I >> went to upgrade the older-one with a tool from seagate website). > > Can you connect the disk to a desktop box and run the same sort of > workload? Connecting it via a USB 2.0 to SATA connector might not be > helpful, because if the drive is sensitive to the speed at which it's > accessed, then the error might not show up because USB 2.0 is going to > be much slower than USB 3.0. > >> 0b:00.0 USB controller: Texas Instruments Device 8241 (rev 02) (prog-if 30 [XHCI]) >> Subsystem: Dell Device 04b3 >> Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ >> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- >> Latency: 0, Cache Line Size: 64 bytes >> Interrupt: pin A routed to IRQ 16 >> Region 0: Memory at f7d00000 (64-bit, non-prefetchable) [size=64K] >> Region 2: Memory at f7d10000 (64-bit, non-prefetchable) [size=8K] >> Capabilities: [40] Power Management version 3 >> Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=100mA PME(D0+,D1+,D2+,D3hot+,D3cold+) >> Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME- >> Capabilities: [48] MSI: Enable- Count=1/8 Maskable- 64bit+ >> Address: 0000000000000000 Data: 0000 >> Capabilities: [70] Express (v2) Endpoint, MSI 00 >> DevCap: MaxPayload 1024 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited >> ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- >> DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported- >> RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- >> MaxPayload 128 bytes, MaxReadReq 512 bytes >> DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend- >> LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 <512ns, L1 <64us >> ClockPM+ Surprise- LLActRep- BwNot- >> LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk+ >> ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt- >> LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- >> DevCap2: Completion Timeout: Not Supported, TimeoutDis+ >> DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- >> LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB >> Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- >> Compliance De-emphasis: -6dB >> LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1- >> EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest- >> Capabilities: [c0] MSI-X: Enable+ Count=8 Masked- >> Vector table: BAR=2 offset=00000000 >> PBA: BAR=2 offset=00001000 >> Capabilities: [100 v2] Advanced Error Reporting >> UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- >> UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- >> UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol- >> CESta: RxErr- BadTLP- BadDLLP+ Rollover- Timeout- NonFatalErr+ >> CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+ >> AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn- >> Capabilities: [150 v1] Device Serial Number 08-00-28-00-00-20-00-00 >> Kernel driver in use: xhci_hcd > > Oh, I see. It's a TI host. Yeah, I haven't done much testing on that > particular host controller, so I don't know if it works well. > > Sarah Sharp > -- > To unsubscribe from this list: send the line "unsubscribe linux-usb" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html