On 03/26/2009 12:20 AM, James Bottomley wrote: > On Wed, 2009-03-25 at 02:06 -0400, Jeff Garzik wrote: >> Niel Lambrechts wrote: >>> On 03/25/2009 03:30 AM, Theodore Tso wrote: >>>> On Tue, Mar 24, 2009 at 10:25:57PM +0200, Niel Lambrechts wrote: >>>>> Hi, >>>>> >>>>> After upgrading to 2.6.29 I get the below errors after resuming from >>>>> hibernating with s2disk. I ran fsck and tried doing the same thing again >>>>> in 2.6.28.9-pae, but do not get any errors there. >>>> The ext4 errors are interleaved with hardware errors, and the ext4 >>>> errors are about I/O errors. >>>> >>>> EXT4-fs error (device sda6): __ext4_get_inode_loc: unable to read inode block - inode=2346519 >>>> EXT4-fs error (device sda6) in ext4_reserve_inode_write: IO failure >>>> >>>> This looks more like a hibernation problem than an ext4 problem. >>>> Looks like the hard drive is being left in some inconsistent state >>>> after resuming from hibernation. >>>> >>>> - Ted >>> Thanks for the info Theodore, this is definitely looks like some type of >>> regression in 2.6.29, as the problem is not evident when I s2disk using >>> 2.6.28.9, even after multiple suspend/resume cycles. >>> >>> I found some 'ATA bus errors' and 'SError' messages in >>> /var/log/messages, so I've attached the messages from both 2.6.29 and >>> 2.6.28 for comparison. >> Well, here is the interpretation of messages: >> >>> ata1.00: irq_stat 0x00400008, PHY RDY changed >>> ata1: SError: { PHYRdyChg CommWake } >> Your SATA hardware flags a connect-or-disconnect event ("PHY RDY"), >> which requires us to abort a bunch of queued commands: >> >>> ata1.00: cmd 60/18:00:77:88:6f/00:00:0e:00:00/40 tag 0 ncq 12288 in >>> res 50/00:30:07:b3:10/00:00:0c:00:00/40 Emask 0x10 (ATA bus error) >> [...] >>> ata1.00: cmd 60/30:68:07:b3:10/00:00:0c:00:00/40 tag 13 ncq 24576 in >>> res 50/00:30:07:b3:10/00:00:0c:00:00/40 Emask 0x10 (ATA bus error) >> ...through the 14th command (tag 13). >> >>> Mar 24 21:29:14 linux-7vph kernel: ata1: hard resetting link >>> Mar 24 21:29:14 linux-7vph kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300) >>> Mar 24 21:29:14 linux-7vph kernel: ata1.00: ACPI cmd ef/02:00:00:00:00:a0 succeeded >>> Mar 24 21:29:14 linux-7vph kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out >>> Mar 24 21:29:14 linux-7vph kernel: ata1.00: ACPI cmd ef/5f:00:00:00:00:a0 succeeded >>> Mar 24 21:29:14 linux-7vph kernel: ata1.00: ACPI cmd ef/10:03:00:00:00:a0 filtered out >>> Mar 24 21:29:14 linux-7vph kernel: ata1.00: ACPI cmd ef/02:00:00:00:00:a0 succeeded >>> Mar 24 21:29:14 linux-7vph kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out >>> Mar 24 21:29:14 linux-7vph kernel: ata1.00: ACPI cmd ef/5f:00:00:00:00:a0 succeeded >>> Mar 24 21:29:14 linux-7vph kernel: ata1.00: ACPI cmd ef/10:03:00:00:00:a0 filtered out >>> Mar 24 21:29:14 linux-7vph kernel: ata1.00: configured for UDMA/133 >>> Mar 24 21:29:14 linux-7vph kernel: ata1.00: configured for UDMA/133 >> >> SATA link is reset, and ACPI is re-run. >> >>> Mar 24 21:29:14 linux-7vph kernel: sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK >>> Mar 24 21:29:14 linux-7vph kernel: sd 0:0:0:0: [sda] Sense Key : Aborted Command [current] [descriptor] >>> Mar 24 21:29:14 linux-7vph kernel: Descriptor sense data with sense descriptors (in hex): >>> Mar 24 21:29:14 linux-7vph kernel: 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00 >>> Mar 24 21:29:14 linux-7vph kernel: 0c 10 b3 07 >>> Mar 24 21:29:14 linux-7vph kernel: sd 0:0:0:0: [sda] Add. Sense: No additional sense information >>> Mar 24 21:29:14 linux-7vph kernel: end_request: I/O error, dev sda, sector 242190455 >> The SCSI subsystem aborts each of the queued commands. > > No .. this is the SCSI subsystem receives an ABORTED COMMAND return in > sense data for each of the outstanding I/Os > > The only place these are generated is in ata_sense_to_error() which only > occurs if there's some type of ata error. > > If I had to theorise, I'd say the system suspended with commands > outstanding to the device. On resume, the device gets reset and returns > some type of ATA error which gets translated to ABORTED COMMAND which > causes a failure. > > In the mid layer, we translate ABORTED_COMMAND into a retry until the > command runs out of them ... could it be there's a race readying the > device and we run through the retries before it can accept the command? I'm seeing some dubious looking ATA messages even on 2.6.28.9-pae, although with all the 2.6.28 variants I used s2disk/resume has always worked. I was wondering if these "errors" perhaps play more of a role in 2.6.29, perhaps due to the async. changes that was mentioned? In particular, I notice some the following before "EH complete": ata1: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x9 t4 ata1: irq_stat 0x00400040, connection status changed but before this last message there is a ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300) Why would the connection status change if a moment ago it was "link up"? Niel
Mar 27 20:18:01 linux-7vph -- MARK -- Mar 27 20:18:01 linux-7vph syslog-ng[32661]: Log statistics; dropped='pipe(/dev/xconsole)=0', dropped='pipe(/dev/tty10)=0', processed='center(queued)=350', processed='center(received)=284', processed='destination(newsnotice)=0', processed='destination(acpid)=8', processed='destination(firewall)=0', processed='destination(null)=61', processed='destination(mail)=0', processed='destination(mailinfo)=0', processed='destination(console)=20', processed='destination(newserr)=0', processed='destination(newscrit)=0', processed='destination(messages)=129', processed='destination(mailwarn)=0', processed='destination(localmessages)=0', processed='destination(netmgm)=86', processed='destination(mailerr)=0', processed='destination(xconsole)=20', processed='destination(warn)=26', processed='source(src)=284' Mar 27 20:18:00 linux-7vph kernel: Syncing filesystems ... done. Mar 27 20:18:01 linux-7vph kernel: Freezing user space processes ... (elapsed 0.00 seconds) done. Mar 27 20:18:01 linux-7vph kernel: Freezing remaining freezable tasks ... (elapsed 0.00 seconds) done. Mar 27 20:18:01 linux-7vph kernel: PM: Shrinking memory... done (57185 pages freed) Mar 27 20:18:01 linux-7vph kernel: PM: Freed 228740 kbytes in 2.03 seconds (112.67 MB/s) Mar 27 20:18:01 linux-7vph kernel: Suspending console(s) (use no_console_suspend to debug) Mar 27 20:18:01 linux-7vph kernel: sd 0:0:0:0: [sda] Synchronizing SCSI cache Mar 27 20:18:01 linux-7vph kernel: ACPI handle has no context! Mar 27 20:18:01 linux-7vph kernel: ehci_hcd 0000:00:1d.7: PCI INT D disabled Mar 27 20:18:01 linux-7vph kernel: uhci_hcd 0000:00:1d.2: PCI INT C disabled Mar 27 20:18:01 linux-7vph kernel: uhci_hcd 0000:00:1d.1: PCI INT B disabled Mar 27 20:18:01 linux-7vph kernel: uhci_hcd 0000:00:1d.0: PCI INT A disabled Mar 27 20:18:01 linux-7vph kernel: HDA Intel 0000:00:1b.0: PCI INT B disabled Mar 27 20:18:01 linux-7vph kernel: ehci_hcd 0000:00:1a.7: PCI INT D disabled Mar 27 20:18:01 linux-7vph kernel: uhci_hcd 0000:00:1a.2: PCI INT C disabled Mar 27 20:18:01 linux-7vph kernel: uhci_hcd 0000:00:1a.1: PCI INT B disabled Mar 27 20:18:01 linux-7vph kernel: uhci_hcd 0000:00:1a.0: PCI INT A disabled Mar 27 20:18:01 linux-7vph kernel: e1000e 0000:00:19.0: PME# enabled Mar 27 20:18:01 linux-7vph kernel: e1000e 0000:00:19.0: wake-up capability enabled by ACPI Mar 27 20:18:01 linux-7vph kernel: e1000e 0000:00:19.0: PME# enabled Mar 27 20:18:01 linux-7vph kernel: e1000e 0000:00:19.0: wake-up capability enabled by ACPI Mar 27 20:18:06 linux-7vph kernel: e1000e 0000:00:19.0: PCI INT A disabled Mar 27 20:18:06 linux-7vph kernel: ACPI: Preparing to enter system sleep state S4 Mar 27 20:18:06 linux-7vph kernel: Disabling non-boot CPUs ... Mar 27 20:18:06 linux-7vph kernel: CPU 1 is now offline Mar 27 20:18:06 linux-7vph kernel: SMP alternatives: switching to UP code Mar 27 20:18:06 linux-7vph kernel: CPU0 attaching NULL sched-domain. Mar 27 20:18:06 linux-7vph kernel: CPU1 attaching NULL sched-domain. Mar 27 20:18:06 linux-7vph kernel: CPU0 attaching NULL sched-domain. Mar 27 20:18:06 linux-7vph kernel: CPU1 is down Mar 27 20:18:06 linux-7vph kernel: Extended CMOS year: 2000 Mar 27 20:18:06 linux-7vph kernel: PM: Creating hibernation image: Mar 27 20:18:06 linux-7vph kernel: PM: Need to copy 125532 pages Mar 27 20:18:06 linux-7vph kernel: x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106 Mar 27 20:18:06 linux-7vph kernel: Intel machine check architecture supported. Mar 27 20:18:06 linux-7vph kernel: Intel machine check reporting enabled on CPU#0. Mar 27 20:18:06 linux-7vph kernel: Extended CMOS year: 2000 Mar 27 20:18:06 linux-7vph kernel: Enabling non-boot CPUs ... Mar 27 20:18:06 linux-7vph kernel: SMP alternatives: switching to SMP code Mar 27 20:18:06 linux-7vph kernel: Booting processor 1 APIC 0x1 ip 0x6000 Mar 27 20:18:06 linux-7vph kernel: Initializing CPU#1 Mar 27 20:18:06 linux-7vph kernel: Calibrating delay using timer specific routine.. 5054.03 BogoMIPS (lpj=10108064) Mar 27 20:18:06 linux-7vph kernel: CPU: L1 I cache: 32K, L1 D cache: 32K Mar 27 20:18:06 linux-7vph kernel: CPU: L2 cache: 6144K Mar 27 20:18:06 linux-7vph kernel: CPU: Physical Processor ID: 0 Mar 27 20:18:06 linux-7vph kernel: CPU: Processor Core ID: 1 Mar 27 20:18:06 linux-7vph kernel: Intel machine check architecture supported. Mar 27 20:18:06 linux-7vph kernel: Intel machine check reporting enabled on CPU#1. Mar 27 20:18:06 linux-7vph kernel: x86 PAT enabled: cpu 1, old 0x7040600070406, new 0x7010600070106 Mar 27 20:18:06 linux-7vph kernel: CPU1: Intel(R) Core(TM)2 Duo CPU T9400 @ 2.53GHz stepping 06 Mar 27 20:18:06 linux-7vph kernel: CPU0 attaching NULL sched-domain. Mar 27 20:18:06 linux-7vph kernel: CPU0 attaching sched-domain: Mar 27 20:18:06 linux-7vph kernel: domain 0: span 0-1 level MC Mar 27 20:18:06 linux-7vph kernel: groups: 0 1 Mar 27 20:18:06 linux-7vph kernel: domain 1: span 0-1 level CPU Mar 27 20:18:06 linux-7vph kernel: groups: 0-1 Mar 27 20:18:06 linux-7vph kernel: CPU1 attaching sched-domain: Mar 27 20:18:06 linux-7vph kernel: domain 0: span 0-1 level MC Mar 27 20:18:06 linux-7vph kernel: groups: 1 0 Mar 27 20:18:06 linux-7vph kernel: domain 1: span 0-1 level CPU Mar 27 20:18:06 linux-7vph kernel: groups: 0-1 Mar 27 20:18:06 linux-7vph kernel: Switched to high resolution mode on CPU 1 Mar 27 20:18:06 linux-7vph kernel: CPU1 is up Mar 27 20:18:06 linux-7vph kernel: ACPI: Waking up from system sleep state S4 Mar 27 20:18:06 linux-7vph kernel: APIC error on CPU1: 00(40) Mar 27 20:18:06 linux-7vph kernel: ACPI: EC: non-query interrupt received, switching to interrupt mode Mar 27 20:18:06 linux-7vph kernel: pci 0000:00:02.0: restoring config space at offset 0x1 (was 0x900007, writing 0x900407) Mar 27 20:18:06 linux-7vph kernel: pci 0000:00:02.0: power state changed by ACPI to D0 Mar 27 20:18:06 linux-7vph kernel: pci 0000:00:02.0: setting latency timer to 64 Mar 27 20:18:06 linux-7vph kernel: e1000e 0000:00:19.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20 Mar 27 20:18:06 linux-7vph kernel: e1000e 0000:00:19.0: setting latency timer to 64 Mar 27 20:18:06 linux-7vph kernel: e1000e 0000:00:19.0: wake-up capability disabled by ACPI Mar 27 20:18:06 linux-7vph kernel: e1000e 0000:00:19.0: PME# disabled Mar 27 20:18:06 linux-7vph kernel: e1000e 0000:00:19.0: wake-up capability disabled by ACPI Mar 27 20:18:06 linux-7vph kernel: e1000e 0000:00:19.0: PME# disabled Mar 27 20:18:06 linux-7vph kernel: e1000e 0000:00:19.0: irq 2298 for MSI/MSI-X Mar 27 20:18:06 linux-7vph kernel: uhci_hcd 0000:00:1a.0: power state changed by ACPI to D0 Mar 27 20:18:06 linux-7vph kernel: uhci_hcd 0000:00:1a.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20 Mar 27 20:18:06 linux-7vph kernel: uhci_hcd 0000:00:1a.0: setting latency timer to 64 Mar 27 20:18:06 linux-7vph kernel: uhci_hcd 0000:00:1a.1: PCI INT B -> GSI 21 (level, low) -> IRQ 21 Mar 27 20:18:06 linux-7vph kernel: uhci_hcd 0000:00:1a.1: setting latency timer to 64 Mar 27 20:18:06 linux-7vph kernel: uhci_hcd 0000:00:1a.2: power state changed by ACPI to D0 Mar 27 20:18:06 linux-7vph kernel: uhci_hcd 0000:00:1a.2: PCI INT C -> GSI 22 (level, low) -> IRQ 22 Mar 27 20:18:06 linux-7vph kernel: uhci_hcd 0000:00:1a.2: setting latency timer to 64 Mar 27 20:18:06 linux-7vph kernel: ehci_hcd 0000:00:1a.7: power state changed by ACPI to D0 Mar 27 20:18:06 linux-7vph kernel: ehci_hcd 0000:00:1a.7: PCI INT D -> GSI 23 (level, low) -> IRQ 23 Mar 27 20:18:06 linux-7vph kernel: ehci_hcd 0000:00:1a.7: setting latency timer to 64 Mar 27 20:18:06 linux-7vph kernel: HDA Intel 0000:00:1b.0: restoring config space at offset 0x1 (was 0x100106, writing 0x100102) Mar 27 20:18:06 linux-7vph kernel: HDA Intel 0000:00:1b.0: PCI INT B -> GSI 17 (level, low) -> IRQ 17 Mar 27 20:18:06 linux-7vph kernel: HDA Intel 0000:00:1b.0: setting latency timer to 64 Mar 27 20:18:06 linux-7vph kernel: pcieport-driver 0000:00:1c.0: setting latency timer to 64 Mar 27 20:18:06 linux-7vph kernel: pcieport-driver 0000:00:1c.1: setting latency timer to 64 Mar 27 20:18:06 linux-7vph kernel: pcieport-driver 0000:00:1c.3: setting latency timer to 64 Mar 27 20:18:06 linux-7vph kernel: pcieport-driver 0000:00:1c.4: setting latency timer to 64 Mar 27 20:18:06 linux-7vph kernel: uhci_hcd 0000:00:1d.0: power state changed by ACPI to D0 Mar 27 20:18:06 linux-7vph kernel: uhci_hcd 0000:00:1d.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16 Mar 27 20:18:06 linux-7vph kernel: uhci_hcd 0000:00:1d.0: setting latency timer to 64 Mar 27 20:18:06 linux-7vph kernel: uhci_hcd 0000:00:1d.1: PCI INT B -> GSI 17 (level, low) -> IRQ 17 Mar 27 20:18:06 linux-7vph kernel: uhci_hcd 0000:00:1d.1: setting latency timer to 64 Mar 27 20:18:06 linux-7vph kernel: uhci_hcd 0000:00:1d.2: PCI INT C -> GSI 18 (level, low) -> IRQ 18 Mar 27 20:18:06 linux-7vph kernel: uhci_hcd 0000:00:1d.2: setting latency timer to 64 Mar 27 20:18:06 linux-7vph kernel: ehci_hcd 0000:00:1d.7: power state changed by ACPI to D0 Mar 27 20:18:06 linux-7vph kernel: ehci_hcd 0000:00:1d.7: PCI INT D -> GSI 19 (level, low) -> IRQ 19 Mar 27 20:18:06 linux-7vph kernel: ehci_hcd 0000:00:1d.7: setting latency timer to 64 Mar 27 20:18:06 linux-7vph kernel: pci 0000:00:1e.0: setting latency timer to 64 Mar 27 20:18:06 linux-7vph kernel: ahci 0000:00:1f.2: restoring config space at offset 0x1 (was 0x2b00403, writing 0x2b00407) Mar 27 20:18:06 linux-7vph kernel: ahci 0000:00:1f.2: setting latency timer to 64 Mar 27 20:18:06 linux-7vph kernel: pci 0000:15:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16 Mar 27 20:18:06 linux-7vph kernel: ohci1394: fw-host0: OHCI-1394 1.1 (PCI): IRQ=[17] MMIO=[f4801000-f48017ff] Max Packet=[2048] IR/IT contexts=[4/4] Mar 27 20:18:06 linux-7vph kernel: sd 0:0:0:0: [sda] Starting disk Mar 27 20:18:06 linux-7vph kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300) Mar 27 20:18:06 linux-7vph kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300) Mar 27 20:18:06 linux-7vph kernel: ata1.00: ACPI cmd ef/02:00:00:00:00:a0 succeeded Mar 27 20:18:06 linux-7vph kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out Mar 27 20:18:06 linux-7vph kernel: ata1.00: ACPI cmd ef/5f:00:00:00:00:a0 succeeded Mar 27 20:18:06 linux-7vph kernel: ata1.00: ACPI cmd ef/10:03:00:00:00:a0 filtered out Mar 27 20:18:06 linux-7vph kernel: ata1.00: ACPI cmd ef/02:00:00:00:00:a0 succeeded Mar 27 20:18:06 linux-7vph kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out Mar 27 20:18:06 linux-7vph kernel: ata1.00: ACPI cmd ef/5f:00:00:00:00:a0 succeeded Mar 27 20:18:06 linux-7vph kernel: ata1.00: ACPI cmd ef/10:03:00:00:00:a0 filtered out Mar 27 20:18:06 linux-7vph kernel: ata1.00: configured for UDMA/133 Mar 27 20:18:06 linux-7vph kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x9 t4 Mar 27 20:18:06 linux-7vph kernel: ata1: irq_stat 0x00400040, connection status changed Mar 27 20:18:06 linux-7vph kernel: ata2.00: ACPI cmd e3/00:1f:00:00:00:a0 succeeded Mar 27 20:18:06 linux-7vph kernel: ata2.00: ACPI cmd e3/00:02:00:00:00:a0 succeeded Mar 27 20:18:06 linux-7vph kernel: ata1.00: configured for UDMA/133 Mar 27 20:18:06 linux-7vph kernel: ata1: EH complete Mar 27 20:18:06 linux-7vph kernel: sd 0:0:0:0: [sda] 390721968 512-byte hardware sectors: (200 GB/186 GiB) Mar 27 20:18:06 linux-7vph kernel: sd 0:0:0:0: [sda] Write Protect is off Mar 27 20:18:06 linux-7vph kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00 Mar 27 20:18:06 linux-7vph kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA Mar 27 20:18:06 linux-7vph kernel: sd 0:0:0:0: [sda] 390721968 512-byte hardware sectors: (200 GB/186 GiB) Mar 27 20:18:06 linux-7vph kernel: sd 0:0:0:0: [sda] Write Protect is off Mar 27 20:18:06 linux-7vph kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00 Mar 27 20:18:06 linux-7vph kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA Mar 27 20:18:06 linux-7vph kernel: ata2.00: ACPI cmd e3/00:1f:00:00:00:a0 succeeded Mar 27 20:18:06 linux-7vph kernel: ata2.00: ACPI cmd e3/00:02:00:00:00:a0 succeeded Mar 27 20:18:06 linux-7vph kernel: ata2.00: configured for UDMA/133 Mar 27 20:18:06 linux-7vph kernel: ata2: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x9 t4 Mar 27 20:18:06 linux-7vph kernel: ata2: irq_stat 0x40000001 Mar 27 20:18:06 linux-7vph kernel: ata2.00: configured for UDMA/133 Mar 27 20:18:06 linux-7vph kernel: ata2: EH complete Mar 27 20:18:06 linux-7vph kernel: pci 0000:00:02.0: power state changed by ACPI to D0 Mar 27 20:18:06 linux-7vph kernel: pci 0000:00:02.0: setting latency timer to 64 Mar 27 20:18:06 linux-7vph kernel: Restarting tasks ... done.