Re: 2.6.20: reproducible hard lockup with RAID-5 resync

Neil Brown wrote:
> On Thursday February 15, bugfood-ml@xxxxxxxxxx wrote:
>> I think I have found an easily-reproducible bug in Linux 2.6.20. I have
>> already applied the "Fix various bugs with aligned reads in RAID5"
>> patch, and that had no effect. It appears to be related to the resync
>> process, and makes the system lock up, hard.
>
> I'm guessing that the problem is at a lower level than raid.
> What IDE/SATA controllers do you have?  Google to see if anyone else
> has had problems with them in 2.6.20.

I have an nForce3 motherboard. lspci calls my IDE:
nVidia Corporation CK8S Parallel ATA Controller (v2.5) (rev a2)
...and my SATA:
nVidia Corporation CK8S Serial ATA Controller (v2.5) (rev a2)

I'm using libata for my SATA drives and the old IDE driver for my IDE drive. For reference, I have uploaded my kernel configuration and the output of lspci:
http://fatooh.org/files/tmp/config-2.6.20
http://fatooh.org/files/tmp/lspci-v

Anyway, I googled a bit, and I also looked through the recent threads in the linux-kernel archives, but I haven't found anything. I don't follow kernel development closely, though, so it's quite possible I missed something.

When I get home (late) tonight I'll try running dd and badblocks on the corresponding drives and partitions.
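
For what it's worth, a dd-style read test boils down to reading the device from start to end and noting where a read fails. Below is a rough Python sketch of that idea, assuming the path (for example a hypothetical /dev/sdX) is readable and that plain sequential reads are enough to provoke the problem. The name `read_scan` and the chunk size are made up for illustration; this is not how dd or badblocks work internally.

```python
import os

def read_scan(path, chunk_size=1024 * 1024):
    """Read `path` sequentially; return (bytes_read, failed_offsets)."""
    bytes_read = 0
    failed_offsets = []
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end gives the size of a regular file or block device.
        size = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < size:
            want = min(chunk_size, size - offset)
            try:
                data = os.pread(fd, want, offset)
            except OSError:
                # Remember where the read failed, then skip past this chunk.
                failed_offsets.append(offset)
                offset += want
                continue
            if not data:
                break  # unexpected end of data
            bytes_read += len(data)
            offset += len(data)
    finally:
        os.close(fd)
    return bytes_read, failed_offsets
```

Running it against each member drive and partition in turn (as root) should show whether reads alone, without md involved, are enough to trigger the hang.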

>> During the lock up, nothing is printed to the console, and the magic
>> SysRQ key has no effect; I have to poke the reset button.
>
> Sounds like interrupts are disabled, but x86_64 always enables the
> NMI watchdog which should trigger if interrupts are off for too long.

How long is "too long"? I waited a few minutes, at least, on the first few tries.
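
One way to check that the NMI watchdog is active at all is to look at the NMI line in /proc/interrupts; as far as I know, the counts there should be non-zero and increasing when the watchdog is running. A minimal Python sketch of pulling those counts out, assuming the usual layout of an "NMI:" label followed by one count per CPU (the helper name `nmi_counts` is made up):

```python
def nmi_counts(interrupts_text):
    """Return the per-CPU NMI counts found in /proc/interrupts-style text."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            counts = []
            for field in fields[1:]:
                # Stop at the first non-numeric field (a trailing description).
                if not field.isdigit():
                    break
                counts.append(int(field))
            return counts
    return []  # no NMI line present
```

Calling it twice on the contents of /proc/interrupts, a few seconds apart, and comparing the results would show whether the counts are going up.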

> Do you have CONFIG_DETECT_SOFTLOCKUP=y in your .config (it is in the
> kernel debugging options menu I think).  If not, setting that would be
> worth a try.

I do indeed have CONFIG_DETECT_SOFTLOCKUP enabled. The Kconfig description says it should detect lockups longer than 10 seconds, and I've waited longer than that many times.
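
For anyone who wants to double-check an option like this without opening menuconfig, here is a small Python sketch that looks it up in the text of a .config file. It assumes the usual `CONFIG_FOO=y` and `# CONFIG_FOO is not set` line formats; the helper name `config_value` is made up for illustration.

```python
def config_value(config_text, option):
    """Return the value of `option` in kernel .config text, or None if unset."""
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith(option + "="):
            # e.g. "CONFIG_DETECT_SOFTLOCKUP=y" gives "y"
            return line.split("=", 1)[1]
        if line == "# " + option + " is not set":
            return None
    return None  # option not mentioned at all
```

For example, `config_value(open("config-2.6.20").read(), "CONFIG_DETECT_SOFTLOCKUP")` on a local copy of the configuration linked above should return "y" if the option is enabled.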

> A raid5 resync across 5 sata drives on a couple of different
> silicon-image controllers doesn't lock up for me.

Heck. ;) Would it by any chance make a difference that I'm running RAID-5 across a mixture of drives and partitions?

Thanks again,
Corey
