Hi Neil!

Neil Brown wrote:
> On Wednesday December 10, torarnv@xxxxxxxxx wrote:
>> I have a very strange problem that I've been trying to debug for
>> days now. I had a RAID5 with four drives and one spare,
>> /dev/sd[bcde]1 + /dev/sdf1, and everything was working fine, until
>> one day one of the drives in the array (sdb) no longer had a
>> partition (sdb1). Letting the spare take over I ignored this for a
>> few days, but then it happened again, this time with sdc1.
>>
>> I'm hoping someone on this list may have run into this before, or
>> have any tips on how I can continue debugging this, because I have to
>> admit I'm a little lost...
>
> Yes, it does sound rather weird.

First of all, thank you so much for helping me out with this, as I'm
still very lost :)

In addition to the things listed in the first e-mail, I've also tried
installing the latest kernel from kernel.org, but that did not solve
anything. Also, in case it's relevant, I'm running openSUSE 10.3.

> Can you:
>
>   mdadm -Esv

http://pastebin.com/d7b14d14e

For some reason it seems to think that /dev/sdc and /dev/sdb are part of
the array, while the members really are /dev/sdc1 and /dev/sdb1. I'm
guessing that, since the partitions are somehow missing from the device
nodes in /dev, mdadm assumes the disk itself is the member?

> and
>   mdadm --stop /dev/md0
>   strace -o /tmp/str -s 200 mdadm --assemble --scan --verbose /dev/md0

http://pastebin.com/f2c1db2e4

The original array had sd[bcde]1 + sdf1 as spare. Then sdb1 went missing
and the spare kicked in, and then sdc1 went missing, leaving me with a
degraded array.

> Also the contents of /etc/mdadm.conf might help.

http://pastebin.com/f573346ef

Is there anything else I can run, cat, and/or paste that would shed
light on what's going on?

> Thanks,

Thank _you_ :)

Tor Arne

>> raid support in.
>> The symptoms are:
>>
>> - The kernel seems to detect the partitions (lines 396 and 407 in the
>> dmesg [1])
>>
>> - But once the boot process finishes and the RAID is started, there is
>> no longer any sdc1 or sdb1, so the RAID fails to start (lines 550-576 in
>> dmesg [1])
>>
>> - Running fdisk -l shows that the drives in question (sdb and sdc) have
>> the same kind of partition as the other, working drives, namely one
>> Linux RAID autodetect partition each (see command output [2])
>>
>> - But the partitions are missing from /proc/partitions (see [3])
>>
>> - Manually adding device nodes using mknod works, but doing file -sL
>> on the device gives "writable, no read permission", even though
>> permissions are the same as the other sd* nodes in /dev
>>
>> - Running 'partprobe -s' successfully finds the two missing partitions
>> and adds device nodes, and the nodes can be 'file -sL'ed, but when
>> trying to assemble the array again with these new nodes in the system,
>> I'm told that sdc1 is not found, and after the --assemble is done, the
>> device nodes are once again missing (!) see [4]
>>
>> - I've tried using the 'dmraid' command to look for fakeraid
>> partitions or metadata on the drives, which I was told could mess up
>> the auto-detection of Linux software RAID partitions, but could not
>> find any issues.
>>
>>
>> As you can tell I've exhausted all my current options, so any help on
>> what I could try next would be very much appreciated. I am especially
>> curious as to why I lose the partitions when mdadm tries to assemble the
>> array?
>>
>> Thanks!
>>
>> Tor Arne Vestbø
>>
>> [1] http://pastebin.com/m15b9c275 dmesg
>> [2] http://pastebin.com/f50fb323a fdisk -l
>> [3] http://pastebin.com/f4547c2ca cat /proc/partitions
>> [4] http://pastebin.com/m4475c9ae partprobe + mdadm --assemble
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
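
P.S. The check described in the quoted symptoms (partitions that fdisk
reports but that are absent from /proc/partitions) could be scripted
roughly as below, so it is easy to re-run before and after an assemble
attempt. This is only a sketch: the function name missing_partitions and
its arguments are made up for illustration and are not part of mdadm or
any distribution tool. It only reads a file in /proc/partitions format
and does not touch the devices or the array.

```shell
#!/bin/sh
# Hypothetical helper (an assumption for illustration, not an existing tool):
# print each expected partition name that is absent from a
# /proc/partitions-style file.
# Usage: missing_partitions FILE NAME...
missing_partitions() {
    table=$1
    shift
    for name in "$@"; do
        # Columns in /proc/partitions are: major minor #blocks name.
        # awk exits 0 if some row has the name in column 4, 1 otherwise.
        if ! awk -v n="$name" '$4 == n { found = 1 } END { exit !found }' "$table"; then
            echo "$name"
        fi
    done
}

# Example call for the five partitions the array originally used:
# missing_partitions /proc/partitions sdb1 sdc1 sdd1 sde1 sdf1
```

Running the example call right after boot, after 'partprobe -s', and
again after 'mdadm --assemble' would show at which step sdb1 and sdc1
drop out of the kernel's partition table.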