On Saturday 19 March 2011 00:20:39, NeilBrown wrote:
> On Fri, 18 Mar 2011 23:50:18 +0100 Xavier Brochard <xavier@xxxxxxxxxxxxxx>
> > On Friday 18 March 2011 23:22:51, NeilBrown wrote:
> > > On Fri, 18 Mar 2011 21:12:49 +0100 Xavier Brochard
> > > > On Friday 18 March 2011 18:22:34, hansbkk@xxxxxxxxx wrote:
> > > > > On Fri, Mar 18, 2011 at 9:49 PM, Xavier Brochard
> > > > > > disk order is mixed between each boot - even with live-cd.
> > > > > > is that normal?
> > > > >
> > > > > If nothing is changing and the order is swapping really every boot,
> > > > > then IMO that is odd.
> > > >
> > > > nothing has changed, except kernel minor version
> > >
> > > Yet you don't tell us what the kernel minor version changed from or to.
> >
> > Previously it was ubuntu 2.6.32-27-server or 2.6.32-28-server and now it
> > is ubuntu 2.6.32-29.58-server 2.6.32.28+drm33.13
> >
> > > That may not be important, but it might and you obviously don't know
> > > which. It is always better to give too much information rather than
> > > not enough.
> >
> > Here's full output of mdadm --examine /dev/sd[cdefg]1
> > As you can see, disks sdc, sdd and sde claim to be different, is it a
> > problem?
>
> Were all of these outputs collected at the same time? They seem
> inconsistent.
>
> In particular, sdc1 has a higher 'events' number than the others (154 vs
> 102) yet an earlier Update Time. It also thinks that the array is
> completely failed.
> So I suspect that device is badly confused and you probably want to zero
> its metadata ... but don't do that too hastily.
>
> All the other devices think the array is working correctly with a full
> complement of devices. However there is no device which claims to
> be "RaidDevice 2" - except sdc1 and it is obviously confused.
>
> The device name listed in the table at the end of the --examine output
> is the name that the device had when the metadata was last written. And
> device names can change on reboot.
> The fact that the names don't line up suggests that the metadata hasn't been
> written since the last reboot - so presumably you aren't really using the
> array.(???)

The array was in use 24 hours a day. But the last reboot using it was after
the first error (I described it extensively in Wednesday's email). As I first
thought it was a file system error, I launched fsck to check the /tmp FS
with
fsck /dev/mapper/tout-tmp
(it is a RAID10 + LVM setup).
Could that be the reason the metadata was not written?

> (the newer 1.x metadata format doesn't try to record the names of devices
> in the superblock so it doesn't result in some of this confusion).

Yes, it's really confusing: the SAS/SATA controller card gives "numbers" to
the hard drives which don't correspond to the /dev/sd? names, which in turn
don't correspond to the drive numbers in the array, etc.

> Based on your earlier email, it would appear that the device discovery for
> some of your devices is happening in parallel at boot time, so the ordering
> could be random - each time you boot you get a different order. This will
> not confuse md or mdadm - they look at the content of the devices rather
> than the name.

OK, thanks for making that very clear.

> If you want a definitive name for each device, it might be a good idea to
> look in /dev/disk/by-path or /dev/disk/by-id and use names from there.

I don't think I can:
With System Rescue CD (2.6.35-std163-amd64 kernel) I have only one path
available:
pci-0000:00:14.1-scsi-0:0:0:0
which, according to lspci, is not the LSI SAS/SATA controller but the IDE
interface:
00:14.1 IDE interface: ATI Technologies Inc SB700/SB800 IDE Controller
while the LSI controller is at
01:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2008
PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 02)

This makes me a bit anxious about starting the RAID recovery!
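
For reference, here is a minimal sketch of how the names under /dev/disk/by-id or /dev/disk/by-path could be mapped back to the current /dev/sd? nodes. It assumes Python is available on the rescue system and that udev has populated those directories, which (as noted above) may not be the case on every live CD. The function name map_persistent_names is purely illustrative and is not part of mdadm or udev.

```python
import os


def map_persistent_names(link_dir="/dev/disk/by-id"):
    """Return {symlink name: resolved device path} for one udev link directory.

    Hypothetical helper: it only follows the symlinks that udev creates and
    does not read any md metadata.
    """
    mapping = {}
    if not os.path.isdir(link_dir):
        # Directory missing (e.g. udev did not create it): nothing to report.
        return mapping
    for name in sorted(os.listdir(link_dir)):
        link = os.path.join(link_dir, name)
        if os.path.islink(link):
            # realpath resolves the relative link (../../sdc) to /dev/sdc.
            mapping[name] = os.path.realpath(link)
    return mapping


if __name__ == "__main__":
    for directory in ("/dev/disk/by-id", "/dev/disk/by-path"):
        print(directory)
        for name, target in map_persistent_names(directory).items():
            print(f"  {name} -> {target}")
```

Run as-is, it prints one line per persistent name with the device node it currently points to, so the output of two boots can be compared side by side. If a directory is missing or empty, the function simply returns an empty mapping.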

Xavier
xavier@xxxxxxxxxxxxxx - 09 54 06 16 26