On Sat, 19 Mar 2011 00:59:07 +0100 Xavier Brochard <xavier@xxxxxxxxxxxxxx> wrote:
> On Saturday 19 March 2011 00:20:39, NeilBrown wrote:
> > On Fri, 18 Mar 2011 23:50:18 +0100 Xavier Brochard <xavier@xxxxxxxxxxxxxx>
> > > On Friday 18 March 2011 23:22:51, NeilBrown wrote:
> > > > On Fri, 18 Mar 2011 21:12:49 +0100 Xavier Brochard
> > > > > On Friday 18 March 2011 18:22:34, hansbkk@xxxxxxxxx wrote:
> > > > > > On Fri, Mar 18, 2011 at 9:49 PM, Xavier Brochard
> > > > > > <xavier@xxxxxxxxxxxxxx> wrote:
> > > > > > > The disk order changes between boots - even with a live-cd.
> > > > > > > Is that normal?
> > > > > >
> > > > > > If nothing is changing and the order is swapping really every boot,
> > > > > > then IMO that is odd.
> > > > >
> > > > > Nothing has changed, except the kernel minor version.
> > > >
> > > > Yet you don't tell us what the kernel minor version changed from or to.
> > >
> > > Previously it was Ubuntu 2.6.32-27-server or 2.6.32-28-server and now it
> > > is Ubuntu 2.6.32-29.58-server 2.6.32.28+drm33.13
> > >
> > > > That may not be important, but it might be, and you obviously don't know
> > > > which. It is always better to give too much information rather than
> > > > not enough.
> > >
> > > Here's the full output of mdadm --examine /dev/sd[cdefg]1
> > > As you can see, disks sdc, sdd and sde claim to be different. Is that a
> > > problem?
> >
> > Were all of these outputs collected at the same time?
>
> yes
>
> > They seem
> > inconsistent.
>
> > In particular, sdc1 has a higher 'events' number than the others (154 vs
> > 102) yet an earlier Update Time. It also thinks that the array is
> > completely failed.
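Neil's reasoning here is that the Events counter normally grows with each superblock update, so a member reporting a higher count than its peers together with an older Update Time contradicts them, whereas a member with both a lower count and an older time is merely stale. A minimal Python sketch of that check follows. `Superblock`, `make_superblock` and `find_inconsistent` are hypothetical helpers for illustration, not part of mdadm; the sample values are the Events/Update Time pairs from the `mdadm -E /dev/sd?1` output quoted later in this thread, where (because device names moved between boots) the confused member appears as /dev/sdb1.

```python
from dataclasses import dataclass
from datetime import datetime

# Layout of the "Update Time" field in mdadm --examine output,
# e.g. "Wed Mar 16 09:50:03 2011".
UPDATE_TIME_FMT = "%a %b %d %H:%M:%S %Y"


@dataclass(frozen=True)
class Superblock:
    """Hypothetical summary of one member's superblock."""
    device: str
    events: int
    update_time: datetime


def make_superblock(device, events, update_time):
    """Build a Superblock from the raw text of the Update Time field."""
    return Superblock(device, events, datetime.strptime(update_time, UPDATE_TIME_FMT))


def find_inconsistent(members):
    """Return members whose Events count exceeds that of a member updated later.

    A member with both a lower count and an older Update Time is merely
    stale and is not reported.
    """
    suspects = []
    for member in members:
        if any(
            other.update_time > member.update_time and other.events < member.events
            for other in members
        ):
            suspects.append(member.device)
    return suspects


# Events / Update Time pairs from the mdadm -E /dev/sd?1 output in this thread.
MEMBERS = [
    make_superblock("/dev/sdb1", 154, "Wed Mar 16 09:50:03 2011"),
    make_superblock("/dev/sdc1", 107, "Fri Mar 18 16:37:45 2011"),
    make_superblock("/dev/sdd1", 107, "Fri Mar 18 16:37:45 2011"),
    make_superblock("/dev/sde1", 102, "Wed Mar 16 07:43:45 2011"),
    make_superblock("/dev/sdf1", 107, "Fri Mar 18 16:37:45 2011"),
]

if __name__ == "__main__":
    print(find_inconsistent(MEMBERS))
```

With these values only /dev/sdb1 is flagged; /dev/sde1 (Events 102) is older but consistent with the others, so it counts as stale rather than confused. This is only a rough heuristic mirroring the reasoning above, not the logic mdadm itself uses when assembling.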
>
> When I removed that disk (sdc is number 2) and another one (I tried with
> different disks), all other disks display (with mdadm -E):
> 0 Active
> 1 Active
> 2 Active
> 3 Active
> 4 Spare
>
> But when I removed that disk (#2) and #0, it started to recover and all other
> disks display (with mdadm -E):
> 0 Removed
> 1 Active
> 2 Faulty removed
> 3 Active
> 4 Spare
> That looks coherent to me now.
>
> > So I suspect that device is badly confused and you probably want to zero
> > its metadata ... but don't do that too hastily.
> >
> > All the other devices think the array is working correctly with a full
> > complement of devices. However there is no device which claims to
> > be "RaidDevice 2" - except sdc1, and it is obviously confused.
> >
> > The device name listed in the table at the end of the --examine output
> > is the name that the device had when the metadata was last written, and
> > device names can change on reboot.
> > The fact that the names don't line up suggests that the metadata hasn't been
> > written since the last reboot - so presumably you aren't really using the
> > array (???).
> >
> > (The newer 1.x metadata format doesn't try to record the names of devices
> > in the superblock, so it doesn't result in some of this confusion.)
> >
> >
> > Based on your earlier email, it would appear that the device discovery for
> > some of your devices is happening in parallel at boot time, so the ordering
> > could be random - each time you boot you get a different order. This will
> > not confuse md or mdadm - they look at the content of the devices rather
> > than the name.
> > If you want a definitive name for each device, it might be a good idea to
> > look in /dev/disk/by-path or /dev/disk/by-id and use names from there.
> >
> > Could you please send a complete output of:
> >
> >   cat /proc/mdstat
> >   mdadm -D /dev/md0
> >   mdadm -E /dev/sd?1
> >
> > all collected at the same time.
> > Then I will suggest what action, if any,
> > you should take to repair things.
>
> Here it is, thank you for your help.
>

I suggest you run:

  mdadm --zero-superblock /dev/sdb1

having first double-checked that sdb1 is the device with Events of 154,
then

  mdadm -S /dev/md0
  mdadm -As /dev/md0

and let the array rebuild the spare.
Then check the data and make sure it is all good.
Then add /dev/sdb1 back in as the spare:

  mdadm /dev/md0 --add /dev/sdb1

and everything should be fine - provided you don't hit any hardware errors
etc.

NeilBrown


> mdstat:
> =====
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4]
> [raid10]
> md0 : inactive sdb1[2](S) sdf1[4](S) sdd1[3](S) sdc1[1](S) sde1[0](S)
>       2441919680 blocks
>
> unused devices: <none>
> ====
> Obviously, mdadm -D /dev/md0 outputs nothing.
>
> mdadm -E /dev/sd?1
> ====
> /dev/sdb1:
>           Magic : a92b4efc
>         Version : 0.90.00
>            UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
>   Creation Time : Sun Jan 2 16:41:45 2011
>      Raid Level : raid10
>   Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
>      Array Size : 976767872 (931.52 GiB 1000.21 GB)
>    Raid Devices : 4
>   Total Devices : 5
> Preferred Minor : 0
>
>     Update Time : Wed Mar 16 09:50:03 2011
>           State : clean
>  Active Devices : 1
> Working Devices : 1
>  Failed Devices : 2
>   Spare Devices : 0
>        Checksum : ec151590 - correct
>          Events : 154
>
>          Layout : near=2
>      Chunk Size : 64K
>
>     Number   Major   Minor   RaidDevice   State
> this     2       8      65            2   active sync   /dev/sde1
>
>    0     0       0       0            0   removed
>    1     1       0       0            1   faulty removed
>    2     2       8      65            2   active sync   /dev/sde1
>    3     3       0       0            3   faulty removed
> /dev/sdc1:
>           Magic : a92b4efc
>         Version : 0.90.00
>            UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
>   Creation Time : Sun Jan 2 16:41:45 2011
>      Raid Level : raid10
>   Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
>      Array Size : 976767872 (931.52 GiB 1000.21 GB)
>    Raid Devices : 4
>   Total Devices : 3
> Preferred Minor : 0
>
>     Update Time : Fri Mar 18 16:37:45 2011
>           State : clean
>  Active Devices : 2
> Working Devices : 3
>  Failed Devices : 1
>   Spare Devices : 1
>        Checksum : ec181672 - correct
>          Events : 107
>
>          Layout : near=2
>      Chunk Size : 64K
>
>     Number   Major   Minor   RaidDevice   State
> this     1       8      17            1   active sync   /dev/sdb1
>
>    0     0       0       0            0   removed
>    1     1       8      17            1   active sync   /dev/sdb1
>    2     2       0       0            2   faulty removed
>    3     3       8      49            3   active sync   /dev/sdd1
>    4     4       8      33            4   spare   /dev/sdc1
> /dev/sdd1:
>           Magic : a92b4efc
>         Version : 0.90.00
>            UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
>   Creation Time : Sun Jan 2 16:41:45 2011
>      Raid Level : raid10
>   Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
>      Array Size : 976767872 (931.52 GiB 1000.21 GB)
>    Raid Devices : 4
>   Total Devices : 3
> Preferred Minor : 0
>
>     Update Time : Fri Mar 18 16:37:45 2011
>           State : clean
>  Active Devices : 2
> Working Devices : 3
>  Failed Devices : 1
>   Spare Devices : 1
>        Checksum : ec181696 - correct
>          Events : 107
>
>          Layout : near=2
>      Chunk Size : 64K
>
>     Number   Major   Minor   RaidDevice   State
> this     3       8      49            3   active sync   /dev/sdd1
>
>    0     0       0       0            0   removed
>    1     1       8      17            1   active sync   /dev/sdb1
>    2     2       0       0            2   faulty removed
>    3     3       8      49            3   active sync   /dev/sdd1
>    4     4       8      33            4   spare   /dev/sdc1
> /dev/sde1:
>           Magic : a92b4efc
>         Version : 0.90.00
>            UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
>   Creation Time : Sun Jan 2 16:41:45 2011
>      Raid Level : raid10
>   Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
>      Array Size : 976767872 (931.52 GiB 1000.21 GB)
>    Raid Devices : 4
>   Total Devices : 5
> Preferred Minor : 0
>
>     Update Time : Wed Mar 16 07:43:45 2011
>           State : clean
>  Active Devices : 4
> Working Devices : 5
>  Failed Devices : 0
>   Spare Devices : 1
>        Checksum : ec14f740 - correct
>          Events : 102
>
>          Layout : near=2
>      Chunk Size : 64K
>
>     Number   Major   Minor   RaidDevice   State
> this     0       8      33            0   active sync   /dev/sdc1
>
>    0     0       8      33            0   active sync   /dev/sdc1
>    1     1       8      49            1   active sync   /dev/sdd1
>    2     2       8      65            2   active sync   /dev/sde1
>    3     3       8      81            3   active sync   /dev/sdf1
>    4     4       8      97            4   spare
> /dev/sdf1:
>           Magic : a92b4efc
>         Version : 0.90.00
>            UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
>   Creation Time : Sun Jan 2 16:41:45 2011
>      Raid Level : raid10
>   Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
>      Array Size : 976767872 (931.52 GiB 1000.21 GB)
>    Raid Devices : 4
>   Total Devices : 3
> Preferred Minor : 0
>
>     Update Time : Fri Mar 18 16:37:45 2011
>           State : clean
>  Active Devices : 2
> Working Devices : 3
>  Failed Devices : 1
>   Spare Devices : 1
>        Checksum : ec181682 - correct
>          Events : 107
>
>          Layout : near=2
>      Chunk Size : 64K
>
>     Number   Major   Minor   RaidDevice   State
> this     4       8      33            4   spare   /dev/sdc1
>
>    0     0       0       0            0   removed
>    1     1       8      17            1   active sync   /dev/sdb1
>    2     2       0       0            2   faulty removed
>    3     3       8      49            3   active sync   /dev/sdd1
>    4     4       8      33            4   spare   /dev/sdc1
> ====
>
> Xavier
> xavier@xxxxxxxxxxxxxx - 09 54 06 16 26
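Before zeroing anything, Neil asks for a double check that sdb1 really is the member with Events of 154. Below is a minimal Python sketch of that check. It assumes the plain-text layout of the 0.90-metadata `mdadm -E` output shown above: one `/dev/...:` header per member, followed by `Key : value` lines. `parse_examine` and `devices_by_events` are hypothetical helper names, and the embedded sample is trimmed to the Update Time and Events lines (plus one table row) from that dump.

```python
import re
from collections import defaultdict

# A member header such as "/dev/sdb1:" (after stripping whitespace).
DEVICE_RE = re.compile(r"^(/dev/\S+):$")
# A "Key : value" line such as "Events : 154"; keys are letters and spaces.
FIELD_RE = re.compile(r"^([A-Za-z][A-Za-z ]*?)\s*:\s*(.*)$")


def parse_examine(text):
    """Split `mdadm --examine` text into {device: {field: value}}.

    Lines that are not "Key : value" pairs (e.g. the device table) are skipped.
    """
    devices = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = DEVICE_RE.match(line)
        if header:
            current = devices.setdefault(header.group(1), {})
            continue
        field = FIELD_RE.match(line)
        if current is not None and field:
            # Keep the first occurrence of each key within a member's block.
            current.setdefault(field.group(1), field.group(2))
    return devices


def devices_by_events(devices):
    """Group members by Events count, highest count first.

    Assumes a plain integer Events value, as in the output quoted above.
    """
    groups = defaultdict(list)
    for name, fields in devices.items():
        groups[int(fields["Events"])].append(name)
    return dict(sorted(groups.items(), reverse=True))


# Trimmed from the mdadm -E /dev/sd?1 output quoted in this thread.
SAMPLE = """\
/dev/sdb1:
    Update Time : Wed Mar 16 09:50:03 2011
         Events : 154
this     2       8      65            2   active sync   /dev/sde1
/dev/sdc1:
    Update Time : Fri Mar 18 16:37:45 2011
         Events : 107
/dev/sdd1:
    Update Time : Fri Mar 18 16:37:45 2011
         Events : 107
/dev/sde1:
    Update Time : Wed Mar 16 07:43:45 2011
         Events : 102
/dev/sdf1:
    Update Time : Fri Mar 18 16:37:45 2011
         Events : 107
"""

if __name__ == "__main__":
    for events, names in devices_by_events(parse_examine(SAMPLE)).items():
        print(events, names)
```

On this sample the first group printed is Events 154, containing only /dev/sdb1, followed by the three members at 107 and /dev/sde1 at 102. If the highest group names a different device when run against fresh output, stop and re-check before passing anything to `mdadm --zero-superblock`.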