Seems like the fact that another resync is required at the time the raid
array is stopped means that the array will be marked dirty. In the case
of Raid 6, is that really the desired state? i.e. should the array be
stopped from running upon assembling because of the spare?

Still looking at the code. Perhaps there's not enough information to
know that it's ok to start raid?

On Wed, Jan 23, 2013 at 11:50 AM, John Gehring <john.gehring@xxxxxxxxx> wrote:
> I think I'm getting closer to understanding the issue, but still have
> some questions about the various states of the raid array. Ultimately,
> the 'assemble' command is resulting in the un-started state (not
> enough to start the array while not clean) because the array state
> does not include the 'clean' condition. What I've noticed is that
> after removing a device and prior to adding a device back to the
> array, the array state is: 'clean, degraded, resyncing'. But after a
> device is added back to the array, the state moves to: 'active,
> degraded, resyncing' (no longer clean!). At this point, if the array
> is stopped and then re-assembled, the array will not start.
>
> Is there a good explanation for why the 'clean' state does not exist
> after adding a device back to the array?
>
> Thanks.
>
>
> After removing a device from the array:
> ------------------------------------------------------------------------------------------------------
> mdadm-3.2.6$ sudo mdadm -D /dev/md1
> /dev/md1:
>         Version : 1.2
>   Creation Time : Wed Jan 23 11:06:45 2013
>      Raid Level : raid6
>      Array Size : 1503744 (1468.75 MiB 1539.83 MB)
>   Used Dev Size : 250624 (244.79 MiB 256.64 MB)
>    Raid Devices : 8
>   Total Devices : 7
>     Persistence : Superblock is persistent
>
>     Update Time : Wed Jan 23 11:07:06 2013
>           State : clean, degraded, resyncing
>  Active Devices : 7
> Working Devices : 7
>  Failed Devices : 0
>   Spare Devices : 0
>
>          Layout : left-symmetric
>      Chunk Size : 256K
>
>   Resync Status : 26% complete
>
>            Name : JLG-NexGenStorage:1 (local to host JLG-NexGenStorage)
>            UUID : 0100e727:8d91a5d9:67f0be9e:26be5623
>          Events : 8
>
>     Number   Major   Minor   RaidDevice State
>        0       8       16        0      active sync   /dev/sdb
>        1       8       32        1      active sync   /dev/sdc
>        2       8       48        2      active sync   /dev/sdd
>        3       8       64        3      active sync   /dev/sde
>        4       0        0        4      removed
>        5       8       96        5      active sync   /dev/sdg
>        6       8      112        6      active sync   /dev/sdh
>        7       8      128        7      active sync   /dev/sdi
>
>
>
> After adding a device back to the array:
> ------------------------------------------------------------------------------------------------------
>
> mdadm-3.2.6$ sudo mdadm -D /dev/md1
> /dev/md1:
>         Version : 1.2
>   Creation Time : Wed Jan 23 11:06:45 2013
>      Raid Level : raid6
>      Array Size : 1503744 (1468.75 MiB 1539.83 MB)
>   Used Dev Size : 250624 (244.79 MiB 256.64 MB)
>    Raid Devices : 8
>   Total Devices : 8
>     Persistence : Superblock is persistent
>
>     Update Time : Wed Jan 23 11:07:27 2013
>           State : active, degraded, resyncing
>  Active Devices : 7
> Working Devices : 8
>  Failed Devices : 0
>   Spare Devices : 1
>
>          Layout : left-symmetric
>      Chunk Size : 256K
>
>   Resync Status : 52% complete
>
>            Name : JLG-NexGenStorage:1 (local to host JLG-NexGenStorage)
>            UUID : 0100e727:8d91a5d9:67f0be9e:26be5623
>          Events : 14
>
>     Number   Major   Minor   RaidDevice State
>        0       8       16        0      active sync   /dev/sdb
>        1       8       32        1      active sync   /dev/sdc
>        2       8       48        2      active sync   /dev/sdd
>        3       8       64        3      active sync   /dev/sde
>        4       0        0        4      removed
>        5       8       96        5      active sync   /dev/sdg
>        6       8      112        6      active sync   /dev/sdh
>        7       8      128        7      active sync   /dev/sdi
>
>        8       8       80        -      spare   /dev/sdf
>
> On Fri, Jan 18, 2013 at 6:37 PM, John Gehring <john.gehring@xxxxxxxxx> wrote:
>> I executed the assemble command with the verbose option and saw this:
>>
>> ~$ sudo mdadm --verbose --assemble /dev/md1
>> --uuid=0100e727:8d91a5d9:67f0be9e:26be5623
>> mdadm: looking for devices for /dev/md1
>> mdadm: no RAID superblock on /dev/sda5
>> mdadm: no RAID superblock on /dev/sda2
>> mdadm: no RAID superblock on /dev/sda1
>> mdadm: no RAID superblock on /dev/sda
>> mdadm: /dev/sdf is identified as a member of /dev/md1, slot -1.
>> mdadm: /dev/sdm is identified as a member of /dev/md1, slot 7.
>> mdadm: /dev/sdh is identified as a member of /dev/md1, slot 6.
>> mdadm: /dev/sdg is identified as a member of /dev/md1, slot 5.
>> mdadm: /dev/sde is identified as a member of /dev/md1, slot 3.
>> mdadm: /dev/sdd is identified as a member of /dev/md1, slot 2.
>> mdadm: /dev/sdc is identified as a member of /dev/md1, slot 1.
>> mdadm: /dev/sdb is identified as a member of /dev/md1, slot 0.
>> mdadm: added /dev/sdc to /dev/md1 as 1
>> mdadm: added /dev/sdd to /dev/md1 as 2
>> mdadm: added /dev/sde to /dev/md1 as 3
>> mdadm: no uptodate device for slot 4 of /dev/md1
>> mdadm: added /dev/sdg to /dev/md1 as 5
>> mdadm: added /dev/sdh to /dev/md1 as 6
>> mdadm: added /dev/sdm to /dev/md1 as 7
>> mdadm: failed to add /dev/sdf to /dev/md1: Device or resource busy
>> mdadm: added /dev/sdb to /dev/md1 as 0
>> mdadm: /dev/md1 assembled from 7 drives - not enough to start the
>> array while not clean - consider --force.
>>
>> This made me think that the zero-superblock command was not clearing
>> out data as well as I expected.
>> (BTW, I re-ran the test and ran the
>> zero-superblock multiple times to get the 'mdadm: Unrecognised md
>> component device - /dev/sdf' response, but still ended up with the
>> assemble error.) Given that it looked to mdadm like the device still
>> had belonged to the raid array, I dd'd zero's into the device between
>> steps 8 and 9 (after running the zero-superblock command; probably
>> redundant) and this seems to have done the trick. If I zero out the
>> device (and I'm sure I can actually zero out more specific parts
>> related to the superblock area), then the final assemble command works
>> as desired.
>>
>> Still wouldn't mind hearing back about why this fails when I only take
>> the steps outlined in the message above.
>>
>> Thanks.
>>
>> On Thu, Jan 17, 2013 at 7:43 PM, John Gehring <john.gehring@xxxxxxxxx> wrote:
>>> I am receiving the following error when trying to assemble a raid set:
>>>
>>> mdadm: /dev/md1 assembled from 7 drives - not enough to start the
>>> array while not clean - consider --force.
>>>
>>> My machine environment and the steps are listed below. I'm happy to
>>> provide additional information.
>>>
>>> I have used the following steps to reliably reproduce the problem:
>>>
>>> 1 - echo "AUTO -all" >> /etc/mdadm.conf : Do this in order to
>>> prevent auto assembly in a later step.
>>>
>>> 2 - mdadm --create /dev/md1 --level=6 --chunk=256 --raid-devices=8
>>> --uuid=0100e727:8d91a5d9:67f0be9e:26be5623 /dev/sdb /dev/sdc /dev/sdd
>>> /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdm
>>>      - I originally detected this problem on a system with a 16 drive
>>> LSI sas back plane, but found I could create a similar 8-device array
>>> with a couple of 4-port USB hubs.
>>>
>>> 3 - Pull a drive from the raid set. This should be done prior to raid
>>> finishing the resync process. If you're using > 1 G USB devices, there
>>> should be ample time.
>>>      - sudo bash -c "/bin/echo -n 1 > /sys/block/sdf/device/delete"
>>>
>>> 4 - Inspect the raid status to be sure that the device is now marked as faulty.
>>>      - mdadm -D /dev/md1
>>>
>>> 5 - Remove the 'faulty' device from the raid set. Note that upon
>>> inspection of the raid data in the last step, you can see that the
>>> device name of the faulty device is not given.
>>>      - mdadm --manage /dev/md1 --remove faulty
>>>
>>> 6 - Stop the raid device.
>>>      - mdadm -S /dev/md1
>>>
>>> 7 - Rediscover the 'pulled' USB device. Note that I'm doing a virtual
>>> pull and insert of the USB device because I don't have to run the risk
>>> of bumping/reseating other USB devices on the same HUB.
>>>      - sudo bash -c "/bin/echo -n \"- - -\" > /sys/class/scsi_host/host23/scan"
>>>      - This step can be a little tricky because there are a good number
>>> of hostx devices in the /sys/class/scsi_host directory. You have to
>>> know how they are mapped or keep trying the command with different
>>> hostx dirs specified until your USB device shows back up in the /dev/
>>> directory.
>>>
>>> 8 - 'zero' the superblock on the newly discovered device.
>>>      - mdadm --zero-superblock /dev/sdf
>>>
>>> 9 - Try to assemble the raid set.
>>>      - mdadm --assemble /dev/md1 --uuid=0100e727:8d91a5d9:67f0be9e:26be5623
>>>
>>> results in => mdadm: /dev/md1 assembled from 7 drives - not enough to
>>> start the array while not clean - consider --force.
>>>
>>> Using the --force switch works, but I'm not confident that the
>>> integrity of the raid array has been maintained.
>>>
>>> My system:
>>>
>>> HP EliteBook 8740w
>>> ~$ cat /etc/issue
>>> Ubuntu 11.04 \n \l
>>>
>>> ~$ uname -a
>>> Linux JLG 2.6.38-16-generic #67-Ubuntu SMP Thu Sep 6 17:58:38 UTC 2012
>>> x86_64 x86_64 x86_64 GNU/Linux
>>>
>>> ~$ mdadm --version
>>> mdadm - v3.2.6 - 25th October 2012
>>>
>>> ~$ modinfo raid456
>>> filename:       /lib/modules/2.6.38-16-generic/kernel/drivers/md/raid456.ko
>>> alias:          raid6
>>> alias:          raid5
>>> alias:          md-level-6
>>> alias:          md-raid6
>>> alias:          md-personality-8
>>> alias:          md-level-4
>>> alias:          md-level-5
>>> alias:          md-raid4
>>> alias:          md-raid5
>>> alias:          md-personality-4
>>> description:    RAID4/5/6 (striping with parity) personality for MD
>>> license:        GPL
>>> srcversion:     2A567A4740BF3F0C5D13267
>>> depends:        async_raid6_recov,async_pq,async_tx,async_memcpy,async_xor
>>> vermagic:       2.6.38-16-generic SMP mod_unload modversions
>>>
>>> The raid set when it's happy:
>>>
>>> mdadm-3.2.6$ sudo mdadm -D /dev/md1
>>> /dev/md1:
>>>         Version : 1.2
>>>   Creation Time : Thu Jan 17 19:34:51 2013
>>>      Raid Level : raid6
>>>      Array Size : 1503744 (1468.75 MiB 1539.83 MB)
>>>   Used Dev Size : 250624 (244.79 MiB 256.64 MB)
>>>    Raid Devices : 8
>>>   Total Devices : 8
>>>     Persistence : Superblock is persistent
>>>
>>>     Update Time : Thu Jan 17 19:35:02 2013
>>>           State : active, resyncing
>>>  Active Devices : 8
>>> Working Devices : 8
>>>  Failed Devices : 0
>>>   Spare Devices : 0
>>>
>>>          Layout : left-symmetric
>>>      Chunk Size : 256K
>>>
>>>   Resync Status : 13% complete
>>>
>>>            Name : JLG:1 (local to host JLG)
>>>            UUID : 0100e727:8d91a5d9:67f0be9e:26be5623
>>>          Events : 3
>>>
>>>     Number   Major   Minor   RaidDevice State
>>>        0       8       16        0      active sync   /dev/sdb
>>>        1       8       32        1      active sync   /dev/sdc
>>>        2       8       48        2      active sync   /dev/sdd
>>>        3       8       64        3      active sync   /dev/sde
>>>        4       8       80        4      active sync   /dev/sdf
>>>        5       8       96        5      active sync   /dev/sdg
>>>        6       8      112        6      active sync   /dev/sdh
>>>        7       8      192        7      active sync   /dev/sdm
>>>
>>>
>>> Thank you to anyone who's taking the time to look at this.
>>>
>>> Cheers,
>>>
>>> John Gehring
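
For anyone reproducing this, here is a minimal sketch of how the State
line discussed above could be checked before stopping the array. The
helper names md_state and md_state_has_clean are made up for
illustration and are not part of mdadm; they only parse text in the
`mdadm -D` format shown in this thread and do not predict what assembly
will actually do.

```shell
#!/bin/sh
# Hypothetical helpers (not part of mdadm). Both read `mdadm -D` output
# on stdin, in the format quoted in this thread.

# Print the comma-separated flags from the "State :" line.
# The device table header ("Number Major Minor RaidDevice State") and
# "Resync Status :" do not match, because the pattern requires the line
# to begin with "State".
md_state() {
    sed -n 's/^[[:space:]]*State[[:space:]]*:[[:space:]]*//p' |
        sed 's/[[:space:]]*$//'
}

# Exit 0 only if "clean" is one of the flags on the State line.
md_state_has_clean() {
    md_state |
        tr ',' '\n' |
        sed 's/^[[:space:]]*//; s/[[:space:]]*$//' |
        grep -qx 'clean'
}
```

Usage, assuming the array name from this thread:

    sudo mdadm -D /dev/md1 | md_state_has_clean || echo "State lacks 'clean'"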