Neil,

After I fail an active drive in a raid5 volume and then cleanly shut down the volume, it cannot be started again without --force.

Reproduce:

1) create volume:
   mdadm -C /dev/md0 -l 5 -c 16 -n 3 /dev/sd[abc]
2) write data to /dev/md0 while failing a drive:
   mdadm -f /dev/md0 /dev/sda
3) md0 is now in degraded mode, sda has failed, and the sda superblock is "active".
4) cleanly stop volume:
   mdadm -S /dev/md0
5) attempt to start volume:
   mdadm -v -A /dev/md0 /dev/sd[abc]

mdadm: looking for devices for /dev/md0
mdadm: /dev/fioa is identified as a member of /dev/md0, slot 0.
mdadm: /dev/fiob is identified as a member of /dev/md0, slot 1.
mdadm: /dev/fioc is identified as a member of /dev/md0, slot 2.
mdadm: added /dev/fioa to /dev/md0 as 0 (possibly out of date)
mdadm: added /dev/fioc to /dev/md0 as 2
mdadm: added /dev/fiob to /dev/md0 as 1
mdadm: /dev/md0 assembled from 2 drives - not enough to start the array while not clean - consider --force.

mdadm never attempts to RUN_ARRAY.

Since the failed drive was in "active" state when it was failed, mdadm assumes that the entire volume is not clean when it attempts to assemble. It looks like mdadm does not count the failed drive as "available" while assembling, but at the same time allows it to prevent the volume from being "clean".

The logic in util.c:enough() thinks the volume does not have enough drives to be started:

	/* clean=0, avail_disks=2, raid_disks=3 */
	case 5:
		if (clean)
			return avail_disks >= raid_disks-1;
		else
			return avail_disks >= raid_disks;

Is this a bug?

This happens with both mdadm-3.2.6 and the latest version from github. I have attached the output of "mdadm -E" for each drive.
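To make the failure mode concrete, here is a small standalone sketch (my own test program, not mdadm code; only the case-5 branch above is copied, and the names enough_raid5/main are mine) that plugs the values from the assembly attempt into that logic:

/*
 * Hypothetical standalone test, not part of mdadm: it copies only the
 * case-5 branch quoted above from util.c:enough() to show why assembly
 * is refused with the values seen in this report.
 */
#include <stdio.h>

/* minimal stand-in for the raid5 branch of util.c:enough() */
static int enough_raid5(int clean, int avail_disks, int raid_disks)
{
	if (clean)
		return avail_disks >= raid_disks - 1;	/* degraded but clean is accepted */
	else
		return avail_disks >= raid_disks;	/* dirty: demands every disk */
}

int main(void)
{
	/* values from the failed assembly: clean=0, avail_disks=2, raid_disks=3 */
	printf("dirty, 2 of 3 avail: %d\n", enough_raid5(0, 2, 3));	/* prints 0 -> refuse */
	/* the same two disks would be accepted if the array were treated as clean */
	printf("clean, 2 of 3 avail: %d\n", enough_raid5(1, 2, 3));	/* prints 1 -> start */
	return 0;
}

With clean=0 the dirty branch demands all 3 disks, so the 2 available disks are rejected, which matches the "assembled from 2 drives - not enough to start the array while not clean" message above.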
-eivind

Attachment: mdadm.out ("mdadm -E" output for each drive)