On Friday December 5, jnelson-linux-raid@xxxxxxxxxxx wrote:
> I set up a raid1 between some devices, and have been futzing with it.
> I've been encountering all kinds of weird problems, including one
> which required me to reboot my machine.
>
> This is long, sorry.
>
> First, this is how I built the raid:
>
>    mdadm --create /dev/md10 --level=1 --raid-devices=2 --bitmap=internal
>       /dev/sdd1 --write-mostly --write-behind missing

'write-behind' is a setting on the bitmap and applies to all
write-mostly devices, so it can be specified anywhere.

'write-mostly' is a setting that applies to a particular device, not to
a position in the array.  So setting 'write-mostly' on a 'missing'
device has no useful effect.  When you add a new device to the array
you will need to set 'write-mostly' on that device if you want that
feature, i.e.

   mdadm /dev/md10 --add --write-mostly /dev/nbd0

> then I added /dev/nbd0:
>
>    mdadm /dev/md10 --add /dev/nbd0
>
> and it rebuilt just fine.

Good.

> Then I failed and removed /dev/sdd1, and added /dev/sda:
>
>    mdadm /dev/md10 --fail /dev/sdd1 --remove /dev/sdd1
>    mdadm /dev/md10 --add /dev/sda
>
> I let it rebuild.
>
> Then I failed, and removed it:
>
> The --fail worked, but the --remove did not.
>
>    mdadm /dev/md10 --fail /dev/sda --remove /dev/sda
>    mdadm: set /dev/sda faulty in /dev/md10
>    mdadm: hot remove failed for /dev/sda: Device or resource busy

That is expected.  Marking a device as 'failed' does not immediately
disconnect it from the array.  You have to wait for any in-flight IO
requests to complete.

> Whaaa?
> So I tried again:
>
>    mdadm /dev/md10 --remove /dev/sda
>    mdadm: hot removed /dev/sda

By now all those in-flight requests had completed and the device could
be removed.
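In practice you can simply retry the --remove until the in-flight IO
has drained and md lets the device go.  A minimal sketch (same device
names as above, untested):

   mdadm /dev/md10 --fail /dev/sda
   until mdadm /dev/md10 --remove /dev/sda; do sleep 1; done

The loop exits as soon as the hot-remove succeeds.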
> OK. Better, but weird.
> Since I'm using bitmaps, I would expect --re-add to allow the rebuild
> to pick up where it left off. It was 78% done. Nope.

With v0.90 metadata, a spare device is not marked as being part of the
array until it is fully recovered.  So if you interrupt a recovery
there is no record of how far it got.

With v1.0 metadata we do record how far the recovery has progressed,
and it can restart.  However I don't think that helps if you fail a
device - only if you stop the array and later restart it.

The bitmap is really about 'resync', not 'recovery'.

> ******
> Question 1:
> I'm using a bitmap. Why does the rebuild start completely over?

Because the bitmap isn't used to guide a rebuild, only a resync.  The
effect of --re-add is to make md do a resync rather than a rebuild if
the device was previously a fully active member of the array.

> 4% into the rebuild, this is what --examine-bitmap looks like for both
> components:
>
>           Filename : /dev/sda
>              Magic : 6d746962
>            Version : 4
>               UUID : 542a0986:dd465da6:b224af07:ed28e4e5
>             Events : 500
>     Events Cleared : 496
>              State : OK
>          Chunksize : 256 KB
>             Daemon : 5s flush period
>         Write Mode : Allow write behind, max 256
>          Sync Size : 78123968 (74.50 GiB 80.00 GB)
>             Bitmap : 305172 bits (chunks), 305172 dirty (100.0%)
>
> turnip:~ # mdadm --examine-bitmap /dev/nbd0
>           Filename : /dev/nbd0
>              Magic : 6d746962
>            Version : 4
>               UUID : 542a0986:dd465da6:b224af07:ed28e4e5
>             Events : 524
>     Events Cleared : 496
>              State : OK
>          Chunksize : 256 KB
>             Daemon : 5s flush period
>         Write Mode : Allow write behind, max 256
>          Sync Size : 78123968 (74.50 GiB 80.00 GB)
>             Bitmap : 305172 bits (chunks), 0 dirty (0.0%)
>
> No matter how long I wait, until it is rebuilt, the bitmap on /dev/sda
> is always 100% dirty.
> If I --fail, --remove (twice) /dev/sda, and I re-add /dev/sdd1, it
> clearly uses the bitmap and re-syncs in under 1 second.

Yes, there is a bug here.  When an array recovers on to a hot spare it
doesn't copy the bitmap across.  That will only happen lazily as bits
are updated.

I'm surprised I hadn't noticed that before, so there might be more to
this than I'm seeing at the moment.  But I definitely cannot find code
to copy the bitmap across.  I'll have to have a think about that.

> ***************
> Question 2: mdadm --detail and cat /proc/mdstat do not agree:
>
> NOTE: mdadm --detail says the rebuild status is 0% complete, but cat
> /proc/mdstat shows it as 7%.
> A bit later, I check again and they both agree - 14%.
> Below, from when the rebuild was 7% according to /proc/mdstat

I cannot explain this except to wonder if 7% of the recovery completed
between running "mdadm -D" and "cat /proc/mdstat".  The number reported
by "mdadm -D" is obtained by reading /proc/mdstat and applying "atoi()"
to the string that ends with a '%'.

NeilBrown

> /dev/md10:
>         Version : 00.90.03
>   Creation Time : Fri Dec  5 07:44:41 2008
>      Raid Level : raid1
>      Array Size : 78123968 (74.50 GiB 80.00 GB)
>   Used Dev Size : 78123968 (74.50 GiB 80.00 GB)
>    Raid Devices : 2
>   Total Devices : 2
> Preferred Minor : 10
>     Persistence : Superblock is persistent
>
>   Intent Bitmap : Internal
>
>     Update Time : Fri Dec  5 20:04:30 2008
>           State : active, degraded, recovering
>  Active Devices : 1
> Working Devices : 2
>  Failed Devices : 0
>   Spare Devices : 1
>
>  Rebuild Status : 0% complete
>
>            UUID : 542a0986:dd465da6:b224af07:ed28e4e5
>          Events : 0.544
>
>     Number   Major   Minor   RaidDevice State
>        2       8        0        0      spare rebuilding   /dev/sda
>        1      43        0        1      active sync   /dev/nbd0
>
> md10 : active raid1 sda[2] nbd0[1]
>       78123968 blocks [2/1] [_U]
>       [==>..................]  recovery = 13.1% (10283392/78123968)
>          finish=27.3min speed=41367K/sec
>       bitmap: 0/150 pages [0KB], 256KB chunk
>
> --
> Jon
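P.S. One way to rule out the "recovery progressed in between"
explanation for Question 2 is to sample the two back-to-back - just a
sketch, using the array name from above:

   mdadm --detail /dev/md10 | grep 'Rebuild Status' ; grep -A 3 '^md10' /proc/mdstat

If the two numbers still disagree when read that close together, then
something other than timing is going on.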