On Tue, 31 Jul 2012 10:46:26 +0200 Albert Pauw <albert.pauw@xxxxxxxxx> wrote:

> On 07/31/2012 08:11 AM, NeilBrown wrote:
> > On Sat, 28 Jul 2012 13:46:06 +0200 Albert Pauw <albert.pauw@xxxxxxxxx> wrote:
> >
> >> Hi Neil,
> >>
> >> After a hiatus of 1.5 years (busy with all sorts) I am back and tried the
> >> ddf code to see how things improved.
> > Thanks!
> >
> >> I built a VM Centos 6.3 system with 6 extra 1GB disks for testing.
> >> I found several issues in the standard installed 3.2.3 version of mdadm
> >> relating to ddf, but installed the 3.2.5 version in order to work with
> >> recent code.
> >>
> >> However, while version 3.2.3 is able to create a ddf container with
> >> raidsets in it, I found a problem with the 3.2.5 version.
> >>
> >> After initially creating the container:
> >>
> >> mdadm -C /dev/md127 -e ddf -l container /dev/sd[b-g]
> >>
> >> which worked, I created a raid (1 or 5 it doesn't matter in this case)
> >> in it:
> >>
> >> mdadm -C /dev/md0 -l raid5 -n 3 /dev/md127
> >>
> >> However, it stays on resync=PENDING and readonly, and doesn't get built.
> >>
> >> So I tried to set it to readwrite:
> >>
> >> mdadm --readwrite /dev/md0
> >>
> >> Unfortunately, it stays on readonly and doesn't get built.
> >>
> >> As said before, this did work in 3.2.3.
> >>
> >> Are you already on this problem?
> > It sounds like a problem with 'mdmon'. mdmon needs to be running before the
> > array can become read-write. mdadm should start mdmon automatically but
> > maybe it isn't. Maybe it cannot find mdmon?
> >
> > could you check if mdmon is running? If it isn't run
> > mdmon /dev/md127 &
> > and see if it starts working.
> Hi Neil,
>
> thanks for your reply. Yes, mdmon wasn't running. Couldn't get it
> running with a recompiled 3.2.5, the standard one which came with Centos
> (3.2.3) works fine, I assume they made some changes to the code? Anyway,
> I moved to my own laptop, running Fedora 16 and pulled mdadm from git and
> recompiled. That works.
> I also used loop devices as disks.
>
> Here is the first of my findings:
>
> I created a container with six disks, disk 1-2 is a raid 1 device, disk
> 3-6 are a raid 6 device.
>
> Here is the table shown at the end of the mdadm -E command for the
> container:
>
>  Physical Disks : 6
>       Number    RefNo      Size       Device      Type/State
>          0    06a5f547    479232K /dev/loop2      active/Online
>          1    47564acc    479232K /dev/loop3      active/Online
>          2    bf30692c    479232K /dev/loop5      active/Online
>          3    275d02f5    479232K /dev/loop4      active/Online
>          4    b0916b3f    479232K /dev/loop6      active/Online
>          5    65956a72    479232K /dev/loop1      active/Online
>
> I now fail a disk (disk 0) and I get:
>
>  Physical Disks : 6
>       Number    RefNo      Size       Device      Type/State
>          0    06a5f547    479232K /dev/loop2      active/Online
>          1    47564acc    479232K /dev/loop3      active/Online
>          2    bf30692c    479232K /dev/loop5      active/Online
>          3    275d02f5    479232K /dev/loop4      active/Online
>          4    b0916b3f    479232K /dev/loop6      active/Online
>          5    65956a72    479232K /dev/loop1      active/Offline, Failed
>
> Then I removed the disk from the container:
>
>  Physical Disks : 6
>       Number    RefNo      Size       Device      Type/State
>          0    06a5f547    479232K /dev/loop2      active/Online
>          1    47564acc    479232K /dev/loop3      active/Online
>          2    bf30692c    479232K /dev/loop5      active/Online
>          3    275d02f5    479232K /dev/loop4      active/Online
>          4    b0916b3f    479232K /dev/loop6      active/Online
>          5    65956a72    479232K                 active/Offline, Failed, Missing
>
> Notice the active/Offline status, is this correct?

To be honest, I don't know. The DDF spec doesn't really go into that sort
of detail, or at least I didn't find it.
Given that the device is Missing, it hardly seems to matter whether it is
Active or Spare or Foreign or Legacy.
I guess if it re-appears we want to know what it was ... maybe.

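In each of these tables the only interesting row is the one that is not plain active/Online. As a rough aid, the filter below prints just those rows. It is only a sketch: the function name `ddf_problem_disks` is made up for illustration, and it assumes the column layout shown in the tables above (number, hex RefNo, size ending in K, optional device, then state), which may differ between mdadm versions.

```shell
# Sketch: print physical disks whose state is not plain "active/Online".
# Reads `mdadm -E` style text on stdin. Hypothetical helper, not part of mdadm.
ddf_problem_disks() {
    awk '
        # The header row marks the start of the physical disk table.
        /Number[ \t]+RefNo/ { in_table = 1; next }
        in_table && $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9a-f]+$/ {
            rest = $0
            # Drop the number, RefNo and size columns.
            sub(/^[ \t]*[0-9]+[ \t]+[0-9a-f]+[ \t]+[0-9]+K[ \t]+/, "", rest)
            sub(/[ \t]+$/, "", rest)
            # The device column is empty once a disk is Missing.
            dev = "-"
            if (rest ~ /^\/dev\//) {
                dev = rest
                sub(/[ \t].*$/, "", dev)
                sub(/^[^ \t]+[ \t]+/, "", rest)
            }
            if (rest != "active/Online")
                print $1, $2, dev, rest
        }
    '
}
```

It would be used along the lines of `mdadm -E /dev/md127 | ddf_problem_disks`, with the container device from this thread standing in for whatever device is being examined.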
>
> I added the disk back into the container, NO zero-superblock:
>
>  Physical Disks : 6
>       Number    RefNo      Size       Device      Type/State
>          0    06a5f547    479232K /dev/loop2      active/Online
>          1    47564acc    479232K /dev/loop3      active/Online
>          2    bf30692c    479232K /dev/loop5      active/Online
>          3    275d02f5    479232K /dev/loop4      active/Online
>          4    b0916b3f    479232K /dev/loop6      active/Online
>          5    65956a72    479232K /dev/loop1      active/Offline, Failed, Missing
>
> It stays active/Offline (this is now correct I assume), Failed (again
> correct if had failed before), but also still missing.

I found why this happens. When I added code to support incremental
assembly of DDF arrays, I broke the ability to hot-add a device which
happened to have reasonably good looking metadata on it.

The best approach for now is to --zero the device first. I'll push out a
patch which does just that.

>
> I remove the disk again, do a zero-superblock and add it again:
>
>  Physical Disks : 6
>       Number    RefNo      Size       Device      Type/State
>          0    06a5f547    479232K /dev/loop2      active/Online
>          1    47564acc    479232K /dev/loop3      active/Online
>          2    bf30692c    479232K /dev/loop5      active/Online
>          3    275d02f5    479232K /dev/loop4      active/Online
>          4    b0916b3f    479232K /dev/loop6      active/Online
>          5    ede51ba3    479232K /dev/loop1      active/Online, Rebuilding
>
> This is correct, the disk is seen as a new disk and rebuilding starts.
>
>
> Regards,
>
> Albert

diff --git a/Manage.c b/Manage.c
index f83af65..7f27f74 100644
--- a/Manage.c
+++ b/Manage.c
@@ -786,6 +786,7 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
 			return -1;
 		}
 
+		Kill(dv->devname, NULL, 0, -1, 0);
 		dfd = dev_open(dv->devname, O_RDWR | O_EXCL|O_DIRECT);
 		if (mdmon_running(tst->container_dev))
 			tst->update_tail = &tst->updates;

Thanks,
NeilBrown
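
For an mdadm build without the patch above, the manual sequence Albert describes (remove, zero-superblock, add) might be scripted roughly as follows. This is a sketch under assumptions: the thread does not show the exact commands he ran, so the usual manage-mode forms are used here, and the `run` and `readd_clean` helpers and the `DRY_RUN` variable are invented for illustration. By default it only prints the commands.

```shell
# Sketch: re-add a previously failed disk to a DDF container after wiping
# its stale metadata. Helper names and DRY_RUN are hypothetical.
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, anything else = execute

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# usage: readd_clean <container> <disk>
readd_clean() {
    container=$1
    disk=$2
    # Take the disk out of the container, clear the old metadata so it is
    # treated as a new disk, then hot-add it again.
    run mdadm "$container" --remove "$disk" &&
    run mdadm --zero-superblock "$disk" &&
    run mdadm "$container" --add "$disk"
}
```

With the device names from this thread, `readd_clean /dev/md127 /dev/loop1` prints the three commands; setting `DRY_RUN=0` would execute them, which needs root and destroys the metadata on the disk. With the patch applied, the zero-superblock step should no longer be needed, since the add path clears the metadata itself.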