On 04/08/2010 08:33 PM, Neil Brown wrote: > On Fri, 9 Apr 2010 09:31:53 +1000 > Neil Brown <neilb@xxxxxxx> wrote: > >> On Tue, 06 Apr 2010 22:02:58 -0400 >> Doug Ledford <dledford@xxxxxxxxxx> wrote: >> >>> Let me know if you want me to redo/resend any of it. >>> >> >> not necessary, but some remove would be appreciated. >> > I suspect you cope graciously with most of my typos, but this one is > ridiculous! > > s/remove/review/ > > :-( > > NeilBrown > > P.S. When could the word "remove" be used in that context? The only excuse > I can think of is that a "remove" is a palette cleanser at a banquet. > So "some remove" would be "some crystallized fruit or ginger bread" - always > appreciated :-) I've already sent two minor patches, the touchup patch from my original submission (which hasn't hit your git repo yet but needs to) and the one to resolve the segfault in your initial patch. I also have these two additional patches that apply on top of your hotunplug branch. With these applied, the code is back to the same basic functionality as the patch I posted originally. However, don't take that to mean it's done ;-) Of note, there are still a couple lingering issues: 1) The kernel won't let us set a device faulty if the array is a non-redundant array type. This is true of mdadm -If $name and also mdadm <mddevice> -f <device>. We need to fix this. I suspect the reason that the kernel doesn't set non-redundant array types as failed in this situation is on the hopes that the underlying device will come back. If it does, and we haven't failed the device yet, then theoretically we could pick up where we left off instead of ever having to fail at all, where as if we fail at all, the entire array goes down. As it turns out, this is a pipe dream. The reason I say that this is a pipe dream is that our continued holding of a reference to the device simply guarantees that even if it came back, it's going to come back with a different device name and major/minor pair, so we will *never* be able to pick up where we left off. So once that device goes away, our array is toast whether we like it or not. 2) IMSM containers are still a bit flaky on their incremental remove/add support. I've been speaking with Dan about this and we've agreed that the fact that they incrementally assemble differently than regular md raid arrays is problematic, especially for udev rules files, and so he was going to work some on the issue (I also have a patch that gets things minimally functional here, but I'm not attaching that to this email). My IMSM rules in the udev file are dracut specific. I don't know which distros will end up switch to dracut from mkinitrd, but we already have for a couple releases of Fedora now. They could be modified for other boot initramfs builders as well, but they basically default imsm array handling to mdadm in the absence of the rd_NO_MDIMSM command line option. -- Doug Ledford <dledford@xxxxxxxxxx> GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband
From 506c331a6030531fe647c7a654bd87ae32373543 Mon Sep 17 00:00:00 2001 From: Doug Ledford <dledford@xxxxxxxxxx> Date: Tue, 13 Apr 2010 14:38:44 -0400 Subject: [PATCH 3/4] udev rules: imsm arrays are a different ID_FS_TYPE, so check for them as well as regular mdadm arrays. Check to see if we were told on the boot command line to ignore IMSM arrays first though (aka, rd_NO_MDIMSM is not empty). Signed-off-by: Doug Ledford <dledford@xxxxxxxxxx> --- udev-md-raid.rules | 4 +++- 1 files changed, 3 insertions(+), 1 deletions(-) diff --git a/udev-md-raid.rules b/udev-md-raid.rules index da52058..712c5e5 100644 --- a/udev-md-raid.rules +++ b/udev-md-raid.rules @@ -4,7 +4,9 @@ SUBSYSTEM!="block", GOTO="md_end" # handle potential components of arrays ENV{ID_FS_TYPE}=="linux_raid_member", ACTION=="remove", RUN+="/sbin/mdadm -If $name" -ENV{ID_FS_TYPE}=="linux_raid_member", ACTION=="add", RUN+="/sbin/mdadm --incremental $env{DEVNAME}" +ENV{ID_FS_TYPE}=="linux_raid_member", ACTION=="add", RUN+="/sbin/mdadm -I $tempnode" +ENV{ID_FS_TYPE}=="isw_raid_member", ENV{rd_NO_MDIMSM}!="?*", ACTION=="remove", RUN+="/sbin/mdadm -If $name" +ENV{ID_FS_TYPE}=="isw_raid_member", ENV{rd_NO_MDIMSM}!="?*", ACTION=="add", RUN+="/sbin/mdadm -I $tempnode" # handle md arrays ACTION!="add|change", GOTO="md_end" -- 1.6.6.1
From d1f7ba12a4e1ec9ed3a005a84489073d80be8cc9 Mon Sep 17 00:00:00 2001 From: Doug Ledford <dledford@xxxxxxxxxx> Date: Tue, 13 Apr 2010 14:44:24 -0400 Subject: [PATCH 4/4] The assumption is that if tfd >= 0, then we have a valid major/minor pair in stb.st_rdev to use in ioctl calls. Therefore, even if we use the sysfs block/dev entry to get our major/minor pair, we should really be using tfd and not sysfd as it changes assumptions made later on. Note: I actually don't like this. I would prefer that in the case of a kernel internal name that we skipped tfd altogether and went strait to sysfs operation exclusively. But that's just my preference. I'm concerned that sometime in the future we'll make the mistake of assuming that tfd >= 0 means dv->devname is a fully qualified path name and not a kernel internal name. We don't now, but I could easily see it happening. Signed-off-by: Doug Ledford <dledford@xxxxxxxxxx> --- Manage.c | 12 ++++++------ 1 files changed, 6 insertions(+), 6 deletions(-) diff --git a/Manage.c b/Manage.c index 043e3f6..9e450b9 100644 --- a/Manage.c +++ b/Manage.c @@ -386,7 +386,7 @@ int Manage_subdevs(char *devname, int fd, next = dv->next; jnext = 0; - tfd = -1; + tfd = sysfd = -1; if (strcmp(dv->devname, "failed")==0 || strcmp(dv->devname, "faulty")==0) { @@ -461,17 +461,16 @@ int Manage_subdevs(char *devname, int fd, } sprintf(dname, "dev-%s", dv->devname); - sysfd = sysfs_open(fd2devnum(fd), dname, "block/dev"); - if (sysfd >= 0) { + tfd = sysfs_open(fd2devnum(fd), dname, "block/dev"); + if (tfd >= 0) { char dn[20]; int mj,mn; - if (sysfs_fd_get_str(sysfd, dn, 20) > 0 && + if (sysfs_fd_get_str(tfd, dn, 20) > 0 && sscanf(dn, "%d:%d", &mj,&mn) == 2) { stb.st_rdev = makedev(mj,mn); found = 1; } - close(sysfd); - sysfd = -1; + close(tfd); } if (!found) { sysfd = sysfs_open(fd2devnum(fd), dname, "state"); @@ -481,6 +480,7 @@ int Manage_subdevs(char *devname, int fd, dv->devname, devname); return 1; } + tfd = -1; } } else { j = 0; -- 1.6.6.1
Attachment:
signature.asc
Description: OpenPGP digital signature