On Mon, 29 Sep 2014 10:45:17 +0200 Francis Moreau <francis.moro@xxxxxxxxx>
wrote:

> > So what were pids 930 and 459?
> > One was presumably the "mdadm -Ss" - probably 930.
> > Is 459 the "mdadm --monitor" ??  That might be a useful hint.
> >
>
> yes.
>
> [456] is: /sbin/mdadm --monitor --scan --daemonise --syslog
> --pid-file=/run/mdadm/mdadm.pid
>
> and [930] is 'mdadm -Ss'.

Good.  Please try the patch below.

>
> >
> >>
> >>
> >>> Probably there is a 'change' event happening just before the 'remove' event,
> >>> and udev runs "mdadm" on the 'change' event, and that ends up happening after
> >>> the device has been removed.
> >>>
> >>> Is this really a problem?  Can't you just ignore it and pretend it isn't
> >>> there?
> >>
> >> Well, if you list the block devices that the kernel detected in order to
> >> operate on them, it could. I don't know exactly what would be the result
> >> to use it but it could confuse some tools.
> >>
> >> Is there a way to check that the 'ghost' device has been removed by
> >> poking sysfs ?
> >
> > If you look at /sys/block/md*/md/array_state, those that contain 'inactive'
> > or 'clear' might be 'ghosts', or might be in the process of being assembled.
> > If you write 'clear' to the same file they should disappear.... unless udev
> > does something to re-create them.
> >
>
> It's in 'clear' state, and writing 'clear' doesn't make the device disappear.
>
> [root@localhost ~]# dmesg -c >/dev/null
> [root@localhost ~]# echo clear >/sys/block/md125/md/array_state
> [root@localhost ~]# dmesg
> [ 254.106252] md: md125 stopped.
> [ 254.108182] md_open(): mdX opened by mdadm [968]
>
> [ 254.109103] md_open(): md125 opened by mdadm [459]
> [ 254.109127] md_open(): md125 opened by mdadm [459]
> [ 254.109281] md_release(): md125 released by mdadm [459]
>
> [ 254.109337] md_open(): md125 opened by mdadm [968]
> [ 254.109572] md_release(): md125 released by mdadm [968]
>
> [ 254.109847] md_open(): md125 opened by systemd-udevd [967]
> [ 254.109986] md_release(): md125 released by systemd-udevd [967]
>
> In that sequence, it seems that mdadm [459] is missing a md_release()
> here. Is this expected ?

Presumably the first md_open returned an error.  You could add another printk
at each 'return' to check.

Thanks,
NeilBrown


diff --git a/Monitor.c b/Monitor.c
index 5cb24fab8f2a..971d2ecbea72 100644
--- a/Monitor.c
+++ b/Monitor.c
@@ -460,7 +460,7 @@ static int check_array(struct state *st, struct mdstat_ent *mdstat,
 	mdu_array_info_t array;
 	struct mdstat_ent *mse = NULL, *mse2;
 	char *dev = st->devname;
-	int fd;
+	int fd = -1;
 	int i;
 	int remaining_disks;
 	int last_disk;
@@ -468,6 +468,27 @@ static int check_array(struct state *st, struct mdstat_ent *mdstat,
 
 	if (test)
 		alert("TestMessage", dev, NULL, ainfo);
+	if (st->devnm[0])
+		fd = open("/sys/block", O_RDONLY|O_DIRECTORY);
+	if (fd >= 0) {
+		/* Don't open the device unless it is present and
+		 * active in sysfs.
+		 */
+		char buf[10];
+		close(fd);
+		fd = sysfs_open(st->devnm, NULL, "array_state");
+		if (fd < 0 ||
+		    read(fd, buf, 10) < 5 ||
+		    strncmp(buf,"clear",5) == 0 ||
+		    strncmp(buf,"inact",5) == 0) {
+			if (fd >= 0)
+				close(fd);
+			if (!st->err)
+				alert("DeviceDisappeared", dev, NULL, ainfo);
+			st->err++;
+			return 0;
+		}
+	}
 	fd = open(dev, O_RDONLY);
 	if (fd < 0) {
 		if (!st->err)
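
For anyone who wants to do the array_state check discussed in this thread from
a script rather than by hand, here is a minimal sketch. It only reads sysfs and
never opens /dev/md*, so it should not itself trigger md_open(). The variable
SYSFS_BLOCK and the function names is_suspect_state and list_suspect_arrays are
made up for illustration; they are not part of mdadm. As noted in the thread, an
array reported here may be a 'ghost' or may simply be in the middle of being
assembled, so treat the output as a hint only.

```shell
#!/bin/sh
# Sketch only: report md arrays whose array_state is 'clear' or 'inactive'.
# SYSFS_BLOCK and both function names are hypothetical, chosen for this example.

SYSFS_BLOCK=${SYSFS_BLOCK:-/sys/block}

# Exit status 0 when the state is one of the two named in this thread.
is_suspect_state() {
    case "$1" in
        clear|inactive) return 0 ;;
        *) return 1 ;;
    esac
}

# Print "<device> <state>" for every md array in a suspect state.
list_suspect_arrays() {
    for f in "$SYSFS_BLOCK"/md*/md/array_state; do
        [ -r "$f" ] || continue            # glob did not match, or unreadable
        state=$(cat "$f" 2>/dev/null) || continue
        if is_suspect_state "$state"; then
            dev=${f#"$SYSFS_BLOCK"/}       # strip the leading directory
            dev=${dev%%/*}                 # keep only the name, e.g. md125
            printf '%s %s\n' "$dev" "$state"
        fi
    done
}

list_suspect_arrays
```

On a machine with no md devices this prints nothing. The state can change
between the read and whatever you do next, so a tool that acts on the result
should re-check before acting.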