Re: /sys/block/md126 still exists even after stopping the array

On Mon, 29 Sep 2014 10:45:17 +0200 Francis Moreau <francis.moro@xxxxxxxxx>
wrote:

> > So what were pids 930 and 459?
> > One was presumably the "mdadm -Ss"  - probably 930.
> > Is 459 the "mdadm --monitor"?  That might be a useful hint.
> > 
> 
> yes.
> 
> [459] is:  /sbin/mdadm --monitor --scan --daemonise --syslog
> --pid-file=/run/mdadm/mdadm.pid
> 
> and [930] is 'mdadm -Ss'.

Good.  Please try the patch below.

> > 
> >>
> >>
> >>> Probably there is a 'change' event happening just before the 'remove' event,
> >>> and udev runs "mdadm" on the 'change' event, and that ends up happening after
> >>> the device has been removed.
> >>>
> >>> Is this really a problem?  Can't you just ignore it and pretend it isn't
> >>> there?
> >>
> >> Well, it could be a problem for anything that lists the block devices
> >> detected by the kernel in order to operate on them. I don't know exactly
> >> what would happen if the ghost device were used, but it could confuse
> >> some tools.
> >>
> >> Is there a way to check that the 'ghost' device has been removed by
> >> poking sysfs?
> > 
> > If you look at /sys/block/md*/md/array_state, those that contain 'inactive'
> > or 'clear' might be 'ghosts', or might be in the process of being assembled.
> > If you write 'clear' to the same file they should disappear.... unless udev
> > does something to re-create them.
> > 
> 
> It's in 'clear' state, and writing 'clear' doesn't make the device disappear.
> 
> [root@localhost ~]# dmesg -c >/dev/null
> [root@localhost ~]# echo clear >/sys/block/md125/md/array_state
> [root@localhost ~]# dmesg
> [  254.106252] md: md125 stopped.
> [  254.108182] md_open(): mdX opened by mdadm [968]
> 
> [  254.109103] md_open(): md125 opened by mdadm [459]
> [  254.109127] md_open(): md125 opened by mdadm [459]
> [  254.109281] md_release(): md125 released by mdadm [459]
> 
> [  254.109337] md_open(): md125 opened by mdadm [968]
> [  254.109572] md_release(): md125 released by mdadm [968]
> 
> [  254.109847] md_open(): md125 opened by systemd-udevd [967]
> [  254.109986] md_release(): md125 released by systemd-udevd [967]
> 
> In that sequence, it seems that mdadm [459] is missing an md_release()
> here. Is this expected?

Presumably the first md_open returned an error.  You could add another printk
at each 'return' to check.

Thanks,
NeilBrown


diff --git a/Monitor.c b/Monitor.c
index 5cb24fab8f2a..971d2ecbea72 100644
--- a/Monitor.c
+++ b/Monitor.c
@@ -460,7 +460,7 @@ static int check_array(struct state *st, struct mdstat_ent *mdstat,
 	mdu_array_info_t array;
 	struct mdstat_ent *mse = NULL, *mse2;
 	char *dev = st->devname;
-	int fd;
+	int fd = -1;
 	int i;
 	int remaining_disks;
 	int last_disk;
@@ -468,6 +468,27 @@ static int check_array(struct state *st, struct mdstat_ent *mdstat,
 
 	if (test)
 		alert("TestMessage", dev, NULL, ainfo);
+	if (st->devnm[0])
+		fd = open("/sys/block", O_RDONLY|O_DIRECTORY);
+	if (fd >= 0) {
+		/* Don't open the device unless it is present and
+		 * active in sysfs.
+		 */
+		char buf[10];
+		close(fd);
+		fd = sysfs_open(st->devnm, NULL, "array_state");
+		if (fd < 0 ||
+		    read(fd, buf, 10) < 5 ||
+		    strncmp(buf,"clear",5) == 0 ||
+		    strncmp(buf,"inact",5) == 0) {
+			if (fd >= 0)
+				close(fd);
+			if (!st->err)
+				alert("DeviceDisappeared", dev, NULL, ainfo);
+			st->err++;
+			return 0;
+		}
+	}
 	fd = open(dev, O_RDONLY);
 	if (fd < 0) {
 		if (!st->err)
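
The array_state test discussed in this thread (and used by the patch) can
also be tried from a small standalone program. What follows is only a
minimal sketch, not part of mdadm: the function names state_is_inactive(),
check_state_file() and report_possible_ghosts() are hypothetical, and it
assumes that array_state is readable at the sysfs path quoted earlier. Like
the patch, it compares only the first five bytes and treats a short read as
"not active".

```c
#include <glob.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if the array_state contents start with "clear" or "inact"
 * (i.e. "inactive"), or if fewer than 5 bytes were read; 0 otherwise.
 * This mirrors the prefix comparison in the patch.
 */
int state_is_inactive(const char *buf, size_t len)
{
	if (len < 5)
		return 1;
	return strncmp(buf, "clear", 5) == 0 ||
	       strncmp(buf, "inact", 5) == 0;
}

/* Read one array_state file.
 * Returns 1 if clear/inactive, 0 otherwise, -1 if it cannot be opened.
 */
int check_state_file(const char *path)
{
	char buf[32];
	size_t n;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf), f);
	fclose(f);
	return state_is_inactive(buf, n);
}

/* Print every file matching 'pattern' whose state is clear/inactive
 * and return how many were found.
 */
int report_possible_ghosts(const char *pattern)
{
	glob_t g;
	size_t i;
	int found = 0;

	if (glob(pattern, 0, NULL, &g) != 0)
		return 0;	/* no match, or glob error */
	for (i = 0; i < g.gl_pathc; i++) {
		if (check_state_file(g.gl_pathv[i]) == 1) {
			printf("%s: clear or inactive\n", g.gl_pathv[i]);
			found++;
		}
	}
	globfree(&g);
	return found;
}
```

Calling report_possible_ghosts("/sys/block/md*/md/array_state") from main()
lists the candidates. As noted above, an array reported this way might be a
'ghost', or might simply be in the process of being assembled.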
