Re: [PATCH mdadm] mdadm: Don't open md device for CREATE and ASSEMBLE

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 2022-07-19 05:27, Mariusz Tkaczyk wrote:
> On Fri, 15 Jul 2022 10:20:26 +0800
> Guoqing Jiang <guoqing.jiang@xxxxxxxxx> wrote:
>> On 7/15/22 6:37 AM, Logan Gunthorpe wrote:
>>> To fix this, don't bother trying to open the md device for CREATE and
>>> ASSEMBLE commands, as the file descriptor will never be used anyway
>>> even if it is successfully openned.
> Hi All,
> 
> This is not a fix, it just reduces race probability because file descriptor
> will be opened later.

That's not correct. The later "open" call actually will use the new_array
parameter which will wait for the workqueue before creating a new array
device, so the old one is properly cleaned up and there is no longer
a race condition with this patch. If new_array doesn't exist and it falls back
to a regular open, then it will still do the right thing and open the device 
with create_on_open.

> I tried to resolve it in the past by adding completion to md driver and force
> mdadm --stop command to wait for sysfs clean up but I have never finished it.
> IMO it is a better way, wait for device to be fully removed by MD during stop.
> Thoughts?

I don't think that would work very well. Userspace would end up blocking
on --stop indefinitely if there are any references delaying cleanup to
the device. That's not very user friendly. Given that opening the md
device has side-effects, I think we should avoid opening when we
should know that we are about to try to create a new device.

> One objection I have here is that error handling is changed, so it could be
> harmful change for users. 

Hmm, yes seems like I was a bit sloppy here. However, it still seems possible
to fix this up by not pre-opening the device. The only use for the mdfd
in CREATE, ASSEMBLE and BUILD is to get the minor number if
ident.super_minor == -2 and check if an existing specified device is an md 
device (which may be redundant). We could replace this with a stat() call to
avoid opening the device. What about the patch at the end of this email?

>>>
>>> Side note: it would be nice to disable create_on_open as well to help
>>> solve this, but it seems the work for this was never finished. By default,
>>> mdadm will create using the old node interface when a name is specified
>>> unless the user specifically puts names=yes in a config file, which
>>> doesn't seem to be vcreate_on_openery common yet.  
>>
>> Hmm, 'create_on_open' is default to true, not sure if there is any side 
>> effect
>> after change to false.
> 
> I'm slowly working on this. The change is not simple, starting from terrible
> create_mddev code and char *mddev and char *name rules , ending with hidden
> references which we are not aware off (it is common to create temp node and
> open mddevice in mdadm) and other references in systemd (because of that, udev
> is avoiding to open device).
> This also indicate that we should drop partition discover in kernel and use
> udev in the future, especially for external metadata (where RO -> RW transition
> happens during assembly). But this is a separate problem.

I'm glad to hear someone is still working on that. Thanks!

Logan

--

diff --git a/mdadm.c b/mdadm.c
index 56722ed997a2..7b757c79d6bf 100644
--- a/mdadm.c
+++ b/mdadm.c
@@ -1349,6 +1349,8 @@ int main(int argc, char *argv[])
 
 	if (mode == MANAGE || mode == BUILD || mode == CREATE ||
 	    mode == GROW || (mode == ASSEMBLE && ! c.scan)) {
+		struct stat stb;
+
 		if (devs_found < 1) {
 			pr_err("an md device must be given in this mode\n");
 			exit(2);
@@ -1361,37 +1363,31 @@ int main(int argc, char *argv[])
 			mdfd = open_mddev(devlist->devname, 1);
 			if (mdfd < 0)
 				exit(1);
+
+			fstat(mdfd, &stb);
 		} else {
 			char *bname = basename(devlist->devname);
+			int ret;
 
 			if (strlen(bname) > MD_NAME_MAX) {
 				pr_err("Name %s is too long.\n", devlist->devname);
 				exit(1);
 			}
-			/* non-existent device is OK */
-			mdfd = open_mddev(devlist->devname, 0);
-		}
-		if (mdfd == -2) {
-			pr_err("device %s exists but is not an md array.\n", devlist->devname);
-			exit(1);
-		}
-		if ((int)ident.super_minor == -2) {
-			struct stat stb;
-			if (mdfd < 0) {
+
+			ret = stat(devlist->devname, &stb);
+			if (ident.super_minor == -2 && ret) {
 				pr_err("--super-minor=dev given, and listed device %s doesn't exist.\n",
					devlist->devname);
+				exit(1);
+			}
+
+			if (!ret && !md_array_valid_stat(&stb)) {
+				pr_err("device %s exists but is not an md array.\n", devlist->devname);
 				exit(1);
 			}
-			fstat(mdfd, &stb);
-			ident.super_minor = minor(stb.st_rdev);
-		}
-		if (mdfd >= 0 && mode != MANAGE && mode != GROW) {
-			/* We don't really want this open yet, we just might
-			 * have wanted to check some things
-			 */
-			close(mdfd);
-			mdfd = -1;
 		}
+		if (ident.super_minor == -2)
+			ident.super_minor = minor(stb.st_rdev);
 	}
 
 	if (s.raiddisks) {
diff --git a/mdadm.h b/mdadm.h
index 05ef881f4709..7462336b381c 100644
--- a/mdadm.h
+++ b/mdadm.h
@@ -1504,6 +1504,7 @@ extern int Restore_metadata(char *dev, char *dir, struct context *c,
 			    struct supertype *st, int only);
 
 int md_array_valid(int fd);
+int md_array_valid_stat(struct stat *st);
 int md_array_active(int fd);
 int md_array_is_active(struct mdinfo *info);
 int md_get_array_info(int fd, struct mdu_array_info_s *array);
diff --git a/util.c b/util.c
index cc94f96ef120..20acdcf6af22 100644
--- a/util.c
+++ b/util.c
@@ -250,6 +250,23 @@ int md_array_valid(int fd)
 	return !ret;
 }
 
+int md_array_valid_stat(struct stat *st)
+{
+	struct mdinfo *sra;
+	char *devnm;
+
+	devnm = stat2devnm(st);
+	if (!devnm)
+		return 0;
+
+	sra = sysfs_read(-1, devnm, GET_ARRAY_STATE);
+	if (!sra)
+		return 0;
+
+	free(sra);
+	return 1;
+}
+
 int md_array_active(int fd)
 {
 	struct mdinfo *sra;



[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux