Re: [mdadm PATCH] Create: move STOP_ARRAY to abort_locked

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 05/09/2017 01:54 AM, Jes Sorensen wrote:
On 05/07/2017 09:50 PM, Zhilong Liu wrote:


On 05/05/2017 11:31 AM, Liu Zhilong wrote:


On 05/04/2017 10:54 PM, Jes Sorensen wrote:
On 05/04/2017 08:20 AM, Zhilong Liu wrote:
Hi Jes,

apply for review, this is a bug I ever encountered.

Zhilong,

Under what circumstances do you see this?


Issued the command:
linux-g0sr:/home/test # ./mdadm -CR /dev/md0 -l1 -n2 -b internal
/dev/loop[0-1] --size 63
... ... ...
mdadm: Given bitmap chunk size not supported.
linux-g0sr:/home/test # ls /dev/md0
/dev/md0
linux-g0sr:/home/test # ls /sys/block/md0/md/
array_size   bitmap      component_size  level metadata_version
raid_disks         reshape_position safe_mode_delay
array_state  chunk_size  layout          max_read_errors
new_dev           reshape_direction  resync_start

create_mddev() writes the devnm to
/sys/module/md_mod/parameter/new_array,
then in md.c, module_param_call() called the 'set' function of
add_named_array(),
md_alloc() init_and_add the kobject for devm, finally the devnm device
has created
and sysfs has registered after create_mddev executed successfully.
Thus it's better
to STOP_ARRAY in any case after create_mddev() invoked.


this patch depends on the kernel commit:
039b7225e6e9 ("md: allow creation of mdNNN arrays via
md_mod/parameters/new_array")
Neil's patch has set "mddev->hold_active = UNTIL_STOP", thus the
STOP_ARRAY can work
well on this situation.

OK now I am confused - are you saying this change will only work after Neil's kernel patch has been applied? That would be no good, mdadm needs to work on older kernels too.


Sorry for the late reply for this.

Currently, the creating array method for later kernel(newer than v4.11), it can avoid the race problem via to set 'create_on_open' as N and write the mddev -> /sys/module/md_mod/parameters/new_array, then mdadm can send the ioctl to stop_array directly if creating array fail, it won't happen the race. refer to commit: 78b6350dcaad ("md: support disabling of create-on-open semantics.")

And for older kernel, it's difficult to avoid the problem of the race when use 'change', 'add' and 'remove'
udev rules in a short period of time via to ioctl commands.
For example, this case tested on my latest Tumbleweed(20170524) with latest mdadm source.

Steps:
First part:
-  open one terminal to monitor the udev:
Terminal 1: # udevadm monitor
monitor will print the received events for:
UDEV - the event which udev sends out after rule processing
KERNEL - the kernel uevent

-  type the command in another terminal:
Terminal 2: # ./mdadm -CR /dev/md1 --bitmap=internal -l1 -n2 /dev/loop1 /dev/loop2 --size 63

add_internal_bitmap received the abnormal chunk_size( < 64k), and return a failure. but it left the partially created array and didn't cleanup. For this test, the udev monitor prints:

Terminal 1: # udevadm monitor
... ...
KERNEL[146.077168] add      /devices/virtual/bdi/9:1 (bdi)
KERNEL[146.077211] add      /devices/virtual/block/md1 (block)
UDEV  [146.078112] add      /devices/virtual/bdi/9:1 (bdi)
UDEV  [146.084163] add      /devices/virtual/block/md1 (block)

Terminal 2: # ./mdadm -S /dev/md1

Terminal 1: # udevadm monitor
KERNEL[153.276209] remove   /devices/virtual/bdi/9:1 (bdi)
KERNEL[153.276317] remove   /devices/virtual/block/md1 (block)
UDEV  [153.277034] remove   /devices/virtual/bdi/9:1 (bdi)
UDEV  [153.280801] remove   /devices/virtual/block/md1 (block)

Second part:
add the ioctl(stop_array) and compiled to monitor the udevs.
# git diff
diff --git a/Create.c b/Create.c
index 239545f..21568ca 100644
--- a/Create.c
+++ b/Create.c
@@ -1065,7 +1065,9 @@ int Create(struct supertype *st, char *mddev,
        map_remove(&map, fd2devnm(mdfd));
        map_unlock(&map);

-       if (mdfd >= 0)
+       if (mdfd >= 0) {
+               ioctl(mdfd, STOP_ARRAY, NULL);
                close(mdfd);
+       }
        return 1;
 }

Terminal 2: # ./mdadm -CR /dev/md1 --bitmap=internal -l1 -n2 /dev/loop1 /dev/loop2 --size 63

Terminal 1: # udevadm monitor
... ...
KERNEL[171.964597] add      /devices/virtual/bdi/9:1 (bdi)
KERNEL[171.965346] add      /devices/virtual/block/md1 (block)
UDEV  [171.965565] add      /devices/virtual/bdi/9:1 (bdi)
KERNEL[171.984195] remove   /devices/virtual/bdi/9:1 (bdi)
KERNEL[171.984555] remove   /devices/virtual/block/md1 (block)
UDEV  [171.984590] remove   /devices/virtual/bdi/9:1 (bdi)
KERNEL[172.004504] add      /devices/virtual/bdi/9:1 (bdi)
KERNEL[172.004890] add      /devices/virtual/block/md1 (block)
UDEV  [172.004923] add      /devices/virtual/bdi/9:1 (bdi)
UDEV  [172.005787] add      /devices/virtual/block/md1 (block)
UDEV  [172.009648] remove   /devices/virtual/block/md1 (block)
UDEV  [172.013232] add      /devices/virtual/block/md1 (block)

So shall give udev a moment to process 'add' events?

mdadm can work as expect when adding usleep(100 * 1000) before perform ioctl(STOP_ARRAY).
this is the udev monitor result tested by the following sample.

Terminal 1: # udevadm monitor
KERNEL[5476.780692] add      /devices/virtual/bdi/9:1 (bdi)
KERNEL[5476.780976] add      /devices/virtual/block/md1 (block)
UDEV  [5476.782355] add      /devices/virtual/bdi/9:1 (bdi)
UDEV  [5476.786056] add      /devices/virtual/block/md1 (block)
KERNEL[5476.896255] remove   /devices/virtual/bdi/9:1 (bdi)
KERNEL[5476.896367] remove   /devices/virtual/block/md1 (block)
UDEV  [5476.896895] remove   /devices/virtual/bdi/9:1 (bdi)
UDEV  [5476.900752] remove   /devices/virtual/block/md1 (block)


Such as like this:

diff --git a/Create.c b/Create.c
index 239545f..a07ace8 100644
--- a/Create.c
+++ b/Create.c
@@ -902,10 +902,8 @@ int Create(struct supertype *st, char *mddev,
                                        remove_partitions(fd);
                                if (st->ss->add_to_super(st, &inf->disk,
                                                         fd, dv->devname,
- dv->data_offset)) {
-                                       ioctl(mdfd, STOP_ARRAY, NULL);
+ dv->data_offset))
                                        goto abort_locked;
-                               }
                                st->ss->getinfo_super(st, inf, NULL);
                                safe_mode_delay = inf->safe_mode_delay;

@@ -1006,7 +1004,6 @@ int Create(struct supertype *st, char *mddev,
                        sysfs_set_safemode(&info, safe_mode_delay);
                        if (err) {
                                pr_err("failed to activate array.\n");
-                               ioctl(mdfd, STOP_ARRAY, NULL);
                                goto abort;
                        }
                } else if (c->readonly &&
@@ -1016,7 +1013,6 @@ int Create(struct supertype *st, char *mddev,
                                          "array_state", "readonly") < 0) {
                                pr_err("Failed to start array: %s\n",
                                       strerror(errno));
-                               ioctl(mdfd, STOP_ARRAY, NULL);
                                goto abort;
                        }
                } else {
@@ -1028,7 +1024,6 @@ int Create(struct supertype *st, char *mddev,
if (info.array.chunk_size & (info.array.chunk_size-1)) { cont_err("Problem may be that chunk size is not a power of 2\n");
                                }
-                               ioctl(mdfd, STOP_ARRAY, NULL);
                                goto abort;
                        }
                        /* if start_ro module parameter is set, array is
@@ -1065,7 +1060,11 @@ int Create(struct supertype *st, char *mddev,
        map_remove(&map, fd2devnm(mdfd));
        map_unlock(&map);

-       if (mdfd >= 0)
+       if (mdfd >= 0) {
+               /* Give udev a moment to finish 'add' events. */
+               usleep(100*1000);
+               ioctl(mdfd, STOP_ARRAY, NULL);
                close(mdfd);
+       }
        return 1;
 }


Thanks,
-Zhilong

Jes




--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux