On Fri, 18 Apr 2014 09:24:35 +0000 "Baldysiak, Pawel" <pawel.baldysiak@xxxxxxxxx> wrote:

> Hi Neil/All.
> We have discovered some problems with IMSM array reshape under OSs managed by systemd.
> In case of reshape of arrays with IMSM metadata, mdadm manages the whole reshape process and it needs to be running in background.
> If we reboot while reshaping, array will be assembled at startup
> by udevworker by IMPORT{program}="mdadm -I /dev/sdX --export --offroot" part of udev rule array.
> Mdadm will fork and continue to reshape an array from checkpoint.
> However, systemd will treat udevworker as hanged process and it will be killed due to timeout with all its children (reshape will hang then).
> I had planned to propose a patch for this problem, where additional unit file will be added
> and udev will start systemd-service for mdadm -I command (see below),
> but then we will lose information about exported variables - the ones that are used to trigger mdadm-last-resort service.
>
> Do You have any idea how to solve this problem, and keep both functionalities?

Hi,
 thanks for raising this issue.

I think we need to address this using "mdadm --grow --continue".

e.g. in udev we run "mdadm -I --freeze-reshape --export" and arrange for
that to report some setting if a reshape is needed.
If it is needed, we set SYSTEMD_WANTS to some service which will run
"mdadm --grow --continue $device".

Possibly we could get mdadm to run "systemctl start mdadm-reshape@$dev"
instead of forking, like it now does for running mdmon.

I might have a poke at the code and see what falls out.

NeilBrown

>
> Pawel Baldysiak
>
> --------------------------------------------------------------------------------------------------------------
> My patch ("IMPORT{program}" behaves same as "RUN", but exports output as variables):
>
> From 8549f0ffcd72589cedf24d07b496af2ce16d14ec Mon Sep 17 00:00:00 2001
> From: Pawel Baldysiak <pawel.baldysiak@xxxxxxxxx>
> Date: Thu, 10 Apr 2014 15:16:02 +0200
> Subject: [PATCH] Use unit file for incremental assemblation from udev.
>
> Incremental assemblation of an array at OS boot is started by RUN
> command triggered by udev, so far. RUN command is used for starting
> short-time processes that will complete quickly. Some operations, like
> reshape of IMSM arrays, are managed by mdadm. In OSs managed by systemd -
> udev worker that triggered "mdadm -I" will be terminated by SIGKILL due
> to timeout. This also kills mdadm process, so reshape will stop.
>
> This patch adds new unit file, that will be started in OSs managed by
> systemd instead of "RUN=" command. Udev rule will only start the new
> service and finish its work. Unit file will start "mdadm -I" for disk
> passed as an argument from rule.
>
> In scenario where we reshape IMSM array, general migration record is
> written only on two first disks of an array, so if we reboot OS and udev
> starts adding disks from e.g. the last one, "mdadm -I" will end with
> exit code "4" due to inaccessible general migration record. This should
> also be considered as success exit status, because disk is successfully
> assembled according to its metadata. Otherwise system will log
> information about service failure.
>
> Signed-off-by: Pawel Baldysiak <pawel.baldysiak@xxxxxxxxx>
> Reviewed-by: Artur Paszkiewicz <artur.paszkiewicz@xxxxxxxxx>
> ---
>  Makefile                    |  1 +
>  systemd/mdadm-inc@.service  | 10 ++++++++++
>  udev-md-raid-assembly.rules |  4 +++-
>  3 files changed, 14 insertions(+), 1 deletion(-)
>  create mode 100644 systemd/mdadm-inc@.service
>
> diff --git a/Makefile b/Makefile
> index b823d85..b199efd 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -288,6 +288,7 @@ install-systemd: systemd/mdmon@.service
>  	$(INSTALL) -D -m 644 systemd/mdmonitor.service $(DESTDIR)$(SYSTEMD_DIR)/mdmonitor.service
>  	$(INSTALL) -D -m 644 systemd/mdadm-last-resort@.timer $(DESTDIR)$(SYSTEMD_DIR)/mdadm-last-resort@.timer
>  	$(INSTALL) -D -m 644 systemd/mdadm-last-resort@.service $(DESTDIR)$(SYSTEMD_DIR)/mdadm-last-resort@.service
> +	$(INSTALL) -D -m 644 systemd/mdadm-inc@.service $(DESTDIR)$(SYSTEMD_DIR)/mdadm-inc@.service
>  	$(INSTALL) -D -m 755 systemd/mdadm.shutdown $(DESTDIR)$(SYSTEMD_DIR)-shutdown/mdadm.shutdown
>  	if [ -f /etc/SuSE-release -o -n "$(SUSE)" ] ;then $(INSTALL) -D -m 755 systemd/SUSE-mdadm_env.sh $(DESTDIR)$(SYSTEMD_DIR)/../scripts/mdadm_env.sh ;fi
> diff --git a/systemd/mdadm-inc@.service b/systemd/mdadm-inc@.service
> new file mode 100644
> index 0000000..b7a97a3
> --- /dev/null
> +++ b/systemd/mdadm-inc@.service
> @@ -0,0 +1,10 @@
> +[Unit]
> +Description=MD incremental assemblation on %I
> +DefaultDependencies=no
> +Before=initrd-switch-root.target
> +
> +[Service]
> +Type=forking
> +GuessMainPID=false
> +ExecStart=/sbin/mdadm -I %I
> +SuccessExitStatus=0 4
> diff --git a/udev-md-raid-assembly.rules b/udev-md-raid-assembly.rules
> index 824e7a9..e295875 100644
> --- a/udev-md-raid-assembly.rules
> +++ b/udev-md-raid-assembly.rules
> @@ -27,7 +27,9 @@ LABEL="md_inc"
>  # remember you can limit what gets auto/incrementally assembled by
>  # mdadm.conf(5)'s 'AUTO' and selectively whitelist using 'ARRAY'
> -ACTION=="add|change", IMPORT{program}="/sbin/mdadm --incremental --export $devnode --offroot ${DEVLINKS}"
> +ACTION=="add|change", PROGRAM="/bin/readlink /sbin/init", RESULT=="*systemd", TAG+="systemd", ENV{SYSTEMD_WANTS}="mdadm-inc@$devnode.service"
> +ACTION=="add|change", ENV{SYSTEMD_WANTS}!="?*", IMPORT{program}="/sbin/mdadm --incremental --export $devnode --offroot ${DEVLINKS}"
> +
>  ACTION=="add|change", ENV{MD_STARTED}=="*unsafe*", ENV{MD_FOREIGN}=="no", ENV{SYSTEMD_WANTS}+="mdadm-last-resort@$env{MD_DEVICE}.timer"
>  ACTION=="remove", ENV{ID_PATH}=="?*", RUN+="/sbin/mdadm -If $name --path $env{ID_PATH}"
>  ACTION=="remove", ENV{ID_PATH}!="?*", RUN+="/sbin/mdadm -If $name"
> --
> 1.8.4.5
>
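
For illustration, the approach suggested in the reply (udev runs "mdadm -I --freeze-reshape --export", and a systemd service then runs "mdadm --grow --continue") might look roughly like the fragment below. This is only a sketch under stated assumptions, not a tested patch: MD_RESHAPE_ACTIVE is a hypothetical variable that mdadm would have to be taught to export, the unit name mdadm-reshape@.service simply follows the name floated in the reply, and using $env{MD_DEVICE} as the instance name assumes it holds the md device name in the same way the existing mdadm-last-resort rule relies on. Combining --freeze-reshape with -I is taken from the reply and has not been verified here.

```
# udev-md-raid-assembly.rules (sketch)
# Assemble incrementally but leave any interrupted reshape frozen,
# so the udev worker returns quickly and still exports its variables.
ACTION=="add|change", IMPORT{program}="/sbin/mdadm --incremental --freeze-reshape --export $devnode --offroot ${DEVLINKS}"
# MD_RESHAPE_ACTIVE is hypothetical: mdadm does not export it in the patch above.
ACTION=="add|change", ENV{MD_RESHAPE_ACTIVE}=="yes", TAG+="systemd", ENV{SYSTEMD_WANTS}+="mdadm-reshape@$env{MD_DEVICE}.service"

# systemd/mdadm-reshape@.service (sketch, hypothetical unit)
[Unit]
Description=Continue MD reshape on /dev/%I
DefaultDependencies=no

[Service]
# Assumes the command can be supervised with the default service type;
# a different Type= may be needed depending on whether it forks.
ExecStart=/sbin/mdadm --grow --continue /dev/%I
```

With a layout like this, IMPORT{program} is kept, so the exported variables that trigger mdadm-last-resort would still be available, while the long-running reshape would run under systemd rather than as a child of the udev worker.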