Re: IMSM - problem with reshape+systemd

On Fri, 18 Apr 2014 09:24:35 +0000 "Baldysiak, Pawel"
<pawel.baldysiak@xxxxxxxxx> wrote:

> Hi Neil/All.
> We have discovered some problems with reshaping IMSM arrays on OSes managed by systemd.
> When an array with IMSM metadata is reshaped, mdadm manages the whole reshape process, so it must keep running in the background.
> If we reboot while reshaping, the array will be assembled at startup
> by a udev worker, via the IMPORT{program}="mdadm -I /dev/sdX --export --offroot" part of the udev rule.
> mdadm will fork and continue reshaping the array from the checkpoint.
> However, systemd will treat the udev worker as a hung process and kill it, along with all its children, when the timeout expires (the reshape then hangs).
> I had planned to propose a patch for this problem, in which an additional unit file is added
> and udev starts a systemd service for the mdadm -I command (see below),
> but then we would lose the exported variables, the ones used to trigger the mdadm-last-resort service.
> 
> Do you have any idea how to solve this problem and keep both functionalities?

Hi,
 thanks for raising this issue.

 I think we need to address this using "mdadm --grow --continue".

 e.g. in udev we run "mdadm -I --freeze-reshape --export" and arrange for
 that to report some setting if a reshape is needed.
 If it is needed, we set SYSTEMD_WANTS to some service which will run "mdadm
 --grow --continue $device".
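
 As a rough, untested sketch of such a service: the unit name
 "mdadm-grow-continue@.service" is only an assumption for illustration, as is
 the idea that the instance name is the md device name without "/dev/" and
 that mdadm stays in the foreground while it monitors the reshape.

```ini
# mdadm-grow-continue@.service (hypothetical name, for illustration only)
[Unit]
Description=Continue an interrupted MD reshape on /dev/%I
DefaultDependencies=no

[Service]
# Assumes %I is the md device name without /dev/, e.g. "md127".
# Assumes mdadm does not fork here; if it does, Type=forking would be needed.
ExecStart=/sbin/mdadm --grow --continue /dev/%I
```

 The udev rule would then only add this unit to SYSTEMD_WANTS when
 "mdadm -I" reports that a reshape is pending, so the udev worker returns
 quickly and systemd, not udev, owns the long-running process.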

 Possibly we could get mdadm to run "systemctl start mdadm-reshape@$dev"
 instead of forking, like it now does for running mdmon.

 I might have a poke at the code and see what falls out.

NeilBrown



> 
> Pawel Baldysiak
> 
> --------------------------------------------------------------------------------------------------------------
> My patch (note: "IMPORT{program}" behaves the same as "RUN", but exports the program's output as variables):
> 
> From 8549f0ffcd72589cedf24d07b496af2ce16d14ec Mon Sep 17 00:00:00 2001
> From: Pawel Baldysiak <pawel.baldysiak@xxxxxxxxx>
> Date: Thu, 10 Apr 2014 15:16:02 +0200
> Subject: [PATCH] Use unit file for incremental assembly from udev.
> 
> Until now, incremental assembly of an array at OS boot has been started by
> a RUN command triggered by udev. RUN is meant for short-lived processes
> that complete quickly. Some operations, like reshape of IMSM arrays, are
> managed by mdadm itself. On OSes managed by systemd, the udev worker that
> triggered "mdadm -I" will be terminated by SIGKILL due to a timeout. This
> also kills the mdadm process, so the reshape stops.
> 
> This patch adds a new unit file that is started, on OSes managed by
> systemd, instead of the "RUN=" command. The udev rule only starts the new
> service and finishes its work. The unit file starts "mdadm -I" for the disk
> passed as an argument from the rule.
> 
> When an IMSM array is being reshaped, the general migration record is
> written only on the first two disks of the array, so if we reboot the OS and
> udev starts adding disks from, e.g., the last one, "mdadm -I" will end with
> exit code "4" because the general migration record is inaccessible. This
> should also be treated as a successful exit status, because the disk is
> assembled successfully according to its metadata. Otherwise the system will
> log a service failure.
> 
> Signed-off-by: Pawel Baldysiak <pawel.baldysiak@xxxxxxxxx>
> Reviewed-by: Artur Paszkiewicz <artur.paszkiewicz@xxxxxxxxx>
> ---
> Makefile                    |  1 +
> systemd/mdadm-inc@.service  | 10 ++++++++++
> udev-md-raid-assembly.rules |  4 +++-
> 3 files changed, 14 insertions(+), 1 deletion(-)
> create mode 100644 systemd/mdadm-inc@.service
> 
> diff --git a/Makefile b/Makefile
> index b823d85..b199efd 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -288,6 +288,7 @@ install-systemd: systemd/mdmon@.service
>                $(INSTALL) -D -m 644 systemd/mdmonitor.service $(DESTDIR)$(SYSTEMD_DIR)/mdmonitor.service
>                $(INSTALL) -D -m 644 systemd/mdadm-last-resort@.timer $(DESTDIR)$(SYSTEMD_DIR)/mdadm-last-resort@.timer
>                $(INSTALL) -D -m 644 systemd/mdadm-last-resort@.service $(DESTDIR)$(SYSTEMD_DIR)/mdadm-last-resort@.service
> +             $(INSTALL) -D -m 644 systemd/mdadm-inc@.service $(DESTDIR)$(SYSTEMD_DIR)/mdadm-inc@.service
>                $(INSTALL) -D -m 755 systemd/mdadm.shutdown $(DESTDIR)$(SYSTEMD_DIR)-shutdown/mdadm.shutdown
>                if [ -f /etc/SuSE-release -o -n "$(SUSE)" ] ;then $(INSTALL) -D -m 755 systemd/SUSE-mdadm_env.sh $(DESTDIR)$(SYSTEMD_DIR)/../scripts/mdadm_env.sh ;fi
> diff --git a/systemd/mdadm-inc@.service b/systemd/mdadm-inc@.service
> new file mode 100644
> index 0000000..b7a97a3
> --- /dev/null
> +++ b/systemd/mdadm-inc@.service
> @@ -0,0 +1,10 @@
> +[Unit]
> +Description=MD incremental assemblation on %I
> +DefaultDependencies=no
> +Before=initrd-switch-root.target
> +
> +[Service]
> +Type=forking
> +GuessMainPID=false
> +ExecStart=/sbin/mdadm -I %I
> +SuccessExitStatus=0 4
> diff --git a/udev-md-raid-assembly.rules b/udev-md-raid-assembly.rules
> index 824e7a9..e295875 100644
> --- a/udev-md-raid-assembly.rules
> +++ b/udev-md-raid-assembly.rules
> @@ -27,7 +27,9 @@ LABEL="md_inc"
>  # remember you can limit what gets auto/incrementally assembled by
>  # mdadm.conf(5)'s 'AUTO' and selectively whitelist using 'ARRAY'
> -ACTION=="add|change", IMPORT{program}="/sbin/mdadm --incremental --export $devnode --offroot ${DEVLINKS}"
> +ACTION=="add|change", PROGRAM="/bin/readlink /sbin/init", RESULT=="*systemd", TAG+="systemd", ENV{SYSTEMD_WANTS}="mdadm-inc@$devnode.service"
> +ACTION=="add|change", ENV{SYSTEMD_WANTS}!="?*", IMPORT{program}="/sbin/mdadm --incremental --export $devnode --offroot ${DEVLINKS}"
> +
>  ACTION=="add|change", ENV{MD_STARTED}=="*unsafe*", ENV{MD_FOREIGN}=="no", ENV{SYSTEMD_WANTS}+="mdadm-last-resort@$env{MD_DEVICE}.timer"
>  ACTION=="remove", ENV{ID_PATH}=="?*", RUN+="/sbin/mdadm -If $name --path $env{ID_PATH}"
>  ACTION=="remove", ENV{ID_PATH}!="?*", RUN+="/sbin/mdadm -If $name"
> --
> 1.8.4.5
> 


