Re: [PATCH] mdadm/systemd: remove KillMode=none from service file

Coly Li <colyli@xxxxxxx> · Thu, 28 Jul 2022 18:55:04 +0800

> On Jul 28, 2022, at 17:01, Mariusz Tkaczyk <mariusz.tkaczyk@xxxxxxxxxxxxxxx> wrote:
> 
> On Thu, 28 Jul 2022 16:39:56 +0800
> Coly Li <colyli@xxxxxxx> wrote:
> 
>>> On Jul 28, 2022, at 15:55, Mariusz Tkaczyk <mariusz.tkaczyk@xxxxxxxxxxxxxxx>
>>> wrote:
>>> 
>>> On Tue, 15 Feb 2022 21:34:15 +0800
>>> Coly Li <colyli@xxxxxxx> wrote:
>>> 
>>>> For mdadm's systemd configuration, the current systemd KillMode is "none"
>>>> in the following service files:
>>>> - mdadm-grow-continue@.service
>>>> - mdmon@.service
>>>> 
>>>> This "none" mode is strongly discouraged by systemd developers (see man 5
>>>> systemd.kill, "KillMode=" section), and is being considered for removal in
>>>> a future systemd version.
>>>> 
>>>> As a systemd developer explained in the discussion, the systemd kill
>>>> process is:
>>>> 1. send the signal specified by KillSignal= to the list of processes (if
>>>> any), TERM is the default
>>>> 2. wait until either the target process(es) exit or a timeout expires
>>>> 3. if the timeout expires send the signal specified by FinalKillSignal=,
>>>> KILL is the default
>>>> 
>>>> For "control-group", all remaining processes will receive the SIGTERM
>>>> signal (by default) and if there are still processes after a period of
>>>> time, they will get the SIGKILL signal.
>>>> 
>>>> For "mixed", only the main process will receive the SIGTERM signal, and
>>>> if there are still processes after a period of time, all remaining
>>>> processes (including the main one) will receive the SIGKILL signal.
>>>> 
>>>> From the above comment, KillMode=control-group is currently a proper
>>>> kill mode. Since control-group is the default kill mode, the fix can be
>>>> simply removing the KillMode=none line from the service files, so that
>>>> the default mode takes effect.
>>> 
>>> Hi All,
>>> We are experiencing issues with IMSM metadata on RHEL8.7 and 9.1 (the patch
>>> was picked by Redhat). There are several issues which result in hung tasks,
>>> characteristic of a missing mdmon:
>>> 
>>> [ 619.521440] task:umount state:D stack: 0 pid: 6285 ppid: flags:0x00004084
>>> [ 619.534033] Call Trace:
>>> [ 619.539980] __schedule+0x2d1/0x830
>>> [ 619.547056] ? finish_wait+0x80/0x80
>>> [ 619.554261] schedule+0x35/0xa0
>>> [ 619.560999] md_write_start+0x14b/0x220
>>> [ 619.568492] ? finish_wait+0x80/0x80
>>> [ 619.575649] raid1_make_request+0x3c/0x90 [raid1]
>>> [ 619.584111] md_handle_request+0x128/0x1b0
>>> [ 619.591891] md_make_request+0x5b/0xb0
>>> [ 619.599235] generic_make_request_no_check+0x202/0x330
>>> [ 619.608185] submit_bio+0x3c/0x160
>>> [ 619.615161] ? bio_add_page+0x42/0x50
>>> [ 619.622413] submit_bh_wbc+0x16a/0x190
>>> [ 619.629713] jbd2_write_superblock+0xf4/0x210 [jbd2]
>>> [ 619.638340] jbd2_journal_update_sb_log_tail+0x65/0xc0 [jbd2]
>>> [ 619.647773] __jbd2_update_log_tail+0x3f/0x100 [jbd2]
>>> [ 619.656374] jbd2_cleanup_journal_tail+0x50/0x90 [jbd2]
>>> [ 619.665107] jbd2_log_do_checkpoint+0xfa/0x400 [jbd2]
>>> [ 619.673572] ? prepare_to_wait_event+0xa0/0x180
>>> [ 619.681344] jbd2_journal_destroy+0x120/0x2a0 [jbd2]
>>> [ 619.689551] ? finish_wait+0x80/0x80
>>> [ 619.696096] ext4_put_super+0x76/0x390 [ext4]
>>> [ 619.703584] generic_shutdown_super+0x6c/0x100
>>> [ 619.711065] kill_block_super+0x21/0x50
>>> [ 619.717809] deactivate_locked_super+0x34/0x70
>>> [ 619.725146] cleanup_mnt+0x3b/0x70
>>> [ 619.731279] task_work_run+0x8a/0xb0
>>> [ 619.737576] exit_to_usermode_loop+0xeb/0xf0
>>> [ 619.744657] do_syscall_64+0x198/0x1a0
>>> [ 619.751155] entry_SYSCALL_64_after_hwframe+0x65/0xca
>>> 
>>> It can be reproduced by mounting an LVM volume created on an IMSM RAID1
>>> array and then rebooting. I verified that reverting the patch fixes the
>>> issue.
>>> 
>>> I understand that from the systemd perspective this behavior is not wanted,
>>> but it is exactly what we need: a working mdmon process even after systemd
>>> has been stopped. KillMode=none does the job.
>>> I searched for an alternative way to prevent systemd from stopping the mdmon
>>> unit but I failed. I tried to change signals, so I configured the unit to
>>> send SIGPIPE (because it is ignored by mdmon). It worked, but later the
>>> system hung because the mdmon unit could not be stopped.
>>> 
>>> I also tried to configure the mdmon unit to be stopped after umount.target
>>> and I failed too. It cannot be achieved by setting After= or Before=. The one
>>> objection I have here is that systemd-shutdown tries to stop raid arrays
>>> later, so it could be better to have mdmon still running there.
>>> 
>>> IMO KillMode=none is desired in this case. Later, mdmon is restarted in
>>> dracut by mdraid module.
>>> 
>>> If there is no other solution for the problem, I will need to ask Jes to
>>> revert this patch. For now, I asked Redhat to do it.
>>> Do you have any suggestions?  
>> 
>> 
>> If Redhat doesn’t use the latest systemd, they should drop this patch. For
>> mdadm upstream we should keep it, because it was suggested by a systemd
>> developer.
>> 
> 
> If we want to keep this, we need to resolve the reboot problem. I described
> the problem and now I'm waiting for feedback. I hope that it can be fixed in
> the mdmon service quickly and easily.

Hmm, in the latest systemd source code, unit_kill_context() simply does nothing for KILL_NONE (KillMode=none), like this:

        /* Kill the processes belonging to this unit, in preparation for shutting the unit down.
         * Returns > 0 if we killed something worth waiting for, 0 otherwise. */

        if (c->kill_mode == KILL_NONE)
                return 0;

So no signal is sent to the target unit. Since no other location references KILL_NONE, it is not clear to me what more KillMode=none could be doing to help.
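
To make the difference between the kill modes concrete, here is a rough model in C of the kill sequence described earlier in this thread. It is only a sketch under stated assumptions: the names ModelKillMode, ModelProc, model_kill_first_phase() and model_kill_final_phase() are made up for illustration and are not systemd's actual code, and "sending" a signal is just recorded in a struct field.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model for illustration; NOT systemd's real types. */
typedef enum {
    MODEL_KILL_CONTROL_GROUP,
    MODEL_KILL_MIXED,
    MODEL_KILL_NONE
} ModelKillMode;

typedef struct {
    int pid;
    bool is_main;     /* true for the unit's main process */
    bool exited;      /* true once the process has gone away */
    int last_signal;  /* last signal "sent" by the model, 0 if none */
} ModelProc;

/* Step 1: send KillSignal= (SIGTERM by default).
 * Returns the number of processes signalled; 0 means nothing to wait for. */
static int model_kill_first_phase(ModelKillMode mode, ModelProc *procs,
                                  size_t n, int kill_signal)
{
    int signalled = 0;

    assert(procs != NULL || n == 0);

    if (mode == MODEL_KILL_NONE)
        return 0; /* mirrors the early return quoted above */

    for (size_t i = 0; i < n; i++) {
        if (procs[i].exited)
            continue;
        /* control-group: every process; mixed: only the main process */
        if (mode == MODEL_KILL_CONTROL_GROUP || procs[i].is_main) {
            procs[i].last_signal = kill_signal;
            signalled++;
        }
    }
    return signalled;
}

/* Step 3: after the timeout, send FinalKillSignal= (SIGKILL by default)
 * to every process that is still around. */
static int model_kill_final_phase(ModelKillMode mode, ModelProc *procs,
                                  size_t n, int final_signal)
{
    int signalled = 0;

    assert(procs != NULL || n == 0);

    if (mode == MODEL_KILL_NONE)
        return 0;

    for (size_t i = 0; i < n; i++) {
        if (!procs[i].exited) {
            procs[i].last_signal = final_signal;
            signalled++;
        }
    }
    return signalled;
}
```

In this model, MODEL_KILL_NONE makes both phases return 0 and no process is ever signalled, while the other two modes end with every remaining process receiving the final signal.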

I don't know systemd very well. My guess (correct me if I am wrong) is that the systemd used in RHEL is not the latest version?

> If we determine that an mdmon design update is needed, then I will request to
> revert it until a fix is ready, to minimize the impact on users (distros may
> pull this).

Yes, I agree. But for the mdadm package in RHEL, I guess they don’t always use upstream mdadm and just backport selected patches, as other enterprise distributions do. If the latest mdadm and the latest systemd work fine together, maybe the fast fix for RHEL is to just drop this patch from their backport. There is no need to wait until the patch is reverted or fixed upstream.

BTW, may I know the exact systemd versions in RHEL 8.7 and 9.1? On my openSUSE 15.4 the systemd version is 249.11. I will try to reproduce the problem as well, and with some luck find a clue.

Thanks.

Coly Li