On Mon, Nov 7, 2011 at 4:00 AM, Lennart Poettering <lennart@xxxxxxxxxxxxxx> wrote:
> On Mon, 07.11.11 13:52, NeilBrown (neilb@xxxxxxx) wrote:
>
>> > Why doesn't the kernel do that on its own?
>>
>> Because the kernel doesn't know about the format of the metadata that
>> describes the array.
>
> Yupp, my suggestion would be to change that.

It's quite a bit of idiosyncratic code that needs to be duplicated in
kernel space and userspace (since userspace always needs to know how
to parse the metadata for array assembly). All to record a dirty bit
that flips at most every 5 seconds, or a disk failure event which is
even less frequent. Throw in policy constraints like restricting
which block devices can become part of the raid set. Rinse and repeat
for every possible metadata format.

[..]

>> What exactly is "kill_all_processes()"? is it SIGTERM or SIGKILL or both
>> with a gap or ???
>
> SIGTERM followed by SIGKILL after 5s if the programs do not react to
> that in time. But note that this logic only applies to processes which
> for some reason managed to escape systemd's usual cgroup-based killing
> logic. Normal services are hence already killed at that time, and only
> processes which moved themselves out of any cgroup or for which the
> service files disabled killing might survive to this point.

So I think mdmon should always try to escape itself from cgroup based
killing. It follows the lifespan of the array, and if the array is
not stopped by the cgroup exit (or the array lifespan is not
controlled in a service file), then mdmon must keep running.

[..]
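To make the "SIGTERM followed by SIGKILL after 5s" sequence concrete, here is a minimal sketch of that escalation for a single process. It is not systemd's code: term_then_kill() is a hypothetical helper, it only works on a direct child of the caller (it relies on waitpid()), and the 100ms polling interval is an arbitrary choice. The shutdown logic described above applies the same idea to every remaining process.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>

/* Translate a wait status: the terminating signal, or 0 for a normal exit. */
static int how_ended(int status)
{
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}

/*
 * Hypothetical helper: ask a direct child to terminate, give it
 * grace_seconds to comply, then force the issue with SIGKILL.
 * Returns the signal that ended the child, 0 if it exited normally,
 * or -1 on error.
 */
int term_then_kill(pid_t child, int grace_seconds)
{
    struct timespec tick = { 0, 100 * 1000 * 1000 }; /* poll every 100ms */
    int status, waited_ms;
    pid_t r;

    if (kill(child, SIGTERM) < 0)
        return -1;

    /* Grace period: reap the child as soon as it goes away. */
    for (waited_ms = 0; waited_ms < grace_seconds * 1000; waited_ms += 100) {
        r = waitpid(child, &status, WNOHANG);
        if (r == child)
            return how_ended(status);
        if (r < 0 && errno != EINTR)
            return -1;
        nanosleep(&tick, NULL);
    }

    /* The child did not react in time: SIGKILL cannot be caught or ignored. */
    if (kill(child, SIGKILL) < 0)
        return -1;
    do {
        r = waitpid(child, &status, 0);
    } while (r < 0 && errno == EINTR);

    return r == child ? how_ended(status) : -1;
}
```

Calling term_then_kill(pid, 5) would mirror the 5s gap quoted above; a child that ignores SIGTERM ends up reported as killed by SIGKILL.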
>
> Here's another attempt in explaining how this works:
>
> <snip>
> terminate_all_mount_and_service_units();
> kill_all_remaining_processes();
> do {
>     umount_all_remaining_file_systems_we_can();
>     read_only_mount_all_remaining_file_systems();
>     detach_all_remaining_loop_devices();
>     detach_all_remaining_swap_devices();
>     detach_all_remaining_dm_devices();

So I've started putting together an md_detach_all() routine that will
attempt to stop all md devices (via sysfs), for the case where all
mdmon instances have been passed over by the initial killall thanks to
the argv '@' flagging. Like the dm implementation it will address all
but the root md device.

> } while (we_had_some_success_with_that());
> jump_into_initrd();

The final act of the initramfs is then "mdadm --wait-clean --scan" to
communicate with the rootfs-blockdev-mdmon to be sure the metadata has
been marked clean. All other mdmon instances should have exited
naturally when their md devices stopped, but the "--wait-clean --scan"
will have ensured shutdown can progress safely.

> You have relatively flexible control of the first step in this code. The
> second step is then the hammer that tries to fix up what this step
> didn't accomplish. My suggestion to check argv[0][0] was to avoid the
> hammer.

I notice that if the rootfs is on a dm or md device systemd/shutdown
will always fall through to ultimate_send_signal(), which will not
discriminate against processes flagged with '@'. Since we aren't
stopping the root md device I wonder if ultimate_send_signal() should
also ignore flagged processes, or whether the failure to stop the root
device is to be expected and shutdown should skip
ultimate_send_signal() if the only remaining work is shutting down the
rootfs-blockdev. I'm leaning towards the latter.

--
Dan
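
For reference, the argv[0][0] check discussed above could look roughly like the sketch below. Both function names are hypothetical and this is not taken from systemd's source. It assumes the flag is read from /proc/<pid>/cmdline, whose first byte is argv[0][0], and that a process whose cmdline is empty or unreadable counts as not flagged.

```c
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * True if the first byte of a /proc/<pid>/cmdline buffer, i.e.
 * argv[0][0], is '@'. An empty buffer (typical for kernel threads)
 * is treated as not flagged.
 */
bool cmdline_is_flagged(const char *cmdline, size_t len)
{
    return len > 0 && cmdline[0] == '@';
}

/*
 * Hypothetical wrapper: read the first byte of /proc/<pid>/cmdline
 * and apply the check. Unreadable entries are reported as not flagged.
 */
bool pid_is_flagged(pid_t pid)
{
    char path[64], first;
    size_t n;
    FILE *f;

    snprintf(path, sizeof(path), "/proc/%ld/cmdline", (long)pid);
    f = fopen(path, "r");
    if (!f)
        return false;
    n = fread(&first, 1, 1, f);
    fclose(f);

    /* With n == 0 the length check fails before 'first' is ever read. */
    return cmdline_is_flagged(&first, n);
}
```

A killing loop would then skip any pid for which pid_is_flagged() returns true. Whether ultimate_send_signal() should apply the same filter is the open question raised above.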