As raised by John Hubbard [1], offline_and_remove_memory() failing on
fatal signals can be sub-optimal for out-of-tree drivers: dying user
space might be the last one holding a device node open. As that device
node gets closed, the driver might unplug the device and trigger
offline_and_remove_memory() to unplug previously hotplugged device
memory. This, however, will fail reliably when fatal signals are pending
on the dying process, leaving the device unusable until the machine gets
rebooted.

That can be optimized easily by ignoring fatal signals. In fact,
checking for fatal signals in the case of offline_and_remove_memory()
doesn't make too much sense; the check makes sense when offlining is
triggered directly via sysfs. However, we actually do want a way to not
end up stuck in offline_and_remove_memory() forever.

What offline_and_remove_memory() users actually want is to fail after
some given timeout and not care about fatal signals.

So let's implement that, optimizing virtio-mem along the way.

Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Cc: "Michael S. Tsirkin" <mst@xxxxxxxxxx>
Cc: John Hubbard <jhubbard@xxxxxxxxxx>
Cc: Oscar Salvador <osalvador@xxxxxxx>
Cc: Michal Hocko <mhocko@xxxxxxxx>
Cc: Jason Wang <jasowang@xxxxxxxxxx>
Cc: Xuan Zhuo <xuanzhuo@xxxxxxxxxxxxxxxxx>

[1] https://lkml.kernel.org/r/20230620011719.155379-1-jhubbard@xxxxxxxxxx

David Hildenbrand (5):
  mm/memory_hotplug: check for fatal signals only in offline_pages()
  virtio-mem: convert most offline_and_remove_memory() errors to -EBUSY
  mm/memory_hotplug: make offline_and_remove_memory() timeout instead of
    failing on fatal signals
  virtio-mem: set the timeout for offline_and_remove_memory() to 10
    seconds
  virtio-mem: check if the config changed before (fake) offlining memory

 drivers/virtio/virtio_mem.c    | 22 +++++++++++++--
 include/linux/memory_hotplug.h |  2 +-
 mm/memory_hotplug.c            | 50 ++++++++++++++++++++++++++++++++--
 3 files changed, 68 insertions(+), 6 deletions(-)


base-commit: 6995e2de6891c724bfeb2db33d7b87775f913ad1
-- 
2.40.1