Re: [PATCH] mm/memory_hotplug.c: don't fail hot unplug quite so eagerly

John Hubbard <jhubbard@xxxxxxxxxx> · Tue, 20 Jun 2023 14:54:31 -0700

On 6/20/23 00:12, David Hildenbrand wrote:
On 20.06.23 03:17, John Hubbard wrote:
mm/memory_hotplug.c: don't fail hot unplug quite so eagerly

Some device drivers add memory to the system via memory hotplug. When
the driver is unloaded, that memory is hot-unplugged.

Which interfaces are they using to add/remove memory?

It's coming in from the kernel driver, like this:

offline_and_remove_memory()
    walk_memory_blocks()
        try_offline_memory_block()
            device_offline()
                memory_subsys_offline()
                    offline_pages()

...and the above is getting invoked as part of killing a user space
process that was helping (for performance reasons) by holding the
device nodes open. That triggers a final close of the file descriptors
and leads to tearing down the driver. The teardown succeeds even though
the memory was not offlined, and now everything is, to use a technical
term, "stuck". :)

More below...



However, memory hot unplug can fail. And these days, it fails a little
too easily, with respect to the above case. Specifically, if a signal is
pending on the process, hot unplug fails. This leads directly to: the
user must reboot the machine in order to unload the driver, and
therefore the device is unusable until the machine is rebooted.

Why can't they retry in user space when offlining fails with -EINTR, or re-trigger driver unloading?
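For reference, the user space retry being asked about could look roughly like the sketch below. It assumes offlining is triggered by writing "offline" to a memory block's sysfs state file (for example /sys/devices/system/memory/memoryN/state) and that the write fails with EINTR on signal backoff. The helper name write_attr_retry() and the retry limit are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `value` to a sysfs-style attribute file,
 * retrying when the write is interrupted (EINTR).
 * Returns 0 on success, or a negative errno value on failure.
 */
int write_attr_retry(const char *path, const char *value, int max_tries)
{
    size_t len = strlen(value);

    for (int attempt = 0; attempt < max_tries; attempt++) {
        int fd = open(path, O_WRONLY);
        if (fd < 0)
            return -errno;

        ssize_t n = write(fd, value, len);
        int err = (n < 0) ? errno : 0;
        close(fd);

        if (n == (ssize_t)len)
            return 0;           /* request accepted */
        if (n >= 0)
            return -EIO;        /* short write: treat as an error */
        if (err != EINTR)
            return -err;        /* real failure: do not retry */
        /* EINTR: the operation backed off because of a signal; retry */
    }
    return -EINTR;              /* still interrupted after max_tries */
}
```

A caller would use something like write_attr_retry(path, "offline", 5). This only works while the process is still alive to retry, which is the limitation raised in the reply that follows.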

If someone uses "kill -9" to kill that process, then we get here,
because user space cannot trap that signal.
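
To illustrate the "cannot trap" point: POSIX does not allow a handler to be installed for SIGKILL, so sigaction() fails with EINVAL. Below is a minimal user space sketch; the helper name try_install_handler() is made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>

/* Handler that does nothing; it only needs to exist. */
static void on_signal(int signo)
{
    (void)signo;
}

/*
 * Hypothetical helper: try to install a handler for `signo`.
 * Returns 0 on success, or the errno value reported by sigaction().
 */
int try_install_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);

    if (sigaction(signo, &sa, NULL) == -1)
        return errno;
    return 0;
}
```

Calling try_install_handler(SIGTERM) succeeds, while try_install_handler(SIGKILL) reports EINVAL, so the process never gets a chance to retry the offline request itself.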


...

--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1879,12 +1879,6 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
      do {
          pfn = start_pfn;
          do {
-            if (signal_pending(current)) {
-                ret = -EINTR;
-                reason = "signal backoff";
-                goto failed_removal_isolated;
-            }
-
              cond_resched();
              ret = scan_movable_pages(pfn, end_pfn, &pfn);

No, we can't remove that. It's documented behavior that exists precisely for that reason:

https://docs.kernel.org/admin-guide/mm/memory-hotplug.html#id21

"
When offlining is triggered from user space, the offlining context can be terminated by sending a fatal signal. A timeout based offlining can easily be implemented via:

% timeout $TIMEOUT offline_block | failure_handling
"

Otherwise, there is no way to stop a userspace-triggered offline operation that loops forever in the kernel.

OK yes, I see.


I guess switching to fatal_signal_pending() might help to some degree, it should keep the timeout trick working.

But it wouldn't help in your case, where root kills arbitrary processes. I'm not sure if that is something we should be paying attention to.


Right. I think it would be more accurate perhaps, but it wouldn't help
this particular complaint.

Perhaps it is reasonable to claim that, "well, kill -9 *means* that you
end up here!" :) And the above patch clearly is not the way to go, but...

...what about discerning between "user initiated offline_pages" and
"offline pages as part of a driver shutdown/unload"?
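
To make the three options concrete, here is a small user space model of the backoff decision. It is not kernel code and not a proposed patch. The enum, the struct, and offline_backoff_check() are all hypothetical names. The third policy assumes that user-initiated offlining keeps today's signal_pending() check, while driver-initiated offlining ignores pending signals.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical policies for the check at the top of the offline loop. */
enum backoff_policy {
    BACKOFF_ANY_SIGNAL,     /* today: back off if signal_pending() */
    BACKOFF_FATAL_ONLY,     /* back off only if fatal_signal_pending() */
    BACKOFF_USER_ONLY,      /* back off only for user-initiated offlining */
};

/* Model of one offline request and the signal state of its task. */
struct offline_request {
    bool user_initiated;        /* false: driver shutdown/unload */
    bool signal_pending;        /* any signal pending */
    bool fatal_signal_pending;  /* e.g. SIGKILL pending */
};

/* Returns -EINTR if the offline loop should give up, 0 to keep going. */
int offline_backoff_check(enum backoff_policy policy,
                          const struct offline_request *req)
{
    switch (policy) {
    case BACKOFF_ANY_SIGNAL:
        return req->signal_pending ? -EINTR : 0;
    case BACKOFF_FATAL_ONLY:
        return req->fatal_signal_pending ? -EINTR : 0;
    case BACKOFF_USER_ONLY:
        /* Assumption: user requests keep the existing check. */
        return (req->user_initiated && req->signal_pending) ? -EINTR : 0;
    }
    return 0;
}
```

For the "kill -9" case during driver teardown (user_initiated false, both signal flags true), the first two policies return -EINTR and only the third lets offlining continue. For a user-initiated offline, the third policy behaves as today, so the timeout trick keeps working.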

thanks,
--
John Hubbard
NVIDIA