Re: [PATCH v2] mm/memory-hotplug: Add sysfs hot-remove trigger

Robin Murphy <robin.murphy@xxxxxxx> · Tue, 26 Feb 2019 15:12:40 +0000

On 25/02/2019 21:14, David Hildenbrand wrote:
On 12.02.19 16:11, Michal Hocko wrote:
On Tue 12-02-19 14:54:36, Robin Murphy wrote:
On 12/02/2019 08:33, Michal Hocko wrote:
On Mon 11-02-19 17:50:46, Robin Murphy wrote:
ARCH_MEMORY_PROBE is a useful thing for testing and debugging hotplug,
but being able to exercise the (arguably trickier) hot-remove path would
be even more useful. Extend the feature to allow removal of offline
sections to be triggered manually to aid development.

Since process dictates the new sysfs entry be documented, let's also
document the existing probe entry to match - better 13-and-a-half years
late than never, as they say...

The probe sysfs is quite dubious already TBH. Apart from testing, is
anybody using it for something real? Do we need to keep an API for
something testing only? Why isn't a customer testing module enough for
such a purpose?

 From the arm64 angle, beyond "conventional" servers where we can hopefully
assume ACPI, I can imagine there being embedded/HPC setups (not all as
esoteric as that distributed-memory dRedBox thing), as well as virtual
machines, that are DT-based with minimal runtime firmware. I'm none too keen
on the idea either, but if such systems want to support physical hotplug
then driving it from userspace might be the only reasonable approach. I'm
just loath to actually document it as anything other than a developer
feature so as not to give the impression that I consider it anything other
than a last resort for production use.

This doesn't sound convicing to add an user API.

I do note that my x86 distro kernel
has ARCH_MEMORY_PROBE enabled despite it being "for testing".

Yeah, there have been mistakes done in the API land & hotplug in the
past.

In other words, why do we have to add an API that has to be maintained
for ever for a testing only purpose?

There's already half the API being maintained, though, so adding the
corresponding other half alongside it doesn't seem like that great an
overhead, regardless of how it ends up getting used.

As already said above. The hotplug user API is not something to follow
for the future development. So no, we are half broken let's continue is
not a reasonable argument.

Ultimately, though,
it's a patch I wrote because I needed it, and if everyone else is adamant
that it's not useful enough then fair enough - it's at least in the list
archives now so I can sleep happy that I've done my "contributing back" bit
as best I could :)

I am not saing this is not useful. It is. But I do not think we want to
make it an official api without a strong usecase. And then we should
think twice to make the api both useable and reasonable. A kernel module
for playing sounds like more than sufficient.

I'm late for the party, I consider this very useful for testing, but I
agree that this should not be an official API.

The memory API is already very messed up. We have the "removable"
attribute which does not mean that memory is removable. It means that
memory can be offlined and eventually "unplugged/removed" if the HW
supports it (e.g. a DIMM, otherwise it will never go).

You would be introducing "remove", which would sometimes not work when
"removable" indicates "true" (because your API only works if memory has
already been offlined). I would much rather want to see some of the mess
get cleaned up than new stuff getting added that might not be needed
besides for testing. Yes, not your fault, but an API that keeps getting
more confusing.

OK, I guess Andrew should probably drop this patch from -next - I'll add 
my own self-nack if it helps :)

The back of my mind is still ticking over trying to think up a really 
nice design for a self-contained debugfs or module-parameter interface 
completely independent of ARCH_MEMORY_PROBE - I'll probably keep using 
this hack locally to finish off the arm64 hot-remove stuff, but once 
that's done (or if inspiration strikes in the meantime) then I'll try to 
come back with a prototype of the developer interface that I'd find most 
useful.

I am really starting to strongly dislike the "removable" attribute.

Yeah, I think in general we could do with a new term for boolean-like 
things which have values of "no" and "maybe" - or "yes" and "maybe" in 
the case of security vulnerabilities :)

Robin.