Re: Ways to deprecate /sys/devices/system/memory/memoryX/phys_device ?

Michal Hocko <mhocko@xxxxxxxx> · Fri, 11 Sep 2020 11:12:52 +0200

On Fri 11-09-20 10:09:07, David Hildenbrand wrote:
[...]
> Consider two cases:
> 
> 1. Hot(un)plugging huge DIMMs: many (not all!) use cases want to
> online/offline the whole thing. HW can effectively only plug/unplug the
> whole thing. It makes sense in some (most?) setups to represent one DIMM
> as one memory block device.

Yes, for the physical hotplug it doesn't really make much sense to me to
offline portions that the HW cannot hotremove.

> 2. Hot(un)plugging small memory increments. This is mostly the case in
> virtualized environments - especially hyper-v balloon, xen balloon,
> virtio-mem and (drumroll) ppc dlpar and s390x standby memory. On PPC,
> you want at least all (16MB!) memory block devices that can get
> unplugged again individually ("LMBs") as separate memory blocks. Same on
> s390x on memory increment size (currently effectively the memory block
> size).

Yes, I do recognize those use cases, even though I will not pretend I
don't consider them questionable. E.g. any hotplug with a smaller
granularity than the memory model in Linux allows is just dubious. We
simply cannot implement that without a lot of waste, and then the
question is what the real point is.

> In summary, larger memory block devices mostly only make sense with
> DIMMs (and for boot memory in some cases). We will still end up with
> many memory block devices in other configurations.

And that is fine because the boot time memory is still likely the
primary source of memory. And reducing memory devices for those is a
huge improvement already (just think of a multi TB system with
gazillions of pointless memory devices).
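
To put a rough number on that, here is a minimal sketch of the
arithmetic. The 4 TiB total and the 128 MiB and 2 GiB block sizes are
assumed example values, not figures from this thread; on a real system
the block size can be read (as a hex string) from
/sys/devices/system/memory/block_size_bytes.

```python
# Sketch: how many memory block devices a given amount of memory needs.
# All sizes below are illustrative assumptions, not measured values.

MiB = 1 << 20
GiB = 1 << 30
TiB = 1 << 40


def memory_block_count(total_bytes: int, block_bytes: int) -> int:
    """Return the number of block devices needed to cover total_bytes.

    The result is rounded up, since a partially covered block still
    needs its own device.
    """
    if block_bytes <= 0:
        raise ValueError("block size must be positive")
    return -(-total_bytes // block_bytes)  # ceiling division


if __name__ == "__main__":
    total = 4 * TiB  # hypothetical machine size
    for block in (128 * MiB, 2 * GiB):
        count = memory_block_count(total, block)
        print(f"{block // MiB} MiB blocks: {count} devices")
```

With these assumed values, the smaller block size needs 16 times as
many sysfs directories for the same amount of memory.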

> I do agree that a "disable sysfs" option is interesting - even with
> memory hotplug (we mostly need a way to configure it and a way to notify
> kexec-tools about memory hot(un)plug events). I am currently (once
> again) looking into improving auto-onlining support in the kernel.
> 
> Having that said, I much rather want to see smaller improvements (that
> can be fine-tuned individually - like allowing variable-sized memory
> blocks) than doing a switch to "new shiny" and figuring out after a
> while that we need "new shiny2".

There is only one certainty. Providing a long term interface with ever
growing (ab)users is a hard target. And shinyN might be needed in the
end. Who knows. My main point is that the existing interface is hitting
a wall on usecases which _do_not_care_ about memory hotplug. And that is
something we should be looking at.

> I consider removing "phys_device" as one of these tunables. The question
> would be how to make such sysfs changes easy to configure
> ("-phys_device", "+variable_sized_blocks" ...)

I am with you on that. There are more candidates in memory block
directories which have dubious value. The deprecation process is a PITA,
and that's why I thought that it would make sense to focus on something
that we can mis^Wdesign with existing and forming usecases in mind, and
that would get rid of all the cruft that we know doesn't work
(removable would be another one).
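
For anyone who wants to see which attributes their memory block
directories currently expose, a small sketch follows. It assumes the
usual /sys/devices/system/memory/memoryN layout; the helper name
block_attribute_counts is made up for illustration and is not an
existing tool.

```python
# Sketch: count how often each attribute file (state, phys_device,
# removable, ...) appears across the memoryN block directories.
import os
import re
from collections import Counter


def block_attribute_counts(root="/sys/devices/system/memory"):
    """Return a Counter mapping attribute file name -> number of blocks."""
    counts = Counter()
    for entry in os.listdir(root):
        # Only memoryN directories are memory blocks; skip other files
        # such as block_size_bytes.
        if not re.fullmatch(r"memory\d+", entry):
            continue
        block_dir = os.path.join(root, entry)
        if not os.path.isdir(block_dir):
            continue
        for name in os.listdir(block_dir):
            # Count regular attribute files only, not subdirectories or
            # links to directories.
            if os.path.isfile(os.path.join(block_dir, name)):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    if os.path.isdir("/sys/devices/system/memory"):
        for name, n in sorted(block_attribute_counts().items()):
            print(f"{name}: present in {n} block(s)")
```

The root directory is a parameter so the function can be pointed at a
copy of the tree or a test fixture instead of the live sysfs.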

I am definitely not going to insist, and I appreciate that you are
trying to clean this up.
-- 
Michal Hocko
SUSE Labs