On Fri 11-09-20 10:09:07, David Hildenbrand wrote:
[...]
> Consider two cases:
>
> 1. Hot(un)plugging huge DIMMs: many (not all!) use cases want to
> online/offline the whole thing. HW can effectively only plug/unplug the
> whole thing. It makes sense in some (most?) setups to represent one DIMM
> as one memory block device.

Yes, for the physical hotplug it doesn't really make much sense to me to
offline portions that the HW cannot hotremove.

> 2. Hot(un)plugging small memory increments. This is mostly the case in
> virtualized environments - especially hyper-v balloon, xen balloon,
> virtio-mem and (drumroll) ppc dlpar and s390x standby memory. On PPC,
> you want at least all (16MB!) memory block devices that can get
> unplugged again individually ("LMBs") as separate memory blocks. Same on
> s390x on memory increment size (currently effectively the memory block
> size).

Yes, I do recognize those usecases even though I will not pretend I do
not consider them questionable. E.g. any hotplug with a smaller
granularity than the memory model in Linux allows is just dubious. We
simply cannot implement that without a lot of waste and then the
question is what is the real point.

> In summary, larger memory block devices mostly only make sense with
> DIMMs (and for boot memory in some cases). We will still end up with
> many memory block devices in other configurations.

And that is fine because the boot time memory is still likely the
primary source of memory. And reducing memory devices for those is a
huge improvement already (just think of a multi TB system with
gazillions of pointless memory devices).

> I do agree that a "disable sysfs" option is interesting - even with
> memory hotplug (we mostly need a way to configure it and a way to notify
> kexec-tools about memory hot(un)plug events). I am currently (once
> again) looking into improving auto-onlining support in the kernel.
>
> Having that said, I much rather want to see smaller improvements (that
> can be fine-tuned individually - like allowing variable-sized memory
> blocks) than doing a switch to "new shiny" and figuring out after a
> while that we need "new shiny2".

There is only one certainty. Providing a long term interface with ever
growing (ab)users is a hard target. And shinyN might be needed in the
end. Who knows. My main point is that the existing interface is hitting
a wall on usecases which _do_not_care_ about memory hotplug. And that is
something we should be looking at.

> I consider removing "phys_device" as one of these tunables. The question
> would be how to make such sysfs changes easy to configure
> ("-phys_device", "+variable_sized_blocks" ...)

I am with you on that. There are more candidates in memory block
directories which have dubious value. The deprecation process is a PITA
and that's why I thought that it would make sense to focus on something
that we can mis^Wdesign with existing and forming usecases in mind that
would get rid of all the cruft that we know doesn't work (removable
would be another one).

I am definitely not going to insist, and I highly appreciate that you
are trying to clean this up.
-- 
Michal Hocko
SUSE Labs