On Fri 11-09-20 12:09:52, David Hildenbrand wrote:
> On 11.09.20 11:12, Michal Hocko wrote:
> > On Fri 11-09-20 10:09:07, David Hildenbrand wrote:
> > [...]
> >> Consider two cases:
> >>
> >> 1. Hot(un)plugging huge DIMMs: many (not all!) use cases want to
> >> online/offline the whole thing. HW can effectively only plug/unplug
> >> the whole thing. It makes sense in some (most?) setups to represent
> >> one DIMM as one memory block device.
> >
> > Yes, for the physical hotplug it doesn't really make much sense to me
> > to offline portions that the HW cannot hotremove.
>
> I've seen people offline parts of memory to simulate systems with less
> RAM and people offline parts of memory on demand to save energy
> (poweroff banks). People won't stop being creative with what we
> provided to them :D

Heh, I have seen people shoot themselves in the foot for fun. But more
seriously, I do understand the different use cases and we shouldn't cut
them off from their toys.

> >> 2. Hot(un)plugging small memory increments. This is mostly the case
> >> in virtualized environments - especially hyper-v balloon, xen
> >> balloon, virtio-mem and (drumroll) ppc dlpar and s390x standby
> >> memory. On PPC, you want at least all (16MB!) memory block devices
> >> that can get unplugged again individually ("LMBs") as separate
> >> memory blocks. Same on s390x with the memory increment size
> >> (currently effectively the memory block size).
> >
> > Yes, I do recognize those use cases, even though I won't pretend I
> > don't consider them questionable. E.g. any hotplug with a smaller
> > granularity than the memory model in Linux allows is just dubious.
> > We simply cannot implement that without a lot of waste, and then the
> > question is what the real point is.
>
> Having the section size as small as possible in these environments is
> most certainly preferable, to clean up metadata where possible.

There is a certain line that is hard to maintain.
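As a concrete sketch of the memory block device interface being discussed: each block appears as /sys/devices/system/memory/memoryN, where N is the block's starting physical address divided by the block size. The numeric values below are assumptions for illustration; on a live system the block size is read from /sys/devices/system/memory/block_size_bytes.

```shell
# Sketch: mapping a physical address to its memory block device.
# Assumed values for illustration; read the real block size (hex) from
# /sys/devices/system/memory/block_size_bytes on a live system.
block_size=$((0x8000000))    # 128 MiB, a common x86-64 default
phys_addr=$((0x100000000))   # example: a block starting at 4 GiB
block_nr=$((phys_addr / block_size))
echo "memory$block_nr"       # the corresponding sysfs device name
# Offlining that block would then be (as root, if it is removable):
#   echo offline > /sys/devices/system/memory/memory${block_nr}/state
```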
I consider a section to be the smallest granularity that makes sense to
support. The current section sizing makes sense from the VMEMMAP point
of view. If there are strong reasons to allow smaller ones then I
believe this should be a compile-time option.

> Otherwise, hot(un)plugging smaller granularity behaves more like
> memory ballooning (and I think I don't have to tell you that
> ballooning is used excessively even though it wastes memory on
> metadata ;) ). Anyhow, that's another discussion.

Yeah, I am aware of that. And honestly, subsection offlining makes very
little sense to me. It was hard to argue against it for the nvdimm use
cases, where we simply had to work around the reality that devices
couldn't be aligned properly. I do not think we want to claim support
for general hotplug though.

[...]

> > There is only one certainty. Providing a long term interface with
> > ever growing (ab)users is a hard target. And shinyN might be needed
> > in the end. Who knows. My main point is that the existing interface
> > is hitting a wall on use cases which _do_not_care_ about memory
> > hotplug. And that is something we should be looking at.
>
> Agreed. I can see 3 scenarios
>
> a) no memory hotplug support, no sysfs
> b) memory hotplug support, no sysfs
> c) memory hotplug support, sysfs
>
> Starting with a) and c) is the easiest way to go.

Yes, the first and simplest step would be to provide
memory_hotplug=[disabled|v1], where "disabled" would mean no sysfs
interface and "v1" the existing infrastructure. I would hope to land v2
in the future, which would provide a new interface.
-- 
Michal Hocko
SUSE Labs
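The VMEMMAP sizing argument above can be checked with quick arithmetic. Assuming common x86-64 values (SECTION_SIZE_BITS=27, i.e. 128 MiB sections; 4 KiB pages; a 64-byte struct page), the vmemmap metadata for one section comes out to exactly 2 MiB, which maps as a single huge page:

```shell
# Why the current section size fits vmemmap nicely. Assumed values:
# SECTION_SIZE_BITS=27 (128 MiB sections), PAGE_SHIFT=12 (4 KiB pages),
# sizeof(struct page)=64 on common x86-64 configs.
section_bits=27
page_shift=12
struct_page_size=64
pages_per_section=$(( 1 << (section_bits - page_shift) ))  # 32768 pages
vmemmap_bytes=$(( pages_per_section * struct_page_size ))
echo "$vmemmap_bytes"   # 2097152 bytes = 2 MiB per section of metadata
```

Shrinking sections below this would fragment the vmemmap mapping, which is part of why smaller granularities waste memory on metadata.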