On 27.01.20 14:29, Michal Hocko wrote: > On Fri 24-01-20 16:53:36, David Hildenbrand wrote: >> We see multiple issues with the implementation/interface to compute >> whether a memory block can be offlined (exposed via >> /sys/devices/system/memory/memoryX/removable) and would like to simplify >> it (remove the implementation). >> >> 1. It runs basically lockless. While this might be good for performance, >> we see possible races with memory offlining/unplug that will require >> at least some sort of locking to fix. >> >> 2. Nowadays, more false positives are possible. No arch-specific checks >> are performed that validate if memory offlining will not be denied >> right away (and such check will require locking). For example, arm64 >> won't allow to offline any memory block that was added during boot - >> which will imply a very high error rate. Other archs have other >> constraints. >> >> 3. The interface is inherently racy. E.g., if a memory block is >> detected to be removable (and was not a false positive at that time), >> there is still no guarantee that offlining will actually succeed. So >> any caller already has to deal with false positives. >> >> 4. It is unclear which performance benefit this interface actually >> provides. The introducing commit 5c755e9fd813 ("memory-hotplug: add >> sysfs removable attribute for hotplug memory remove") mentioned >> "A user-level agent must be able to identify which sections of >> memory are likely to be removable before attempting the >> potentially expensive operation." >> However, no actual performance comparison was included. >> >> Known users: >> - lsmem: Will group memory blocks based on the "removable" property. [1] >> - chmem: Indirect user. It has a RANGE mode where one can specify >> removable ranges identified via lsmem to be offlined. However, it >> also has a "SIZE" mode, which allows a sysadmin to skip the manual >> "identify removable blocks" step. [2] >> - powerpc-utils: Uses the "removable" attribute to skip some memory >> blocks right away when trying to find some to >> offline+remove. However, with ballooning enabled, it >> already skips this information completely (because it >> once resulted in many false negatives). Therefore, the >> implementation can deal with false positives properly >> already. [3] >> >> With CONFIG_MEMORY_HOTREMOVE, always indicating "removable" should not >> break any user space tool. We implement a very bad heuristic now. (in >> contrast: always returning "not removable" would at least affect >> powerpc-utils) >> >> Without CONFIG_MEMORY_HOTREMOVE we cannot offline anything, so report >> "not removable" as before. >> >> Original discussion can be found in [4] ("[PATCH RFC v1] mm: >> is_mem_section_removable() overhaul"). >> >> Other users of is_mem_section_removable() will be removed next, so that >> we can remove is_mem_section_removable() completely. >> >> [1] http://man7.org/linux/man-pages/man1/lsmem.1.html >> [2] http://man7.org/linux/man-pages/man8/chmem.8.html >> [3] https://github.com/ibm-power-utilities/powerpc-utils >> [4] https://lkml.kernel.org/r/20200117105759.27905-1-david@xxxxxxxxxx >> >> Suggested-by: Michal Hocko <mhocko@xxxxxxxxxx> >> Cc: Dan Williams <dan.j.williams@xxxxxxxxx> >> Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> >> Cc: "Rafael J. Wysocki" <rafael@xxxxxxxxxx> >> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> >> Cc: powerpc-utils-devel@xxxxxxxxxxxxxxxx >> Cc: util-linux@xxxxxxxxxxxxxxx >> Cc: Badari Pulavarty <pbadari@xxxxxxxxxx> >> Cc: Nathan Fontenot <nfont@xxxxxxxxxxxxxxxxxx> >> Cc: Robert Jennings <rcj@xxxxxxxxxxxxxxxxxx> >> Cc: Heiko Carstens <heiko.carstens@xxxxxxxxxx> >> Cc: Karel Zak <kzak@xxxxxxxxxx> >> Signed-off-by: David Hildenbrand <david@xxxxxxxxxx> > > Please add information provided by Nathan. > Acked-by: Michal Hocko <mhocko@xxxxxxxx> > > Minor nit below. > >> +#ifdef CONFIG_MEMORY_HOTREMOVE >> + return sprintf(buf, "1\n"); >> +#else >> + return sprintf(buf, "0\n"); >> +#endif > int ret = IS_ENABLED(CONFIG_MEMORY_HOTREMOVE); > > return sprintf(buf, "%d\n", ret) > > would be slightly nicer than explicit ifdefs. > Indeed, thanks! -- Thanks, David / dhildenb