On Fri 17-01-20 14:08:06, David Hildenbrand wrote: > On 17.01.20 12:33, Michal Hocko wrote: > > On Fri 17-01-20 11:57:59, David Hildenbrand wrote: > >> Let's refactor that code. We want to check if we can offline memory > >> blocks. Add a new function is_mem_section_offlineable() for that and > >> make it call is_mem_section_offlineable() for each contained section. > >> Within is_mem_section_offlineable(), add some more sanity checks and > >> directly bail out if the section contains holes or if it spans multiple > >> zones. > > > > I didn't read the patch (yet) but I am wondering. If we want to touch > > this code, can we simply always return true there? I mean whoever > > depends on this check is racy and the failure can happen even after > > the sysfs says good to go, right? The check is essentially as expensive > > as calling the offlining code itself. So the only usecase I can think of > > is a dumb driver to crawl over blocks and check which is removable and > > try to hotremove it. But just trying to offline one block after another > > is essentially going to achieve the same. > > Some thoughts: > > 1. It allows you to check if memory is likely to be offlineable without > doing expensive locking and trying to isolate pages (meaning: > zone->lock, mem_hotplug_lock. but also, calling drain_all_pages() > when isolating) > > 2. There are use cases that want to identify a memory block/DIMM to > unplug. One example is PPC DLPAR code (see this patch). Going over all > memory block trying to offline them is an expensive operation. > > 3. powerpc-utils (https://github.com/ibm-power-utilities/powerpc-utils) > makes use of /sys/.../removable to speed up the search AFAIK. Well, while I do see those points I am not really sure they are worth having a broken (by-definition) interface. > 4. lsmem displays/groups by "removable". Is anybody really using that? > 5. If "removable=false" then it usually really is not offlineable. > Of course, there could also be races (free the last unmovable page), > but it means "don't even try". OTOH, "removable=true" is more racy, > and gives less guarantees. ("looks okay, feel free to try") Yeah, but you could be already pessimistic and try movable zones before other kernel zones. > > Or does anybody see any reasonable usecase that would break if we did > > that unconditional behavior? > > If we would return always "true", then the whole reason the > interface originally was introduced would be "broken" (meaning, less > performant as you would try to offline any memory block). I would argue that the whole interface is broken ;). Not the first time in the kernel development history and not the last time either. What I am trying to say here is that unless there are _real_ usecases depending on knowing that something surely is _not_ offlineable then I would just try to drop the functionality while preserving the interface and see what happens. -- Michal Hocko SUSE Labs