On 17.01.20 16:54, Dan Williams wrote: > On Fri, Jan 17, 2020 at 7:30 AM Michal Hocko <mhocko@xxxxxxxxxx> wrote: >> >> On Fri 17-01-20 15:58:26, David Hildenbrand wrote: >>> On 17.01.20 15:52, Michal Hocko wrote: >>>> On Fri 17-01-20 14:08:06, David Hildenbrand wrote: >>>>> On 17.01.20 12:33, Michal Hocko wrote: >>>>>> On Fri 17-01-20 11:57:59, David Hildenbrand wrote: >>>>>>> Let's refactor that code. We want to check if we can offline memory >>>>>>> blocks. Add a new function is_mem_section_offlineable() for that and >>>>>>> make it call is_mem_section_offlineable() for each contained section. >>>>>>> Within is_mem_section_offlineable(), add some more sanity checks and >>>>>>> directly bail out if the section contains holes or if it spans multiple >>>>>>> zones. >>>>>> >>>>>> I didn't read the patch (yet) but I am wondering. If we want to touch >>>>>> this code, can we simply always return true there? I mean whoever >>>>>> depends on this check is racy and the failure can happen even after >>>>>> the sysfs says good to go, right? The check is essentially as expensive >>>>>> as calling the offlining code itself. So the only usecase I can think of >>>>>> is a dumb driver to crawl over blocks and check which is removable and >>>>>> try to hotremove it. But just trying to offline one block after another >>>>>> is essentially going to achieve the same. >>>>> >>>>> Some thoughts: >>>>> >>>>> 1. It allows you to check if memory is likely to be offlineable without >>>>> doing expensive locking and trying to isolate pages (meaning: >>>>> zone->lock, mem_hotplug_lock. but also, calling drain_all_pages() >>>>> when isolating) >>>>> >>>>> 2. There are use cases that want to identify a memory block/DIMM to >>>>> unplug. One example is PPC DLPAR code (see this patch). Going over all >>>>> memory block trying to offline them is an expensive operation. >>>>> >>>>> 3. powerpc-utils (https://github.com/ibm-power-utilities/powerpc-utils) >>>>> makes use of /sys/.../removable to speed up the search AFAIK. >>>> >>>> Well, while I do see those points I am not really sure they are worth >>>> having a broken (by-definition) interface. >>> >>> It's a pure speedup. And for that, the interface has been working >>> perfectly fine for years? >>> >>>> >>>>> 4. lsmem displays/groups by "removable". >>>> >>>> Is anybody really using that? >>> >>> Well at least I am using that when testing to identify which >>> (ZONE_NORMAL!) block I can easily offline/re-online (e.g., to validate >>> all the zone shrinking stuff I have been fixing) >>> >>> So there is at least one user ;) >> >> Fair enough. But I would argue that there are better ways to do the same >> solely for testing purposes. Rather than having a subtly broken code to >> maintain. >> >>>> >>>>>> Or does anybody see any reasonable usecase that would break if we did >>>>>> that unconditional behavior? >>>>> >>>>> If we would return always "true", then the whole reason the >>>>> interface originally was introduced would be "broken" (meaning, less >>>>> performant as you would try to offline any memory block). >>>> >>>> I would argue that the whole interface is broken ;). Not the first time >>>> in the kernel development history and not the last time either. What I >>>> am trying to say here is that unless there are _real_ usecases depending >>>> on knowing that something surely is _not_ offlineable then I would just >>>> try to drop the functionality while preserving the interface and see >>>> what happens. >>> >>> I can see that, but I can perfectly well understand why - especially >>> powerpc - wants a fast way to sense which blocks actually sense to try >>> to online. >>> >>> The original patch correctly states >>> "which sections of >>> memory are likely to be removable before attempting the potentially >>> expensive operation." >>> >>> It works as designed I would say. >> >> Then I would just keep it crippled the same way it has been for years >> without anybody noticing. > > I tend to agree. At least the kmem driver that wants to unplug memory > could not use an interface that does not give stable answers. It just > relies on remove_memory() to return a definitive error. > Just because kmem cannot reuse such an interface doesn't mean we should not touch it (or I am not getting your point). Especially, this interface is about "can it be likely be offlined and then eventually be removed (if there is a HW interface for that)" (as documented), not about "will remove_memory()" work. We do have users and if we agree to keep it (what I think we should as I expressed) then I think we should un-cripple and fix it. After all we have to maintain it. The current interface provides what was documented - "likely to be offlineable". (the chosen name was just horribly bad - as I expressed a while ago already :) ) -- Thanks, David / dhildenb