>> If that would ever change, the optimization here would be lost and we
>> would have to think of something else. Nothing would actually break -
>> and it's all kept directly in page_alloc.c
>
> Sure, but then it can become pointless code churn.

Indeed, and if there are valid concerns that this will happen in the
near future (e.g., < 1 year), I agree that we should look into
alternatives right from the start. Otherwise it's good enough until some
of the other things I mentioned below become real (which could also take
a while ...).

>
>> I'd like to stress that what I propose here is both simple and powerful.
>>
>>> possible I think, such as preparing a larger MIGRATE_UNMOVABLE area in the
>>> existing memory before we allocate those long-term management structures. Or
>>> onlining a bunch of blocks as zone_movable first and only later convert to
>>> zone_normal in a controlled way when the existing normal zone becomes depleted?
>>
>> I see the following (more or less complicated) alternatives
>>
>> 1) Having a larger MIGRATE_UNMOVABLE area
>>
>> a) Sizing it is difficult. I mean you would have to plan ahead for all
>> memory you might eventually hotplug later - and that could even be
>
> Yeah, hence my worry about existing interfaces that work on 128MB blocks
> individually without a larger strategy.

Yes, in the works :)

>
>> impossible if you hotplug quite a lot of memory to a smaller machine.
>> (I've seen people in the vm/container world trying to hotplug 128GB
>> DIMMs to 2GB VMs ... and failing for obvious reasons)
>
> Some planning should still be possible to maximize the contiguous area
> without unmovable allocations.

Indeed, optimizing that is very high on my list of things to look into ...

>>
>> we would, once again, never be able to allocate a gigantic page because
>> all [N] would contain a memmap.
>
> The second approach should work: if you know how much you are going to
> online and plan the size of the N group accordingly, and if the onlined
> amount is several gigabytes, then only the first block (or first X
> blocks) will be unusable for a gigantic page, but the rest would be
> usable? Can't get much better than that.

Indeed, it's the optimal case (assuming one can come up with a safe zone
balance - which is usually possible, but unfortunately, there are
exceptions one at least has to identify). I've put a quick
back-of-the-envelope calculation of the memmap overhead at the bottom of
this mail.

[...]

>
> I've reviewed the series and I won't block it - yes, it's an optimistic
> approach that can break and leave us with code churn. But at least it's
> not that much

Thanks. I'll try to document somewhere that the behavior of FOP_TO_TAIL
is a pure optimization and might change in the future - along with the
case it tried to optimize (so people know what the use case was).

> code and the extra test in __free_one_page() shouldn't make this hotpath
> much

I assume the compiler is able to completely propagate constants and
optimize that test out - I haven't checked, though. A rough sketch of
why I expect that is at the bottom of this mail.

> worse. But I still hope we can achieve a more robust solution one day.

I definitely agree. I'd also prefer some kind of guarantees, but I
learned that things always sound easier than they actually are when it
comes to memory management in Linux ... and they take a lot of time (for
example, Michal's/Oscar's attempts to implement vmemmap on hotadded
memory).
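
The promised back-of-the-envelope calculation for the memmap overhead,
as a user-space sketch (so not kernel code; the numbers assume common
x86-64 defaults - 4 KiB base pages, a 64 byte struct page, 128 MiB
memory blocks and 1 GiB gigantic pages - other configurations will
differ):

#include <stdio.h>

#define KiB              1024ULL
#define MiB              (1024 * KiB)
#define GiB              (1024 * MiB)
#define PAGE_SIZE        (4 * KiB)
#define STRUCT_PAGE_SIZE 64ULL        /* sizeof(struct page) on x86-64 */
#define BLOCK_SIZE       (128 * MiB)  /* one hotpluggable memory block */
#define GIGANTIC_SIZE    (1 * GiB)

int main(void)
{
        /* memmap needed to describe one 128 MiB block: 2 MiB */
        unsigned long long memmap = BLOCK_SIZE / PAGE_SIZE * STRUCT_PAGE_SIZE;
        /* a 1 GiB gigantic page spans 8 consecutive blocks */
        unsigned long long blocks = GIGANTIC_SIZE / BLOCK_SIZE;

        printf("memmap per block: %llu MiB\n", memmap / MiB);
        printf("blocks per gigantic page: %llu\n", blocks);
        /*
         * If each block carries its own memmap, every 1 GiB range
         * contains unmovable pages and no gigantic page can ever be
         * allocated there. If the memmap for a group of N blocks is
         * packed into the first block(s) instead, only those first
         * blocks are lost.
         */
        printf("memmap for %llu blocks packed together: %llu MiB\n",
               blocks, blocks * memmap / MiB);
        return 0;
}

So, for example, onlining 8 GiB as one group means the whole 128 MiB of
memmap fits exactly into the first 128 MiB block, leaving the other 63
blocks fully movable - 7 of the 8 possible gigantic pages survive.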
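
And the promised sketch of why I expect the FOP_TO_TAIL test to be
optimized out: once __free_one_page() is inlined into a caller that
passes a compile-time constant, the compiler can fold the branch. Again
just a toy model - the names are borrowed loosely from this series, the
signatures are heavily simplified, and this is not the actual kernel
code:

typedef unsigned int fop_t;
#define FOP_TO_TAIL (1U << 0)

struct page;
void add_to_free_list(struct page *page);
void add_to_free_list_tail(struct page *page);

static inline void __free_one_page(struct page *page, fop_t fop_flags)
{
        /* with a constant fop_flags, this test folds at compile time */
        if (fop_flags & FOP_TO_TAIL)
                add_to_free_list_tail(page);
        else
                add_to_free_list(page);
}

/* caller passing a constant 0: the tail branch is eliminated */
void free_one_page(struct page *page)
{
        __free_one_page(page, 0);
}

/* caller passing a constant FOP_TO_TAIL: only the tail path remains */
void free_pages_core_page(struct page *page)
{
        __free_one_page(page, FOP_TO_TAIL);
}

Looking at the disassembly at -O2, I'd expect each caller to end up as a
single direct call/jump with no flag test left - but as said, I haven't
actually verified that against the real code.

-- 
Thanks,

David / dhildenb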