On Wed, Jul 8, 2020 at 3:04 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 08.07.20 11:45, Mike Rapoport wrote:
> > On Wed, Jul 08, 2020 at 11:25:36AM +0200, David Hildenbrand wrote:
> >> On 08.07.20 11:15, Mike Rapoport wrote:
> >>>>>>>>
> >>>>> But on more theoretical/fundamental level, I think we lack a generic
> >>>>> abstraction similar to e.g. x86 'struct numa_meminfo' that serves as
> >>>>> translation of firmware supplied information into data that can be used
> >>>>> by the generic mm without need to reimplement it for each and every
> >>>>> arch.
> >>>>
> >>>> Right. As I expressed, I am not a friend of using memblock for that, and
> >>>> the pgdat node span is tricky.
> >>>>
> >>>> Maybe abstracting that x86 concept is possible in some way (and we could
> >>>> restrict the information to boot-time properties, so we don't have to
> >>>> mess with memory hot(un)plug - just as done for numa_meminfo AFAIKS).
> >>>
> >>> I agree with pgdat part and disagree about memblock. It already has
> >>> non-init physmap, why won't we add memblock.memory to the mix? ;-)
> >>
> >> Can we generalize and tweak physmap to contain node info? That's all we
> >> need, no? (the special mem= parameter handling should not matter for our
> >> use case, where "physmap" and "memory" would differ)
> >
> > TBH, I have only random vague thoughts at the moment. This might be an
> > option. But then we need to enable physmap on !s390, right?
>
> Yes, looks like it.
>
> >
> >>> Now, seriously, memblock already has all the necessary information about
> >>> the coldplug memory for several architectures. x86 being an exception
> >>> because for some reason the reserved memory is not considered memory
> >>> there. The infrastructure for querying and iterating memory regions is
> >>> already there. We just need to leave out the irrelevant parts, like
> >>> memblock.reserved and allocation functions.
> >>
> >> I *really* don't want to mess with memblocks on memory hot(un)plug on
> >> x86 and s390x (+other architectures in the future). I also thought about
> >> stopping to create memblocks for hotplugged memory on arm64, by tweaking
> >> pfn_valid() to query memblocks only for early sections.
> >>
> >> If "physmem" is not an option, can we at least introduce something like
> >> ARCH_UPDATE_MEMBLOCK_ON_HOTPLUG to avoid doing that on x86 and s390x for
> >> now (and later maybe for others)?
> >
> > I have to do more memory hotplug homework to answer that ;-)
> >
> > My general point is that we don't have to reinvent the wheel to have
> > coldplug memory representation, it's already there. We just need a way
> > to use it properly.
>
> Yes, I tend to agree. Details to be clarified :)

I don't quite understand the concern, or the requirement, about
"updating memblock" in the hotplug path. The routines
memory_add_physaddr_to_nid() and phys_to_target_node() are helpers to
interrogate platform-firmware NUMA info through a common abstraction.
They place no burden on the memory hotplug code; they're just used to
see if a hot-added range lies within an existing node span when
platform-firmware otherwise fails to communicate a node. x86 can
continue to back those helpers with numa_meminfo, arm64 can use a
generic memblock implementation, and other archs can follow the arm64
example if they want better NUMA answers for drivers.
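
For illustration only, here is a minimal userspace sketch of the "does
this address lie within an existing node span" lookup described above.
It is not kernel code and not the actual implementation of
memory_add_physaddr_to_nid() or phys_to_target_node(): the names
sketch_region, sketch_regions, sketch_phys_to_nid(), SKETCH_MAX_NODES
and SKETCH_NUMA_NO_NODE are all hypothetical, and the region table
stands in for whatever boot-time data (numa_meminfo, memblock) an arch
would really consult.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_NODES    4     /* hypothetical upper bound on node ids */
#define SKETCH_NUMA_NO_NODE (-1)  /* hypothetical "no node found" value */

/* Hypothetical stand-in for one boot-time memory region and its node. */
struct sketch_region {
	uint64_t base;  /* first byte of the region */
	uint64_t size;  /* length in bytes */
	int nid;        /* node the firmware assigned at boot */
};

/* Made-up boot-time layout: node 0 has a hole between 2 GiB and 3 GiB. */
const struct sketch_region sketch_regions[] = {
	{ 0x000000000ULL, 0x080000000ULL, 0 },  /* 0 GiB - 2 GiB, node 0 */
	{ 0x0C0000000ULL, 0x040000000ULL, 0 },  /* 3 GiB - 4 GiB, node 0 */
	{ 0x100000000ULL, 0x080000000ULL, 1 },  /* 4 GiB - 6 GiB, node 1 */
};
#define SKETCH_NR_REGIONS (sizeof(sketch_regions) / sizeof(sketch_regions[0]))

/*
 * Return the node whose span [lowest base, highest end) contains addr,
 * or SKETCH_NUMA_NO_NODE if no span does. The table is only read, so a
 * hotplug path calling this would never have to update it.
 */
int sketch_phys_to_nid(const struct sketch_region *r, size_t n, uint64_t addr)
{
	uint64_t start[SKETCH_MAX_NODES] = { 0 };
	uint64_t end[SKETCH_MAX_NODES] = { 0 };
	int seen[SKETCH_MAX_NODES] = { 0 };
	size_t i;
	int nid;

	/* Build each node's span from the boot-time regions. */
	for (i = 0; i < n; i++) {
		uint64_t rend = r[i].base + r[i].size;

		nid = r[i].nid;
		assert(nid >= 0 && nid < SKETCH_MAX_NODES);
		if (!seen[nid]) {
			start[nid] = r[i].base;
			end[nid] = rend;
			seen[nid] = 1;
		} else {
			if (r[i].base < start[nid])
				start[nid] = r[i].base;
			if (rend > end[nid])
				end[nid] = rend;
		}
	}

	/* The first node whose span covers addr wins. */
	for (nid = 0; nid < SKETCH_MAX_NODES; nid++)
		if (seen[nid] && addr >= start[nid] && addr < end[nid])
			return nid;

	return SKETCH_NUMA_NO_NODE;
}
```

With this made-up layout, an address hot-added into the 2 GiB - 3 GiB
hole still resolves to node 0 because it falls inside that node's span,
while an address beyond 6 GiB resolves to no node and the caller would
need its own fallback policy.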