On 08.07.20 17:50, Dan Williams wrote:
> On Wed, Jul 8, 2020 at 3:04 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
>>
>> On 08.07.20 11:45, Mike Rapoport wrote:
>>> On Wed, Jul 08, 2020 at 11:25:36AM +0200, David Hildenbrand wrote:
>>>> On 08.07.20 11:15, Mike Rapoport wrote:
>>>>>>>>>>
>>>>>>> But on a more theoretical/fundamental level, I think we lack a generic
>>>>>>> abstraction similar to e.g. x86 'struct numa_meminfo' that serves as a
>>>>>>> translation of firmware-supplied information into data that can be used
>>>>>>> by the generic mm without the need to reimplement it for each and every
>>>>>>> arch.
>>>>>>
>>>>>> Right. As I expressed, I am not a friend of using memblock for that, and
>>>>>> the pgdat node span is tricky.
>>>>>>
>>>>>> Maybe abstracting that x86 concept is possible in some way (and we could
>>>>>> restrict the information to boot-time properties, so we don't have to
>>>>>> mess with memory hot(un)plug - just as done for numa_meminfo AFAIKS).
>>>>>
>>>>> I agree with the pgdat part and disagree about memblock. It already has
>>>>> non-init physmap, why won't we add memblock.memory to the mix? ;-)
>>>>
>>>> Can we generalize and tweak physmap to contain node info? That's all we
>>>> need, no? (the special mem= parameter handling should not matter for our
>>>> use case, where "physmap" and "memory" would differ)
>>>
>>> TBH, I have only random vague thoughts at the moment. This might be an
>>> option. But then we need to enable physmap on !s390, right?
>>
>> Yes, looks like it.
>>
>>>
>>>>> Now, seriously, memblock already has all the necessary information about
>>>>> the coldplug memory for several architectures. x86 is an exception,
>>>>> because for some reason the reserved memory is not considered memory
>>>>> there. The infrastructure for querying and iterating memory regions is
>>>>> already there. We just need to leave out the irrelevant parts, like
>>>>> memblock.reserved and allocation functions.
>>>>
>>>> I *really* don't want to mess with memblocks on memory hot(un)plug on
>>>> x86 and s390x (+other architectures in the future). I also thought about
>>>> no longer creating memblocks for hotplugged memory on arm64, by tweaking
>>>> pfn_valid() to query memblocks only for early sections.
>>>>
>>>> If "physmem" is not an option, can we at least introduce something like
>>>> ARCH_UPDTAE_MEMBLOCK_ON_HOTPLUG to avoid doing that on x86 and s390x for
>>>> now (and later maybe for others)?
>>>
>>> I have to do more memory hotplug homework to answer that ;-)
>>>
>>> My general point is that we don't have to reinvent the wheel to have
>>> coldplug memory representation, it's already there. We just need a way
>>> to use it properly.
>>
>> Yes, I tend to agree. Details to be clarified :)
>
> I'm not quite understanding the concern, or requirement about
> "updating memblock" in the hotplug path. The routines
> memory_add_physaddr_to_nid() and phys_to_target_node() are helpers to
> interrogate platform-firmware numa info through a common abstraction.
> They place no burden on the memory hotplug code; they're just used to
> see if a hot-added range lies within an existing node span when
> platform-firmware otherwise fails to communicate a node. x86 can
> continue to back those helpers with numa_meminfo, arm64 can use a
> generic memblock implementation and other archs can follow the arm64
> example if they want better numa answers for drivers.
>

See memblock_add_node()/memblock_remove() in mm/memory_hotplug.c. I
don't want that code to be reactivated for x86/s390x. That's all I am
saying.

-- 
Thanks,

David / dhildenb
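To make the distinction in this thread concrete, here is a minimal userspace sketch of the kind of lookup Dan describes for memory_add_physaddr_to_nid()/phys_to_target_node(): decide whether a hot-added range lies within a node span that platform firmware reported at boot, while only reading that table and never updating it on hotplug (the property David wants to keep for x86/s390x). Everything in it is hypothetical and is not kernel code: `struct node_span`, the made-up table contents, `lookup_nid_for_range()` and `NO_NODE` are illustrative names, and the real kernel helpers are queried with a start address rather than a whole range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's NUMA_NO_NODE. */
#define NO_NODE (-1)

/* Hypothetical boot-time node span as reported by platform firmware,
 * loosely in the spirit of x86's struct numa_meminfo entries. */
struct node_span {
	uint64_t start; /* inclusive physical start address */
	uint64_t end;   /* exclusive physical end address */
	int nid;        /* node the firmware assigned to this span */
};

/* Illustrative, made-up table: filled once at boot (const here) and
 * never modified by the modelled hotplug path. */
static const struct node_span boot_spans[] = {
	{ 0x000000000ULL, 0x080000000ULL, 0 },
	{ 0x100000000ULL, 0x180000000ULL, 1 },
};

/* Hypothetical helper: return the node whose boot-time span fully
 * contains [start, start + size), or NO_NODE when no span does and the
 * caller has to fall back to some default. */
static int lookup_nid_for_range(uint64_t start, uint64_t size)
{
	assert(size > 0); /* an empty hot-add range is a caller bug */

	for (size_t i = 0; i < sizeof(boot_spans) / sizeof(boot_spans[0]); i++) {
		const struct node_span *s = &boot_spans[i];

		/* Check start < end first so that end - start cannot wrap. */
		if (start >= s->start && start < s->end &&
		    size <= s->end - start)
			return s->nid;
	}
	return NO_NODE;
}
```

For example, a 128 MiB range (size 0x8000000) at 0x140000000 resolves to node 1, while the same size at 0xC0000000 falls into the hole between the two spans and yields NO_NODE. Keeping the table const is the point of the sketch: answering "which node?" for a hot-added range only needs read access to boot-time data, not the memblock_add_node()/memblock_remove() updates David objects to.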