On 02/10/2017 11:06 AM, Anshuman Khandual wrote:
> This three patches define CDM node with HugeTLB & Buddy allocation
> isolation. Please refer to the last RFC posting mentioned here for details.
> The series has been split for easier review process. The next part of the
> work like VM flags, auto NUMA and KSM interactions with tagged VMAs will
> follow later.

Hi,

I'm not sure that splitting into smaller series and focusing on partial
implementations is helpful at this point, until there's some consensus
about the whole approach from a big picture perspective. Note that it's
also confusing that v1 of this partial patchset mentioned some
alternative implementations, but only as git branches, and the
discussion about their differences is linked elsewhere. That further
makes meaningful review harder IMHO.

Going back to the bigger picture, I've read the comments on previous
postings and I think Jerome makes many good points in this subthread [1]
against the idea of representing the device memory as generic memory
nodes and expecting userspace to mbind() to them. So if I make a program
that uses mbind() to back some mmapped area with memory of "devices like
accelerators, GPU cards, network cards, FPGA cards, PLD cards etc which
might contain on board memory", then it will get such memory... and then
what? How will it benefit from it? I will also need to tell some driver
to make the device do some operations with this memory, right? And that
most likely won't be a generic operation. In that case I can also ask
the driver to give me that memory in the first place, and it can apply
whatever policies are best for the device in question? And it's also the
driver that can detect if the device memory is being wasted by a process
that isn't currently performing the interesting operations, while
another process that does them had to fall back to system memory for its
allocations and thus runs slower.
I expect that NUMA balancing can't catch that for device memory (and you
also disable it anyway?)

So I don't really see how a generic solution would work without a full
concrete example, and thus it's really hard to say that this approach is
the right way to go and should be merged.

The only examples I've noticed that don't require any special operations
to benefit from placement in the "device memory" were fast memories like
MCDRAM, which differ in the performance of generic CPU operations, so
it's not really a "device memory" by your terminology. And I would
expect that policing access to such performance-differentiated memory is
already possible with e.g. cpusets?

Thanks,
Vlastimil

[1] https://lkml.kernel.org/r/20161025153256.GB6131@xxxxxxxxx

> https://lkml.org/lkml/2017/1/29/198
>
> Changes in V2:
>
> * Removed redundant nodemask_has_cdm() check from zonelist iterator
> * Dropped the nodemask_had_cdm() function itself
> * Added node_set/clear_state_cdm() functions and removed bunch of #ifdefs
> * Moved CDM helper functions into nodemask.h from node.h header file
> * Fixed the build failure by additional CONFIG_NEED_MULTIPLE_NODES check
>
> Previous V1: (https://lkml.org/lkml/2017/2/8/329)
>
> Anshuman Khandual (3):
>   mm: Define coherent device memory (CDM) node
>   mm: Enable HugeTLB allocation isolation for CDM nodes
>   mm: Enable Buddy allocation isolation for CDM nodes
>
>  Documentation/ABI/stable/sysfs-devices-node |  7 ++++
>  arch/powerpc/Kconfig                        |  1 +
>  arch/powerpc/mm/numa.c                      |  7 ++++
>  drivers/base/node.c                         |  6 +++
>  include/linux/nodemask.h                    | 58 ++++++++++++++++++++++++++++-
>  mm/Kconfig                                  |  4 ++
>  mm/hugetlb.c                                | 25 ++++++++-----
>  mm/memory_hotplug.c                         |  3 ++
>  mm/page_alloc.c                             | 24 +++++++++++-
>  9 files changed, 123 insertions(+), 12 deletions(-)
>
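
For reference, the userspace flow questioned in the reply (mmap an area,
then mbind() it to a specific memory node) could look roughly like the
sketch below. It is only an illustration under stated assumptions and is
not taken from the patch series: the helper name map_bound_to_node() is
hypothetical, the node number is whatever the platform would expose for
the device memory, and the raw mbind(2) system call is used so that
libnuma's <numaif.h> is not required. Note that it stops exactly where
the reply's question starts: nothing here makes a device operate on the
memory.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* MPOL_BIND value from the mbind(2) ABI, defined here to avoid <numaif.h>. */
#ifndef MPOL_BIND
#define MPOL_BIND 2
#endif

/*
 * Hypothetical helper: map `len` bytes of anonymous memory and ask the
 * kernel to back it only with pages from NUMA node `node`.
 * Returns the mapping, or NULL with errno set.
 */
void *map_bound_to_node(size_t len, unsigned int node)
{
    unsigned long nodemask;
    void *p;

    /* This sketch uses a single-word mask, so only nodes 0..63 on 64-bit. */
    if (node >= 8 * sizeof(nodemask)) {
        errno = EINVAL;
        return NULL;
    }
    nodemask = 1UL << node;

    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    /*
     * maxnode is the number of bits in the mask plus one, because the
     * kernel decrements it before use. The call can fail, e.g. on a
     * kernel without NUMA support, for an offline node, or when a
     * sandbox filters the system call.
     */
    if (syscall(SYS_mbind, p, len, MPOL_BIND, &nodemask,
                8 * sizeof(nodemask) + 1, 0UL) != 0) {
        int saved = errno;

        munmap(p, len);
        errno = saved;
        return NULL;
    }
    return p;
}
```

A caller would do something like `buf = map_bound_to_node(len, node)` and
then still has to go through a device-specific driver interface to get
any benefit from the placement, which is the point made above.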