On 01/30/2017 11:21 PM, Dave Hansen wrote:
> Here's the flag definition:
>
>> +#ifdef CONFIG_COHERENT_DEVICE
>> +#define VM_CDM		0x00800000	/* Contains coherent device memory */
>> +#endif
>
> But it doesn't match the implementation:
>
>> +#ifdef CONFIG_COHERENT_DEVICE
>> +static void mark_vma_cdm(nodemask_t *nmask,
>> +		struct page *page, struct vm_area_struct *vma)
>> +{
>> +	if (!page)
>> +		return;
>> +
>> +	if (vma->vm_flags & VM_CDM)
>> +		return;
>> +
>> +	if (nmask && !nodemask_has_cdm(*nmask))
>> +		return;
>> +
>> +	if (is_cdm_node(page_to_nid(page)))
>> +		vma->vm_flags |= VM_CDM;
>> +}
>
> That flag is a one-way trip.  Any VMA with that flag set on it will keep
> it for the life of the VMA, despite whether it has CDM pages in it now
> or not.  Even if you changed the policy back to one that doesn't allow
> CDM and forced all the pages to be migrated out.

Right, we have this limitation right now. But as I mentioned in my reply
on the other thread, I will work towards both static and runtime
re-evaluation of the VMA flag next time around.

>
> This also assumes that the only way to get a page mapped into a VMA is
> via alloc_pages_vma().  Do the NUMA migration APIs use this path?

Right now I have just taken care of these two paths:

* Page fault path
* mbind() path

Agreed, I will work on the NUMA migration API paths next. I am wondering
whether I also need to handle the migrate_pages() kernel API, as it will
be used by the driver, or whether the driver should tag the VMA
explicitly, knowing what has just happened.

I had also mentioned this in the cover letter :) But as you have pointed
out, I will move the documentation into the patches.

"
VM_CDM tagged VMA:

There are two parts to this problem.

* How to mark a VMA with VM_CDM ?
	- During page fault path
	- During mbind(MPOL_BIND) call
	- Any other paths ?
	- Should a driver mark a VMA with VM_CDM explicitly ?

* How VM_CDM marked VMA gets treated ?
	- Disabled from auto NUMA migrations
	- Disabled from KSM merging
	- Anything else ?
" > > When you *set* this flag, you don't go and turn off KSM merging, for > instance. You keep it from being turned on from this point forward, but > you don't turn it off. I was in the impression that the KSM merging does not start unless we do madvise(MADV_MERGEABLE) call on the VMA (where its blocked now). I might be missing something here if it can start before hand. > > This is happening with mmap_sem held for read. Correct? Is it OK that > you're modifying the VMA? That vm_flags manipulation is non-atomic, so > how can that even be safe? Hmm. should it be done with mmap_sem being held for write. Will look into this further. But intercepting the page faults inside alloc_pages_vma() for tagging the VMA is okay from over all design perspective ?. Or this should be moved up or down the call chain in the page fault path ? > > If you're going to go down this route, I think you need to be very > careful. We need to ensure that when this flag gets set, it's never set > on VMAs that are "normal" and will only be set on VMAs that were > *explicitly* set up for accessing CDM. That means that you'll need to > make sure that there's no possible way to get a CDM page faulted into a > VMA unless it's via an explicitly assigned policy that would have cause > the VMA to be split from any "normal" one in the system. > > This all makes me really nervous. Got it, will work towards this. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>