Adding Mel and Rik to cc:

Benjamin Herrenschmidt <benh@xxxxxxxxxxx> writes:

> On Mon, 2013-11-18 at 14:58 +0530, Aneesh Kumar K.V wrote:
>> From: "Aneesh Kumar K.V" <aneesh.kumar@xxxxxxxxxxxxxxxxxx>
>>
>> change_prot_numa should work even if _PAGE_NUMA != _PAGE_PROTNONE.
>> On archs like ppc64 that don't use _PAGE_PROTNONE and also have
>> a separate page table outside linux pagetable, we just need to
>> make sure that when calling change_prot_numa we flush the
>> hardware page table entry so that next page access result in a numa
>> fault.
>
> That patch doesn't look right...
>
> You are essentially making change_prot_numa() do whatever it does (which
> I don't completely understand) *for all architectures* now, whether they
> have CONFIG_ARCH_USES_NUMA_PROT_NONE or not ... So because you want that
> behaviour on powerpc book3s64, you change everybody.
>
> Is that correct ?

Yes.

>
> Also what exactly is that doing, can you explain ? From what I can see,
> it calls back into the core of mprotect to change the protection to
> vma->vm_page_prot, which I would have expected is already the protection
> there, with the added "prot_numa" flag passed down.

It sets the _PAGE_NUMA bit. We also want to make sure that once
_PAGE_NUMA is set, the next access to the page takes a page fault, so
that we can track that fault as a NUMA fault. To ensure that, we had
the BUILD_BUG below:

	BUILD_BUG_ON(_PAGE_NUMA != _PAGE_PROTNONE);

But other than that, the function doesn't really have any dependency on
_PAGE_PROTNONE. The only requirement is that when we set _PAGE_NUMA, the
architecture does enough to ensure that we get a page fault. On ppc64
we do that by clearing the hpte entry and also clearing _PAGE_PRESENT.
Since _PAGE_PRESENT is cleared, hash_page will return 1 and we get to
the page fault handler.

>
> Your changeset comment says "On archs like ppc64 [...] we just need to
> make sure that when calling change_prot_numa we flush the
> hardware page table entry so that next page access result in a numa
> fault."
>
> But change_prot_numa() does a lot more than that ... it does
> pte_mknuma(), do we need it ? I assume we do or we wouldn't have added
> that PTE bit to begin with...
>
> Now it *might* be allright and it might be that no other architecture
> cares anyway etc... but I need at least some mm folks to ack on that
> patch before I can take it because it *will* change behaviour of other
> architectures.
>

OK, can I move the changes under #ifdef CONFIG_NUMA_BALANCING instead?
We call change_prot_numa from task_numa_work and queue_pages_range().
The latter may be an issue. So would the change below help?

-#ifdef CONFIG_ARCH_USES_NUMA_PROT_NONE
+#ifdef CONFIG_NUMA_BALANCING

-aneesh