Add an optional array of pages to struct vm_area_struct that can be
used to find the page backing a VMA.  This is useful in cases where the
normal mechanisms for finding the page don't work.  This array is only
inspected if the PTE is special.

Splitting a VMA with such an array of pages is trivially done by
adjusting vma->pages.  The original creator of the VMA must only free
the page array once all sub-VMAs are closed (e.g., by ref-counting in
vm_ops->open and vm_ops->close).

One use case is a Xen PV guest mapping foreign pages into userspace.

In a Xen PV guest, the PTEs contain MFNs so get_user_pages() (for
example) must do an MFN to PFN (M2P) lookup before it can get the
page.  For foreign pages (those owned by another guest) the M2P lookup
returns the PFN as seen by the foreign guest (which would be
completely the wrong page for the local guest).

This cannot be fixed up by improving the M2P lookup since one MFN may
be mapped onto two or more pages so getting the right page is
impossible given just the MFN.

Signed-off-by: David Vrabel <david.vrabel@xxxxxxxxxx>
---
 include/linux/mm_types.h |    8 ++++++++
 mm/memory.c              |    2 ++
 mm/mmap.c                |   12 +++++++++++-
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 6d34aa2..4f34609 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -309,6 +309,14 @@ struct vm_area_struct {
 #ifdef CONFIG_NUMA
 	struct mempolicy *vm_policy;	/* NUMA policy for the VMA */
 #endif
+	/*
+	 * Array of pages to override the default vm_normal_page()
+	 * result iff the PTE is special.
+	 *
+	 * The memory for this should be refcounted in vm_ops->open
+	 * and vm_ops->close.
+	 */
+	struct page **pages;
 };
 
 struct core_thread {
diff --git a/mm/memory.c b/mm/memory.c
index ca920d1..98520f6 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -754,6 +754,8 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 	if (HAVE_PTE_SPECIAL) {
 		if (likely(!pte_special(pte)))
 			goto check_pfn;
+		if (vma->pages)
+			return vma->pages[(addr - vma->vm_start) >> PAGE_SHIFT];
 		if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
 			return NULL;
 		if (!is_zero_pfn(pfn))
diff --git a/mm/mmap.c b/mm/mmap.c
index 7b36aa7..504dc5c 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2448,6 +2448,7 @@ static int __split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
 	      unsigned long addr, int new_below)
 {
 	struct vm_area_struct *new;
+	unsigned long delta;
 	int err = -ENOMEM;
 
 	if (is_vm_hugetlb_page(vma) && (addr &
@@ -2463,11 +2464,20 @@ static int __split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
 
 	INIT_LIST_HEAD(&new->anon_vma_chain);
 
+	delta = (addr - vma->vm_start) >> PAGE_SHIFT;
+
 	if (new_below)
 		new->vm_end = addr;
 	else {
 		new->vm_start = addr;
-		new->vm_pgoff += ((addr - vma->vm_start) >> PAGE_SHIFT);
+		new->vm_pgoff += delta;
+	}
+
+	if (vma->pages) {
+		if (new_below)
+			vma->pages += delta;
+		else
+			new->pages += delta;
 	}
 
 	err = vma_dup_policy(vma, new);
-- 
1.7.10.4
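
For anyone checking the index arithmetic in the patch, here is a
minimal userspace sketch of the lookup and the split adjustment.  It
is not kernel code: struct vma_model, model_lookup() and model_split()
are hypothetical names invented for illustration, struct page is a
stand-in, PAGE_SHIFT is assumed to be 12, and the model folds the
later vm_start/vm_end update of the original VMA (done elsewhere in
the kernel) into the split helper.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

struct page { int id; };	/* stand-in for the kernel's struct page */

/* Hypothetical model of the three VMA fields the patch relies on. */
struct vma_model {
	unsigned long vm_start;
	unsigned long vm_end;
	struct page **pages;	/* NULL when there is no override array */
};

/* Same indexing as the lookup added to vm_normal_page(). */
static struct page *model_lookup(const struct vma_model *vma,
				 unsigned long addr)
{
	assert(addr >= vma->vm_start && addr < vma->vm_end);
	if (!vma->pages)
		return NULL;
	return vma->pages[(addr - vma->vm_start) >> PAGE_SHIFT];
}

/*
 * Same pointer adjustment as the __split_vma() hunk.  Whichever VMA
 * ends up covering the upper part [addr, end) has its pages pointer
 * advanced by the number of pages below addr.
 */
static void model_split(struct vma_model *vma, struct vma_model *new,
			unsigned long addr, int new_below)
{
	unsigned long delta;

	assert(addr > vma->vm_start && addr < vma->vm_end);
	delta = (addr - vma->vm_start) >> PAGE_SHIFT;

	*new = *vma;	/* the new VMA starts as a copy, as in the kernel */

	if (new_below) {
		new->vm_end = addr;
		vma->vm_start = addr;	/* simplification of the model */
		if (vma->pages)
			vma->pages += delta;
	} else {
		new->vm_start = addr;
		vma->vm_end = addr;	/* simplification of the model */
		if (new->pages)
			new->pages += delta;
	}
}
```

Both halves keep pointing into the one array allocated by the
original creator, which is why that array can only be freed once
every sub-VMA has been closed.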