[PATCH/RFC 10/14] Shared Policy: Add hugepage shmem policy vm_ops

Shared Policy Infrastructure - Add hugepage shmem policy vm_ops

Hugetlb shmem segments have always had a shared policy structure
in their 'info' struct.  With this series, like all file mappings
on a CONFIG_NUMA kernel, the segments' address_space structures
now have a pointer to a dynamically allocated shared_policy
struct.  The shared policy struct will only be allocated when
a shared policy is installed.  However, the hugetlbfs
vm_operations currently do not supply the mempolicy set/get ops
needed to install and look up shared policies.

This patch hooks up the hugepage shmem segment's
{set|get}_policy vm_ops so that shmem segments created with
the SHM_HUGETLB flag will install policies specified via the
mbind() syscall into the shared policy of the shared segment.
This capability is possible now that hugetlb pages are faulted
in on demand.
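
	For illustration only [not part of the patch], a userspace
	sketch of how an application might exercise this:  create a
	SHM_HUGETLB segment and install an interleave policy on it
	via mbind().  The segment size and node mask below are
	made-up example values.

/*
 * Illustrative userspace sketch, not part of this patch:  create a
 * hugepage shmem segment and install an interleave policy on it.
 * With the {set|get}_policy vm_ops hooked up, the policy lands in the
 * segment's shared policy, so later faults by any attached task obey it.
 */
#include <sys/ipc.h>
#include <sys/shm.h>
#include <numaif.h>		/* mbind(), MPOL_INTERLEAVE; link with -lnuma */

#define EXAMPLE_SEG_SIZE	(16UL << 20)	/* assumes 2MB huge pages */

int install_shared_interleave(void)
{
	unsigned long nodemask = 0x3;		/* nodes 0 and 1 -- example only */
	int shmid;
	void *addr;

	shmid = shmget(IPC_PRIVATE, EXAMPLE_SEG_SIZE,
		       SHM_HUGETLB | IPC_CREAT | 0600);
	if (shmid < 0)
		return -1;

	addr = shmat(shmid, NULL, 0);
	if (addr == (void *)-1)
		return -1;

	/* installed into the segment's shared policy, not just this vma */
	return mbind(addr, EXAMPLE_SEG_SIZE, MPOL_INTERLEAVE,
		     &nodemask, sizeof(nodemask) * 8, 0);
}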

Restore the shmem_{set|get}_policy prototypes to mm.h; they were
removed back around 2.6.23-rc1-mm2 :-(.

The shared policy infrastructure maintains memory policies on
"base page size" ranges.  To ensure that policies installed on
a hugetlb shmem segment cover entire huge pages, this patch
enhances do_mbind() to enforce huge page alignment if the policy
range starts within a hugetlb segment.  The enforcement is down
in check_range() because we need the vma to determine whether or
not the range starts in a hugetlb segment.

	We could just silently round the start address down to
	a hugepage boundary.  This would be safe and, some might
	think, convenient for the application programmer, but it
	is inconsistent with the treatment of base page ranges,
	which MUST be page aligned.
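
	For example [illustrative only, assuming the application
	knows the huge page size, e.g. Hugepagesize from
	/proc/meminfo], the caller is expected to do the alignment
	itself before calling mbind():

/*
 * Illustrative sketch only:  with this patch an mbind() range that
 * starts inside a hugetlb segment must be hugepage aligned, so the
 * application rounds the range out to hugepage boundaries itself.
 */
#include <numaif.h>

long mbind_huge_range(void *seg_base, unsigned long offset, unsigned long len,
		      unsigned long huge_page_size, int mode,
		      const unsigned long *nodemask, unsigned long maxnode)
{
	unsigned long start = (unsigned long)seg_base + offset;
	unsigned long end = start + len;

	start &= ~(huge_page_size - 1);		/* round start down */
	end = (end + huge_page_size - 1) & ~(huge_page_size - 1);  /* round end up */

	return mbind((void *)start, end - start, mode, nodemask, maxnode, 0);
}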

Set VMPOL_F_NOSPLIT in hugetlbfs_file_mmap() to prevent splitting
of hugetlbfs vmas when applying a mempolicy to a subset range of
the segment.

This patch depends on the numa_maps fixes and related shared
policy infrastructure cleanup earlier in the series to prevent hangs
when displaying [via cat] the numa_maps of a task that has attached a
huge page shmem segment.


Signed-off-by: Lee Schermerhorn <lee.schermerhorn@xxxxxx>

 Documentation/vm/numa_memory_policy.txt |   16 +++++++++-------
 fs/hugetlbfs/inode.c                    |    1 +
 include/linux/mm.h                      |    6 ++++++
 mm/hugetlb.c                            |    4 ++++
 mm/mempolicy.c                          |   19 +++++++++++++++++--
 mm/shmem.c                              |   20 ++++++++++++++++++--
 6 files changed, 55 insertions(+), 11 deletions(-)

Index: linux-2.6.36-mmotm-101103-1217/mm/hugetlb.c
===================================================================
--- linux-2.6.36-mmotm-101103-1217.orig/mm/hugetlb.c
+++ linux-2.6.36-mmotm-101103-1217/mm/hugetlb.c
@@ -2143,6 +2143,10 @@ const struct vm_operations_struct hugetl
 	.fault = hugetlb_vm_op_fault,
 	.open = hugetlb_vm_op_open,
 	.close = hugetlb_vm_op_close,
+#ifdef CONFIG_NUMA
+	.set_policy	= shmem_set_policy,
+	.get_policy	= shmem_get_policy,
+#endif
 };
 
 static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page,
Index: linux-2.6.36-mmotm-101103-1217/mm/mempolicy.c
===================================================================
--- linux-2.6.36-mmotm-101103-1217.orig/mm/mempolicy.c
+++ linux-2.6.36-mmotm-101103-1217/mm/mempolicy.c
@@ -581,6 +581,15 @@ check_range(struct mm_struct *mm, unsign
 	first = find_vma(mm, start);
 	if (!first)
 		return ERR_PTR(-EFAULT);
+
+	/*
+	 * need vma for hugetlb check
+	 */
+	if (is_vm_hugetlb_page(first)) {
+		if (start & ~HPAGE_MASK)
+			return ERR_PTR(-EINVAL);
+	}
+
 	prev = NULL;
 	for (vma = first; vma && vma->vm_start < end; vma = vma->vm_next) {
 		if (!(flags & MPOL_MF_DISCONTIG_OK)) {
@@ -589,8 +598,14 @@ check_range(struct mm_struct *mm, unsign
 			if (prev && prev->vm_end < vma->vm_start)
 				return ERR_PTR(-EFAULT);
 		}
-		if (!is_vm_hugetlb_page(vma) &&
-		    ((flags & MPOL_MF_STRICT) ||
+		if (is_vm_hugetlb_page(vma)) {
+			/*
+			 * round end up to hugepage alignment if
+			 * it falls in a hugetlb vma.
+			 */
+			if (end < vma->vm_end)
+				end = (end + ~HPAGE_MASK) & HPAGE_MASK;
+		} else if (((flags & MPOL_MF_STRICT) ||
 		     ((flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) &&
 				vma_migratable(vma)))) {
 			unsigned long endvma = vma->vm_end;
Index: linux-2.6.36-mmotm-101103-1217/include/linux/mm.h
===================================================================
--- linux-2.6.36-mmotm-101103-1217.orig/include/linux/mm.h
+++ linux-2.6.36-mmotm-101103-1217/include/linux/mm.h
@@ -748,6 +748,10 @@ extern void show_free_areas(void);
 int shmem_lock(struct file *file, int lock, struct user_struct *user);
 struct file *shmem_file_setup(const char *name, loff_t size, unsigned long flags);
 int shmem_zero_setup(struct vm_area_struct *);
+int shmem_set_policy(struct vm_area_struct *vma,
+	unsigned long start, unsigned long end, struct mempolicy *new);
+struct mempolicy *shmem_get_policy(struct vm_area_struct *vma,
+					unsigned long addr);
 
 #ifndef CONFIG_MMU
 extern unsigned long shmem_get_unmapped_area(struct file *file,
@@ -1251,6 +1255,8 @@ static inline pgoff_t vma_mpol_pgoff(str
 	return ((addr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
 }
 
+/* TODO:  is this OK for huge pages?  Or do I need the inverse of
+ * vma_huge_mpol_offset? */
 static inline unsigned long vma_mpol_addr(struct vm_area_struct *vma,
 						pgoff_t pgoff)
 {
Index: linux-2.6.36-mmotm-101103-1217/mm/shmem.c
===================================================================
--- linux-2.6.36-mmotm-101103-1217.orig/mm/shmem.c
+++ linux-2.6.36-mmotm-101103-1217/mm/shmem.c
@@ -1505,7 +1505,7 @@ static int shmem_fault(struct vm_area_st
 }
 
 #ifdef CONFIG_NUMA
-static int shmem_set_policy(struct vm_area_struct *vma, unsigned long start,
+int shmem_set_policy(struct vm_area_struct *vma, unsigned long start,
 			unsigned long end, struct mempolicy *new)
 {
 	struct address_space *mapping = vma->vm_file->f_mapping;
@@ -1520,7 +1520,7 @@ static int shmem_set_policy(struct vm_ar
 					(end - start) >> PAGE_SHIFT, new);
 }
 
-static struct mempolicy *shmem_get_policy(struct vm_area_struct *vma,
+struct mempolicy *shmem_get_policy(struct vm_area_struct *vma,
 					  unsigned long addr)
 {
 	struct address_space *mapping = vma->vm_file->f_mapping;
@@ -1530,6 +1530,7 @@ static struct mempolicy *shmem_get_polic
 		return NULL;	/* == default policy */
 	return mpol_shared_policy_lookup(sp, vma_mpol_pgoff(vma, addr));
 }
+#define HAVE_SHMEM_XET_POLICY
 #endif
 
 int shmem_lock(struct file *file, int lock, struct user_struct *user)
@@ -2700,6 +2701,21 @@ out:
 
 #endif /* CONFIG_SHMEM */
 
+#ifndef HAVE_SHMEM_XET_POLICY
+int shmem_set_policy(struct vm_area_struct *vma, unsigned long start,
+				   unsigned long end, struct mempolicy *new)
+{
+	return 0;
+}
+
+struct mempolicy *shmem_get_policy(struct vm_area_struct *vma,
+						 unsigned long addr)
+{
+	return NULL;
+}
+#endif
+
+
 /* common code */
 
 /**
Index: linux-2.6.36-mmotm-101103-1217/fs/hugetlbfs/inode.c
===================================================================
--- linux-2.6.36-mmotm-101103-1217.orig/fs/hugetlbfs/inode.c
+++ linux-2.6.36-mmotm-101103-1217/fs/hugetlbfs/inode.c
@@ -93,6 +93,7 @@ static int hugetlbfs_file_mmap(struct fi
 	 */
 	vma->vm_flags |= VM_HUGETLB | VM_RESERVED;
 	vma->vm_ops = &hugetlb_vm_ops;
+	mpol_set_vma_nosplit(vma);
 
 	if (vma->vm_pgoff & ~(huge_page_mask(h) >> PAGE_SHIFT))
 		return -EINVAL;
Index: linux-2.6.36-mmotm-101103-1217/Documentation/vm/numa_memory_policy.txt
===================================================================
--- linux-2.6.36-mmotm-101103-1217.orig/Documentation/vm/numa_memory_policy.txt
+++ linux-2.6.36-mmotm-101103-1217/Documentation/vm/numa_memory_policy.txt
@@ -114,13 +114,15 @@ most general to most specific:
     by any task, will obey the shared policy.
 
 	As of 2.6.28, only shared memory segments, created by shmget() or
-	mmap(MAP_ANONYMOUS|MAP_SHARED), support shared policy.  When shared
-	policy support was added to Linux, the associated data structures were
-	added to hugetlbfs shmem segments.  At the time, hugetlbfs did not
-	support allocation at fault time--a.k.a lazy allocation--so hugetlbfs
-	shmem segments were never "hooked up" to the shared policy support.
-	Although hugetlbfs segments now support lazy allocation, their support
-	for shared policy has not been completed.
+	mmap(MAP_ANONYMOUS|MAP_SHARED), support shared policy.  Prior to
+	2.6.XX, shared segments backed by huge pages did not support shared
+	policy.  In fact, different tasks could install different policies
+	for the same ranges of a shared huge page segment.  The policy of
+	any given page was determined by which task touched it first--always
+	the case for local allocation.  As of 2.6.XX, Linux supports shared
+	policies on huge page shared segments, just as for regular sized
+	pages.  To preserve existing behavior for applications that might
+	care, this new behavior must be enabled on a per-cpuset basis.
 
 	As mentioned above [re: VMA policies], allocations of page cache
 	pages for regular files mmap()ed with MAP_SHARED ignore any VMA