On Sun, Jul 12, 2015 at 09:20:59PM -0700, Mike Kravetz wrote:
> fallocate hole punch will want to remove a specific range of
> pages.  When pages are removed, their associated entries in
> the region/reserve map will also be removed.  This will break
> an assumption in the region_chg/region_add calling sequence.
> If a new region descriptor must be allocated, it is done as
> part of the region_chg processing.  In this way, region_add
> can not fail because it does not need to attempt an allocation.
> 
> To prepare for fallocate hole punch, create a "cache" of
> descriptors that can be used by region_add if necessary.
> region_chg will ensure there are sufficient entries in the
> cache.  It will be necessary to track the number of in progress
> add operations to know a sufficient number of descriptors
> reside in the cache.  A new routine region_abort is added to
> adjust this in progress count when add operations are aborted.
> vma_abort_reservation is also added for callers creating
> reservations with vma_needs_reservation/vma_commit_reservation.
> 
> Signed-off-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
> ---
>  include/linux/hugetlb.h |   3 +
>  mm/hugetlb.c            | 169 ++++++++++++++++++++++++++++++++++++++++++------
>  2 files changed, 153 insertions(+), 19 deletions(-)
> 
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index d891f94..667cf44 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -35,6 +35,9 @@ struct resv_map {
>  	struct kref refs;
>  	spinlock_t lock;
>  	struct list_head regions;
> +	long adds_in_progress;
> +	struct list_head rgn_cache;
> +	long rgn_cache_count;
>  };
>  extern struct resv_map *resv_map_alloc(void);
>  void resv_map_release(struct kref *ref);
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index a8c3087..241d16d 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -240,11 +240,14 @@ struct file_region {
>  
>  /*
>   * Add the huge page range represented by [f, t) to the reserve
> - * map.  Existing regions will be expanded to accommodate the
> - * specified range.  We know only existing regions need to be
> - * expanded, because region_add is only called after region_chg
> - * with the same range.  If a new file_region structure must
> - * be allocated, it is done in region_chg.
> + * map.  In the normal case, existing regions will be expanded
> + * to accommodate the specified range.  Sufficient regions should
> + * exist for expansion due to the previous call to region_chg
> + * with the same range.  However, it is possible that region_del
> + * could have been called after region_chg and modified the map
> + * in such a way that no region exists to be expanded.  In this
> + * case, pull a region descriptor from the cache associated with
> + * the map and use that for the new range.
>   *
>   * Return the number of new huge pages added to the map.  This
>   * number is greater than or equal to zero.
> @@ -261,6 +264,27 @@ static long region_add(struct resv_map *resv, long f, long t)
>  		if (f <= rg->to)
>  			break;
>  
> +	if (&rg->link == head || t < rg->from) {
> +		/*
> +		 * No region exists which can be expanded to include the
> +		 * specified range.  Pull a region descriptor from the
> +		 * cache, and use it for this range.
> +		 */

This comment describes the if-block, not the VM_BUG_ON below it, so it
would be better placed above the if line.

> +		VM_BUG_ON(!resv->rgn_cache_count);

resv->rgn_cache_count <= 0 might be safer.
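
With both of those changes applied, the start of this block might look
something like the following.  This is only a sketch, not your exact
code: field names are taken from the hunks above, "nrg" is just a local
name, and the accounting/unlock at the end of region_add() is elided.

	/*
	 * No region exists which can be expanded to include the
	 * specified range.  Pull a region descriptor from the
	 * cache, and use it for this range.
	 */
	if (&rg->link == head || t < rg->from) {
		struct file_region *nrg;

		VM_BUG_ON(resv->rgn_cache_count <= 0);

		/* Consume one of the descriptors pre-allocated by region_chg. */
		resv->rgn_cache_count--;
		nrg = list_first_entry(&resv->rgn_cache, struct file_region,
					link);
		list_del(&nrg->link);

		/* Record the new range and keep the region list sorted. */
		nrg->from = f;
		nrg->to = t;
		list_add(&nrg->link, rg->link.prev);

		/* (t - f) pages were added; unlock and return elided here. */
	}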
...

> @@ -3236,11 +3360,14 @@ retry:
>  	 * any allocations necessary to record that reservation occur outside
>  	 * the spinlock.
>  	 */
> -	if ((flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED))
> +	if ((flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED)) {
>  		if (vma_needs_reservation(h, vma, address) < 0) {
>  			ret = VM_FAULT_OOM;
>  			goto backout_unlocked;
>  		}
> +		/* Just decrements count, does not deallocate */
> +		vma_abort_reservation(h, vma, address);
> +	}

This is not an "abort reservation" operation, but you use the "abort
reservation" routine, which might be confusing and make future
maintenance harder.  I think this should be done in a simplified
variant of vma_commit_reservation() (maybe just an alias of your
vma_abort_reservation()) or a fast path in vma_commit_reservation().

Thanks,
Naoya Horiguchi

> 
>  	ptl = huge_pte_lockptr(h, mm, ptep);
>  	spin_lock(ptl);
> @@ -3387,6 +3514,8 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>  		ret = VM_FAULT_OOM;
>  		goto out_mutex;
>  	}
> +	/* Just decrements count, does not deallocate */
> +	vma_abort_reservation(h, vma, address);
>  
>  	if (!(vma->vm_flags & VM_MAYSHARE))
>  		pagecache_page = hugetlbfs_pagecache_page(h,
> @@ -3726,6 +3855,8 @@ int hugetlb_reserve_pages(struct inode *inode,
>  	}
>  	return 0;
>  out_err:
> +	if (!vma || vma->vm_flags & VM_MAYSHARE)
> +		region_abort(resv_map, from, to);
>  	if (vma && is_vma_resv_set(vma, HPAGE_RESV_OWNER))
>  		kref_put(&resv_map->refs, resv_map_release);
>  	return ret;
> -- 
> 2.1.0
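
P.S. To make the naming suggestion above a bit more concrete: even a
thin wrapper would keep the fault paths readable.  The following is
purely illustrative and the name vma_end_reservation is hypothetical;
it assumes vma_abort_reservation() keeps the argument list used at the
call sites above.

static void vma_end_reservation(struct hstate *h,
				struct vm_area_struct *vma,
				unsigned long addr)
{
	/*
	 * Same bookkeeping as the abort path: only the in-progress
	 * count in the resv_map is dropped, nothing is deallocated.
	 */
	vma_abort_reservation(h, vma, addr);
}

The call sites in hugetlb_no_page()/hugetlb_fault() would then read as
"the reservation path is finished" rather than "the reservation was
aborted".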