[PATCH v2 00/20] mm, hugetlb: remove a hugetlb_instantiation_mutex

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Without a hugetlb_instantiation_mutex, if parallel fault occur, we can
fail to allocate a hugepage, because many threads dequeue a hugepage
to handle a fault of same address. This makes reserved pool shortage
just for a little while and this cause faulting thread to get a SIGBUS
signal, although there are enough hugepages.

To solve this problem, we already have a nice solution, that is,
a hugetlb_instantiation_mutex. This blocks other threads to dive into
a fault handler. This solve the problem clearly, but it introduce
performance degradation, because it serialize all fault handling.
    
Now, I try to remove a hugetlb_instantiation_mutex to get rid of
performance problem reported by Davidlohr Bueso [1].

This patchset consist of 4 parts roughly.

Part 1. (1-6) Random fix and clean-up. Enhancing error handling.
	
	These can be merged into mainline separately.

Part 2. (7-9) Protect region tracking via it's own spinlock, instead of
	the hugetlb_instantiation_mutex.
	
	Breaking dependency on the hugetlb_instantiation_mutex for
	tracking a region is also needed by other approaches like as
	'table mutexes', so these can be merged into mainline separately.

Part 3. (10-13) Clean-up.
	
	IMO, these make code really simple, so these are worth to go into
	mainline separately, regardless success of my approach.

Part 4. (14-20) Remove a hugetlb_instantiation_mutex.
	
	Almost patches are just for clean-up to error handling path.
	In patch 19, retry approach is implemented that if faulted thread
	failed to allocate a hugepage, it continue to run a fault handler
	until there is no concurrent thread having a hugepage. This causes
	threads who want to get a last hugepage to be serialized, so
	threads don't get a SIGBUS if enough hugepage exist.
	In patch 20, remove a hugetlb_instantiation_mutex.

These patches are based on my previous patchset [2] which is now on mmotm.
In my compile testing, [2] and this patchset can be applied to
v3.11-rc4 cleanly, but, I do running test of this patchset on top of v3.10 :)

With applying these, I passed a libhugetlbfs test suite clearly which
have allocation-instantiation race test cases.

If there is a something I should consider, please let me know!
Thanks.

* Changes in v2
- Re-order patches to clear it's relationship
- sleepable object allocation(kmalloc) without holding a spinlock
	(Pointed by Hillf)
- Remove vma_has_reserves, instead of vma_needs_reservation.
	(Suggest by Aneesh and Naoya)
- Change a way of returning a hugepage back to reserved pool
	(Suggedt by Naoya)


[1] http://lwn.net/Articles/558863/ 
	"[PATCH] mm/hugetlb: per-vma instantiation mutexes"
[2] https://lkml.org/lkml/2013/7/22/96
	"[PATCH v2 00/10] mm, hugetlb: clean-up and possible bug fix"

Joonsoo Kim (20):
  mm, hugetlb: protect reserved pages when soft offlining a hugepage
  mm, hugetlb: change variable name reservations to resv
  mm, hugetlb: fix subpool accounting handling
  mm, hugetlb: remove useless check about mapping type
  mm, hugetlb: grab a page_table_lock after page_cache_release
  mm, hugetlb: return a reserved page to a reserved pool if failed
  mm, hugetlb: unify region structure handling
  mm, hugetlb: region manipulation functions take resv_map rather
    list_head
  mm, hugetlb: protect region tracking via newly introduced resv_map
    lock
  mm, hugetlb: remove resv_map_put()
  mm, hugetlb: make vma_resv_map() works for all mapping type
  mm, hugetlb: remove vma_has_reserves()
  mm, hugetlb: mm, hugetlb: unify chg and avoid_reserve to use_reserve
  mm, hugetlb: call vma_needs_reservation before entering
    alloc_huge_page()
  mm, hugetlb: remove a check for return value of alloc_huge_page()
  mm, hugetlb: move down outside_reserve check
  mm, hugetlb: move up anon_vma_prepare()
  mm, hugetlb: clean-up error handling in hugetlb_cow()
  mm, hugetlb: retry if failed to allocate and there is concurrent user
  mm, hugetlb: remove a hugetlb_instantiation_mutex

 fs/hugetlbfs/inode.c    |   16 +-
 include/linux/hugetlb.h |   11 ++
 mm/hugetlb.c            |  419 +++++++++++++++++++++++++----------------------
 3 files changed, 250 insertions(+), 196 deletions(-)

-- 
1.7.9.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]