On Thu, Jul 18, 2013 at 05:42:35PM +0900, Joonsoo Kim wrote:
> On Wed, Jul 17, 2013 at 12:50:25PM -0700, Davidlohr Bueso wrote:
> > From: David Gibson <david@xxxxxxxxxxxxxxxxxxxxx>
> >
> > At present, the page fault path for hugepages is serialized by a
> > single mutex. This is used to avoid spurious out-of-memory conditions
> > when the hugepage pool is fully utilized (two processes or threads can
> > race to instantiate the same mapping with the last hugepage from the
> > pool, the race loser returning VM_FAULT_OOM). This problem is
> > specific to hugepages, because it is normal to want to use every
> > single hugepage in the system - with normal pages we simply assume
> > there will always be a few spare pages which can be used temporarily
> > until the race is resolved.
> >
> > Unfortunately this serialization also means that clearing of hugepages
> > cannot be parallelized across multiple CPUs, which can lead to very
> > long process startup times when using large numbers of hugepages.
> >
> > This patch improves the situation by replacing the single mutex with a
> > table of mutexes, selected based on a hash, which allows us to know
> > which page in the file we're instantiating. For shared mappings, the
> > hash key is selected based on the address space and file offset being faulted.
> > Similarly, for private mappings, the mm and virtual address are used.
> >
>
> Hello.
>
> With this table mutex, we cannot protect region tracking structure.
> See below comment.
>
> /*
>  * Region tracking -- allows tracking of reservations and instantiated pages
>  *                    across the pages in a mapping.
>  *
>  * The region data structures are protected by a combination of the mmap_sem
>  * and the hugetlb_instantion_mutex.  To access or modify a region the caller
>  * must either hold the mmap_sem for write, or the mmap_sem for read and
>  * the hugetlb_instantiation mutex:
>  *
>  *      down_write(&mm->mmap_sem);
>  * or
>  *      down_read(&mm->mmap_sem);
>  *      mutex_lock(&hugetlb_instantiation_mutex);
>  */

Ugh.  Who the hell added that.

I guess you'll need to split off another mutex for that purpose;
afaict there should be no interaction with the actual, intended
purpose of the instantiation mutex.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
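The arrangement discussed in this thread can be sketched in userspace C with pthreads. It combines the hashed table of instantiation mutexes described in the quoted patch with a separate lock for the region-tracking data, as suggested in the reply. This is a minimal illustration under stated assumptions and is not the kernel code. The table size, the multiplicative hash mix (the patch's actual hash function is not shown here), and every name below are hypothetical: fault_mutex_table, region_mutex, hypothetical_fault, and run_demo. A plain counter stands in for the region data structures.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define NUM_FAULT_MUTEXES 64 /* hypothetical size; must be a power of two */
#define MAX_THREADS 16

/* Hypothetical table of instantiation mutexes, selected by hashing the fault key. */
static pthread_mutex_t fault_mutex_table[NUM_FAULT_MUTEXES];
static pthread_once_t table_once = PTHREAD_ONCE_INIT;

/* Hypothetical separate lock for the region-tracking data. */
static pthread_mutex_t region_mutex = PTHREAD_MUTEX_INITIALIZER;
static long instantiated_pages; /* stand-in for region data; protected by region_mutex */

static void fault_mutex_table_init(void)
{
    for (int i = 0; i < NUM_FAULT_MUTEXES; i++)
        pthread_mutex_init(&fault_mutex_table[i], NULL);
}

/* Mix (mapping, index) into a slot. For a shared mapping, think (address space,
 * file offset); for a private mapping, (mm, virtual address). */
static unsigned fault_mutex_hash(uintptr_t mapping, uint64_t index)
{
    uint64_t key = (uint64_t)mapping ^ (index * 0x9E3779B97F4A7C15ULL);
    key ^= key >> 32;
    return (unsigned)(key & (NUM_FAULT_MUTEXES - 1));
}

/* Hypothetical fault handler: only faults that hash to the same slot serialize. */
static void hypothetical_fault(uintptr_t mapping, uint64_t index)
{
    unsigned slot = fault_mutex_hash(mapping, index);

    pthread_mutex_lock(&fault_mutex_table[slot]);
    /* ...allocate and clear the huge page here, in parallel with other slots... */

    /* Region updates are serialized by their own lock, independent of the slot.
     * Lock order is always slot mutex, then region_mutex, so no deadlock. */
    pthread_mutex_lock(&region_mutex);
    instantiated_pages++;
    pthread_mutex_unlock(&region_mutex);

    pthread_mutex_unlock(&fault_mutex_table[slot]);
}

struct fault_arg {
    uintptr_t mapping;
    uint64_t first;
    uint64_t count;
};

static void *worker(void *p)
{
    const struct fault_arg *a = p;
    for (uint64_t i = 0; i < a->count; i++)
        hypothetical_fault(a->mapping, a->first + i);
    return NULL;
}

/* Run nthreads workers, each faulting per_thread distinct pages of one mapping.
 * Returns the number of pages recorded in the (stand-in) region data. */
static long run_demo(int nthreads, uint64_t per_thread)
{
    pthread_t threads[MAX_THREADS];
    struct fault_arg args[MAX_THREADS];

    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    pthread_once(&table_once, fault_mutex_table_init);
    instantiated_pages = 0;

    for (int t = 0; t < nthreads; t++) {
        args[t].mapping = 0x1000;
        args[t].first = (uint64_t)t * per_thread;
        args[t].count = per_thread;
        int rc = pthread_create(&threads[t], NULL, worker, &args[t]);
        assert(rc == 0);
        (void)rc;
    }
    for (int t = 0; t < nthreads; t++)
        pthread_join(threads[t], NULL);
    return instantiated_pages;
}
```

Calling run_demo(4, 1000) should record 4000 pages. Faults on different pages usually land in different slots and can clear their pages in parallel. Every update to the shared region data still goes through region_mutex, so no update is lost regardless of which slot a fault hashed to.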