Re: [PATCH 5/5] hugetlbfs: Limit wait time when trying to share huge PMD

On Wed, 11 Sep 2019, Matthew Wilcox wrote:

> On Wed, Sep 11, 2019 at 08:26:52PM -0700, Mike Kravetz wrote:
> > All this got me wondering if we really need to take i_mmap_rwsem in write
> > mode here.  We are not changing the tree, only traversing it looking for
> > a suitable vma.
> >
> > Unless I am missing something, the hugetlb code only ever takes the semaphore
> > in write mode; never read.  Could this have been the result of changing the
> > tree semaphore to read/write?  Instead of analyzing all the code, the easiest
> > and safest thing would have been to take all accesses in write mode.
>
> I was wondering the same thing.  It was changed here:
>
> commit 83cde9e8ba95d180eaefefe834958fbf7008cf39
> Author: Davidlohr Bueso <dave@xxxxxxxxxxxx>
> Date:   Fri Dec 12 16:54:21 2014 -0800
>
>    mm: use new helper functions around the i_mmap_mutex
>
>    Convert all open coded mutex_lock/unlock calls to the
>    i_mmap_[lock/unlock]_write() helpers.
>
> and a subsequent patch said:
>
>    This conversion is straightforward.  For now, all users take the write
>    lock.
>
> There were subsequent patches which changed a few places:
> c8475d144abb1e62958cc5ec281d2a9e161c1946
> 1acf2e040721564d579297646862b8ea3dd4511b
> d28eb9c861f41aa2af4cfcc5eeeddff42b13d31e
> 874bfcaf79e39135cd31e1cfc9265cf5222d1ec3
> 3dec0ba0be6a532cac949e02b853021bf6d57dad
>
> but I don't know why this one wasn't changed.

I cannot recall why huge_pmd_share() was not changed along with the other
callers that don't modify the interval tree. Looking at the function, I
agree that the lock could be taken in shared (read) mode there. In fact,
this lock is much less involved than its anon_vma counterpart, last I
checked (perhaps with the exception of take_rmap_locks()).


> (I was also wondering about caching a potentially sharable page table
> in the address_space to avoid having to walk the VMA tree at all if that
> one happened to be sharable).

I also think that the right solution is within the mm instead of adding
a new api to rwsem and the extra complexity/overhead to osq _just_ for this
case. We've managed to not need timeout extensions in our locking primitives
thus far, which is a good thing imo.

Thanks,
Davidlohr