Re: Splitting the mmap_sem

On Thu, Dec 12, 2019 at 07:40:02AM -0800, Matthew Wilcox wrote:
> > > We currently only have one ->map_pages() callback, and it's
> > > filemap_map_pages().  It only needs to sleep in one place -- to allocate
> > > a PTE table.  I think that can be allocated ahead of time if needed.
> > 
> > No, filemap_map_pages() doesn't sleep. It cannot. Whole body of the
> > function is under rcu_read_lock(). It uses pre-allocated page table.
> > See do_fault_around().
> 
> Oh, thank you!  That makes the ->map_pages() optimisation already workable
> with no changes.

I've been thinking about this some more, and we have a bit of a tough time
allocating page tables while holding the RCU read lock.  The p??_alloc()
functions take no GFP flags, so we can't specify GFP_NOWAIT.

Option 1: Add 'prealloc_pmd' and 'prealloc_pud' to the vm_fault (to go
with prealloc_pte).  Allocate them before taking the RCU lock to walk
the VMA tree.  This will be a bit of reordering as we currently take
the mmap_sem, walk the VMA tree, then walk the page tables once we know
we have a good VMA.  I don't see a problem with doing that, but others
may differ.
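
A rough userspace model of the ordering in Option 1, not kernel code.
Only prealloc_pte, prealloc_pmd and prealloc_pud come from the text
above; struct fault_ctx, struct toy_mm, the in_read_section flag and all
function names are made up for illustration, and calloc() stands in for
an allocator that may sleep.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define TABLE_SIZE 4096

/* Hypothetical, cut-down stand-in for struct vm_fault. */
struct fault_ctx {
	void *prealloc_pud;	/* proposed in Option 1 */
	void *prealloc_pmd;	/* proposed in Option 1 */
	void *prealloc_pte;	/* already exists in struct vm_fault */
};

/* Toy three-level page table; NULL means the level is not populated. */
struct toy_mm {
	void *pud, *pmd, *pte;
};

/* Models "we are inside rcu_read_lock()". */
static bool in_read_section;

/* Stand-in for a sleeping allocation; must never run in the read section. */
static void *table_alloc(void)
{
	assert(!in_read_section);
	return calloc(1, TABLE_SIZE);
}

/* Step 1: allocate all three levels up front, while sleeping is allowed. */
static bool fault_prealloc(struct fault_ctx *f)
{
	f->prealloc_pud = table_alloc();
	f->prealloc_pmd = table_alloc();
	f->prealloc_pte = table_alloc();
	return f->prealloc_pud && f->prealloc_pmd && f->prealloc_pte;
}

/* Move a preallocated table into an empty slot; never allocates. */
static void install(void **slot, void **prealloc)
{
	if (!*slot) {
		*slot = *prealloc;
		*prealloc = NULL;
	}
}

/* Step 2: walk the tables in the read section, using only preallocations. */
static void fault_walk(struct toy_mm *mm, struct fault_ctx *f)
{
	in_read_section = true;
	install(&mm->pud, &f->prealloc_pud);
	install(&mm->pmd, &f->prealloc_pmd);
	install(&mm->pte, &f->prealloc_pte);
	in_read_section = false;
}

/* Step 3: free whichever preallocated tables turned out not to be needed. */
static void fault_release(struct fault_ctx *f)
{
	free(f->prealloc_pud);
	free(f->prealloc_pmd);
	free(f->prealloc_pte);
	f->prealloc_pud = f->prealloc_pmd = f->prealloc_pte = NULL;
}
```

The cost this makes visible is that every fault pays for up to three
allocations before we know whether any level is actually missing; in the
common case all of them are handed straight back in step 3.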

Option 2: Add a memalloc_nowait_save/restore API to go along
with nofs and noio.  That way, we can take the RCU read lock, call
memalloc_nowait_save(), and walk the VMA tree and the page tables in
the current order.  Page table allocations are more likely to fail this
way, so we'll have to accept that risk and retry with the reference count
held on the VMA if we need to sleep to allocate memory.
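
To make the shape of Option 2 concrete, here is a userspace sketch.  The
save/restore pair copies the pattern of the existing memalloc_nofs_save()
and memalloc_nofs_restore(); PF_MEMALLOC_NOWAIT is a hypothetical flag
that does not exist today, task_flags stands in for current->flags, and
pt_alloc(), fault_fast(), fault_slow() and fast_path_empty are invented
names that only model "allocation would need to sleep".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical flag, modelled on PF_MEMALLOC_NOFS / PF_MEMALLOC_NOIO. */
#define PF_MEMALLOC_NOWAIT 0x1u

/* Stand-in for current->flags; one copy per thread. */
static _Thread_local unsigned int task_flags;

/* Set to true to simulate memory pressure: allocating would have to sleep. */
static bool fast_path_empty;

/* Enter a nowait scope; returns the previous state so scopes can nest. */
static unsigned int memalloc_nowait_save(void)
{
	unsigned int old = task_flags & PF_MEMALLOC_NOWAIT;

	task_flags |= PF_MEMALLOC_NOWAIT;
	return old;
}

/* Leave a nowait scope; 'old' must be the value returned by the save. */
static void memalloc_nowait_restore(unsigned int old)
{
	assert((old & ~PF_MEMALLOC_NOWAIT) == 0);
	task_flags = (task_flags & ~PF_MEMALLOC_NOWAIT) | old;
}

/* Toy page table allocator: fails rather than "sleeping" in a nowait scope. */
static void *pt_alloc(void)
{
	if (fast_path_empty && (task_flags & PF_MEMALLOC_NOWAIT))
		return NULL;
	return calloc(1, 4096);
}

enum fault_result { FAULT_DONE, FAULT_RETRY };

/* Fast path: the save/restore pair brackets what would be the RCU section. */
static enum fault_result fault_fast(void **table)
{
	unsigned int old = memalloc_nowait_save();

	*table = pt_alloc();
	memalloc_nowait_restore(old);
	return *table ? FAULT_DONE : FAULT_RETRY;
}

/* Slow path: models the retry with a VMA reference held; sleeping is fine. */
static enum fault_result fault_slow(void **table)
{
	*table = pt_alloc();
	return *table ? FAULT_DONE : FAULT_RETRY;
}
```

A caller would try fault_fast() first and fall back to fault_slow() only
on FAULT_RETRY, which is the retry described above.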

Option 3: Variant of 2 where we add GFP flags to the p??_alloc()
functions.

Option 4: Variant of 2 where we make taking the RCU read lock magically
set the nowait bit, or we have the page allocator check the RCU preempt
depth.  I don't much like this one, especially since the preempt depth
is not knowable in most kernel configurations.

Other thoughts on this?



