Re: [RFC 4/4] Change mmap_sem to range lock

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 21/04/2017 01:36, Andi Kleen wrote:
> Laurent Dufour <ldufour@xxxxxxxxxxxxxxxxxx> writes:
> 
>> [resent this patch which seems to have not reached the mailing lists]
>>
>> Change the mmap_sem to a range lock to allow finer grain locking on
>> the memory layout of a task.
>>
>> This patch rename mmap_sem into mmap_rw_tree to avoid confusion and
>> replace any locking (read or write) by complete range locking.  So
>> there is no functional change except in the way the underlying locking
>> is achieved.
>>
>> Currently, this patch only supports x86 and PowerPc architectures,
>> furthermore it should break the build of any others.
> 
> Thanks for working on this.
> 
> However as commented before I think the first step to make progress here
> is a description of everything mmap_sem protects.

Hi Andy,

I looked for the write mmap_sem locking in x86 and ppc64 architectures,
here is what I found:

mmap_sem protects
 vdso mapping
 VMA layout changes
 VMA cache
 Page protection/layout
 Changes to mmu notifier chain
 mmap_sem is used to serialize khugepaged's access
 mmap_sem is used to serialize ksm's access
 protection keys (pkey_alloc()...)

Calls to
 get_unmap_area()
 do_mmap()
 do_mmap_pgoff()
 do_munmap()
 get_user_pages()
 put_page()
 set_page_dirty_lock()
 find_vma()
 find_vma_intersection()
 alloc_empty_pages()
 insert_vm_struct()
 get_mm_rss()
 uprobe_consumer->filter() (currently only uprobe_perf_filter())
 _install_special_mapping()
 pmdp_collapse_flush()
 do_swap_page()
 do_brk()
 __split_vma()
 mremap_to()
 vma_to_resize()
 vma_adjust()

MM fields
   pinned_vm
   stack_vm
   total_vm
   locked_vm
   start_stack
   start_code
   end_code
   start_data
   start_brk
   bd_addr
   mm_users
   core_state
   context.vdso_*
   def_flags
   mmu_notifier_mm

VMA fields
    vm_private_data
    vm_flags
    vm_page_prot
    vm_file
    vm_pgoff
    vm_policy


Userfaultfd has not been looked in details yet.
dup_mmap() locks the oldmm in write mode when copying it, is it necessary ?

> Surely the init full case could be done shorter with some wrapper
> that combines the init_full and lock operation?

Yes that doable, I wrote this like that, because the range should be
initialized based on the on going operation, so having an explicit init
operation is making this more explicit.

> Then it would be likely a simple search'n'replace to move the
> whole tree in one atomic step to the new wrappers.
> Initially they could be just defined to use rwsems too to
> not change anything at all.
> 
> It would be a good idea to merge such a patch as quickly
> as possible beause it will be a nightmare to maintain
> longer term.
> 
> Then you could add a config to use a range lock through
> the wrappers.

I agree, I should try a way to make that patch activated through a
CONFIG_value, but there is a the additional range value that make it
more complex to achieve. I'll try to figure out a way to do that.

> Then after that you could add real ranges step by step,
> after doing the proper analysis.

That's the biggest part of the job.
I'm also wondering if a dedicated lock/sem should be introduced to
protect the VMA cache and the VMA list, since the range itself will not
protect against change while walking the VMA list.

Please advise.

Cheers,
Laurent.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>



[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]
  Powered by Linux