On Mon, Feb 08, 2021 at 01:26:43PM +0000, Matthew Wilcox wrote:

> TLDR: I think we're going to need to call synchronize_rcu() in munmap(),
> and I know this can be quite expensive.  Is this a reasonable price to
> pay for avoiding taking the mmap_sem for page faults?

In my experience we have been ripping out synchronize_rcu() in places
that intersect with userspace: it can be very slow, and if something
does more than one iteration things get bad.

Based on that, munmap() seems like a poor place to put
synchronize_rcu().

> Next problem: /proc/$pid/smaps calls walk_page_vma() which starts out by
> saying:
>         mmap_assert_locked(walk.mm);
> which made me realise that smaps is also going to walk the page
> tables.

Isn't the presence/absence of the page table levels themselves managed
under the page table locks? I thought that unused levels were wiped in
some lazy fashion after the TLB flush, based on being empty?

There is a state graph of allowed page entry transitions under the
read side of the mmap_sem, and allocated -> freed table is not allowed
(freed -> allocated is OK though, IIRC).

I think this has nothing to do with the mmap_sem acting as a VMA lock;
it feels like some extra behavior.

Jason