On Fri, Jan 10, 2025 at 8:07 AM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote: > > On Fri, Jan 10, 2025 at 7:31 AM Vlastimil Babka <vbabka@xxxxxxx> wrote: > > > > On 1/9/25 3:30 AM, Suren Baghdasaryan wrote: > > > To enable SLAB_TYPESAFE_BY_RCU for vma cache we need to ensure that > > > object reuse before RCU grace period is over will be detected by > > > lock_vma_under_rcu(). > > > Current checks are sufficient as long as vma is detached before it is > > > freed. The only place this is not currently happening is in exit_mmap(). > > > Add the missing vma_mark_detached() in exit_mmap(). > > > Another issue which might trick lock_vma_under_rcu() during vma reuse > > > is vm_area_dup(), which copies the entire content of the vma into a new > > > one, overriding new vma's vm_refcnt and temporarily making it appear as > > > attached. This might trick a racing lock_vma_under_rcu() to operate on > > > a reused vma if it found the vma before it got reused. To prevent this > > > situation, we should ensure that vm_refcnt stays at detached state (0) > > > when it is copied and advances to attached state only after it is added > > > into the vma tree. Introduce vma_copy() which preserves new vma's > > > vm_refcnt and use it in vm_area_dup(). Since all vmas are in detached > > > state with no current readers when they are freed, lock_vma_under_rcu() > > > will not be able to take vm_refcnt after vma got detached even if vma > > > is reused. > > > Finally, make vm_area_cachep SLAB_TYPESAFE_BY_RCU. This will facilitate > > > vm_area_struct reuse and will minimize the number of call_rcu() calls. > > > > > > Signed-off-by: Suren Baghdasaryan <surenb@xxxxxxxxxx> > > > > Reviewed-by: Vlastimil Babka <vbabka@xxxxxxx> > > > > You could also drop the reset_refcnt parameter of vma_lock_init() now, > > as the usage in vm_area_dup() should now be just setting 0 over 0. Maybe > > a VM_WARN_ON if it's not 0 already? > > Yeah, that's a good idea. Will do. Ugh, once I made this change, the newly added VM_WARN_ON() immediately triggered because vm_area_dup() does not memset(0) the entire vma and kmem_cache_alloc(vm_area_cachep) does not always return a reused vma. I could add a vm_area_cachep constructor to always initialize vm_refcnt to 0 but that would lead to more changes. I think I'll keep reset_refcnt for now and will add vm_area_cachep constructor as a follow-up optimization after this patchset is merged. > > > And a comment in vm_area_struct definition to consider vma_copy() when > > adding any new field? > > Sure, will add. > > > > > > + /* > > > + * src->shared.rb may be modified concurrently, but the clone > > > + * will be reinitialized. > > > + */ > > > + data_race(memcpy(&dest->shared, &src->shared, sizeof(dest->shared))); > > > > The comment makes it sound as if we didn't need to do it at all? But I > > didn't verify. If we do need it in some cases (i.e. the just allocated > > vma might have garbage from previous lifetime, but src is well defined > > and it's a case where it's not reinitialized afterwards) maybe the > > comment should say? Or if it's either reinitialized later or zeroes at > > src, we could memset() the zeroes instead of memcpying them, etc. > > I see vm_area_dup() being used in dup_mmap() and I think this comment > is about this usage in case the src vma changes from under us. > However, vm_area_dup() is also used when we simply duplicate an > existing vma while holding an mmap_write_lock, like in __split_vma(). > In these cases there is no possibility of a race and copied value > should hold. Maybe I should amend this comment like this: > > /* > * src->shared.rb may be modified concurrently when called from dup_mmap(), > * but the clone will reinitialize it. > */ > > WDYT? > > > > >