On Wed, Jan 8, 2025 at 11:00 AM Vlastimil Babka <vbabka@xxxxxxx> wrote:
>
> On 1/8/25 19:44, Suren Baghdasaryan wrote:
> > On Wed, Jan 8, 2025 at 10:21 AM Vlastimil Babka <vbabka@xxxxxxx> wrote:
> >>
> >> On 12/26/24 18:07, Suren Baghdasaryan wrote:
> >> > To enable SLAB_TYPESAFE_BY_RCU for vma cache we need to ensure that
> >> > object reuse before RCU grace period is over will be detected by
> >> > lock_vma_under_rcu(). Current checks are sufficient as long as vma
> >> > is detached before it is freed. Implement this guarantee by calling
> >> > vma_ensure_detached() before vma is freed and make vm_area_cachep
> >> > SLAB_TYPESAFE_BY_RCU. This will facilitate vm_area_struct reuse and
> >> > will minimize the number of call_rcu() calls.
> >> >
> >> > Signed-off-by: Suren Baghdasaryan <surenb@xxxxxxxxxx>
> >>
> >> I've noticed vm_area_dup() went back to the approach of "we memcpy
> >> everything including vma_lock and detached (now the vm_refcnt) followed by a
> >> vma_init_lock(..., true) that does refcount_set(&vma->vm_refcnt, 0);
> >> Is that now safe against a racing lock_vma_under_rcu()? I think it's not?
> >
> > I think it's safe because vma created by vm_area_dup() is not in the
> > vma tree yet, so lock_vma_under_rcu() does not see it until it's added
> > into the tree. Note also that at the time when the new vma gets added
> > into the tree, the vma has to be write-locked
> > (vma_iter_store()->vma_mark_attached()->vma_assert_write_locked()).
> > So, lock_vma_under_rcu() won't use the new vma even after it's added
> > into the tree until we unlock the vma.
>
>
> What about something like this, where vma starts out as attached as thus
> reachable:

Huh, very clever sequence.
>
> A:                      B:                      C:
> lock_vma_under_rcu()
>   vma = mas_walk()
>   vma_start_read()
>     vm_lock_seq == mm->mm_lock_seq.sequence
>
>                         vma_start_write
>                         vma detached and freed
>
>                                                 vm_area_dup()
>                                                 - vma reallocated
>                                                 - memcpy() copies non-zero refcnt from orig
>
>     __refcount_inc_not_zero_limited() succeeds
>
>                                                 vma_init_lock();
>                                                 refcount_set(&vma->vm_refcnt, 0);
>
>     - vm_lock_seq validation fails (could it even succeed?)

It can succeed if task C drops the vma write-lock before A validates
vm_lock_seq.

>     vma_refcount_put(vma);
>       __refcount_dec_and_test makes refcount -1

Yeah, I guess I will have to keep vm_refcnt at 0 across reuse, so
memcpy() in vm_area_dup() should be replaced. I'll make the changes.
Thanks for analyzing this, Vlastimil!

>
>
>
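To illustrate the idea of keeping vm_refcnt at 0 across reuse, here is a
rough userspace sketch, not the actual kernel change. Everything in it is
an assumption made for illustration: struct demo_vma, demo_dup_fieldwise()
and demo_try_get() are hypothetical names, the struct is not the real
vm_area_struct layout, and C11 atomics stand in for refcount_t. The point
is only that copying fields one by one, and never writing the refcount,
leaves no window in which an increment-if-not-zero reader can succeed on a
recycled object.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for vm_area_struct; NOT the kernel layout. */
struct demo_vma {
	unsigned long vm_start;
	unsigned long vm_end;
	unsigned long vm_flags;
	atomic_int vm_refcnt;	/* 0 == detached; readers must not take it */
};

/*
 * Duplicate orig into a recycled slot without touching the refcount.
 * Assumes the slot was freed in the detached state (refcount 0), which
 * mirrors the "detached before freed" guarantee discussed in the thread.
 */
static void demo_dup_fieldwise(struct demo_vma *new,
			       const struct demo_vma *orig)
{
	assert(atomic_load(&new->vm_refcnt) == 0);

	new->vm_start = orig->vm_start;
	new->vm_end = orig->vm_end;
	new->vm_flags = orig->vm_flags;
	/* vm_refcnt is deliberately left alone: it stays 0 throughout. */
}

/*
 * Reader side: take a reference only if the count is currently non-zero.
 * This is a simplified analogue of an inc-not-zero operation, without
 * the limit check.
 */
static bool demo_try_get(struct demo_vma *vma)
{
	int old = atomic_load(&vma->vm_refcnt);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&vma->vm_refcnt, &old, old + 1))
			return true;
	}
	return false;
}
```

With a memcpy()-based copy, the slot would briefly hold orig's non-zero
count before being reset, which is the window task A hits in the sequence
above. In this sketch the count never leaves 0 until the object is
explicitly attached, so demo_try_get() on a freshly duplicated object
fails.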