On Mon, Dec 16, 2024 at 11:24:16AM -0800, Suren Baghdasaryan wrote:
> vma_start_read() can temporarily raise vm_refcnt of a write-locked and
> detached vma:
>
> // vm_refcnt==1 (attached)
> vma_start_write()
>     vma->vm_lock_seq = mm->mm_lock_seq
>
>                     vma_start_read()
>                         vm_refcnt++; // vm_refcnt==2
>
> vma_mark_detached()
>     vm_refcnt--; // vm_refcnt==1
>
> // vma is detached but vm_refcnt!=0 temporarily
>
>                         if (vma->vm_lock_seq == mm->mm_lock_seq)
>                             vma_refcount_put()
>                                 vm_refcnt--; // vm_refcnt==0
>
> This is currently not a problem when freeing the vma because RCU grace
> period should pass before kmem_cache_free(vma) gets called and by that
> time vma_start_read() should be done and vm_refcnt is 0. However once
> we introduce possibility of vma reuse before RCU grace period is over,
> this will become a problem (reused vma might be in non-detached state).
> Introduce vma_ensure_detached() for the writer to wait for readers until
> they exit vma_start_read().

So aside from the lockdep problem (which I think is fixable), the normal
way to fix the above is to make dec_and_test() do the kmem_cache_free().
Then the last user does the free and everything just works.
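
To make the "last user does the free" idea concrete, here is a minimal
userspace sketch using C11 atomics. It is not kernel code and not the
patch under discussion: struct obj, obj_tryget(), obj_put(), nr_freed
and run_scenario() are hypothetical names invented for illustration,
and free() stands in for kmem_cache_free(). The scenario replays the
interleaving from the quoted changelog in a single thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a vma; only the refcount matters here. */
struct obj {
	atomic_int refcnt;
};

/* Counts frees so the scenario can be checked (single-threaded demo). */
static int nr_freed;

static struct obj *obj_alloc(void)
{
	struct obj *o = malloc(sizeof(*o));

	if (o)
		atomic_init(&o->refcnt, 1);	/* 1 == "attached" */
	return o;
}

/* Reader side: take a reference unless the count already hit zero. */
static bool obj_tryget(struct obj *o)
{
	int old = atomic_load(&o->refcnt);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&o->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; whoever drops the last one frees the object. */
static bool obj_put(struct obj *o)
{
	int old = atomic_fetch_sub(&o->refcnt, 1);

	assert(old > 0);	/* catch refcount underflow */
	if (old == 1) {
		free(o);
		nr_freed++;
		return true;
	}
	return false;
}

/* Returns the number of frees, or -1 if a step went wrong. */
static int run_scenario(void)
{
	struct obj *o = obj_alloc();
	bool got, freed_by_writer, freed_by_reader;

	if (!o)
		return -1;
	nr_freed = 0;
	got = obj_tryget(o);		/* reader: 1 -> 2 */
	freed_by_writer = obj_put(o);	/* writer detaches: 2 -> 1 */
	freed_by_reader = obj_put(o);	/* reader backs out: 1 -> 0, frees */

	if (!got || freed_by_writer || !freed_by_reader)
		return -1;
	return nr_freed;
}
```

In this sketch the writer's decrement leaves the count at 1, so the
free is done by the reader's final put and the writer does not have to
wait for the reader. In the kernel the same shape would presumably be
built on refcount_dec_and_test().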