On Mon, Dec 16, 2024 at 1:15 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote: > > On Mon, Dec 16, 2024 at 11:24:13AM -0800, Suren Baghdasaryan wrote: > > FWIW, I find the whole VMA_STATE_{A,DE}TATCHED thing awkward. And I'm bad with naming things, so any better suggestions are welcome. Are you suggesting to drop VMA_STATE_{A,DE}TATCHED nomenclature and use 0/1 values directly? > perhaps s/VMA_STATE_LOCKED/VMA_LOCK_OFFSET/ ? Sounds good. I'll change it to VMA_LOCK_OFFSET. > > Also, perhaps: > > #define VMA_REF_LIMIT (VMA_LOCK_OFFSET - 2) Ack. > > > @@ -699,10 +700,27 @@ static inline void vma_numab_state_free(struct vm_area_struct *vma) {} > > #ifdef CONFIG_PER_VMA_LOCK > > static inline void vma_lock_init(struct vm_area_struct *vma) > > { > > - init_rwsem(&vma->vm_lock.lock); > > +#ifdef CONFIG_DEBUG_LOCK_ALLOC > > + static struct lock_class_key lockdep_key; > > + > > + lockdep_init_map(&vma->vmlock_dep_map, "vm_lock", &lockdep_key, 0); > > +#endif > > + refcount_set(&vma->vm_refcnt, VMA_STATE_DETACHED); > > vma->vm_lock_seq = UINT_MAX; > > Depending on how you do the actual allocation (GFP_ZERO) you might want > to avoid that vm_refcount store entirely. I think we could initialize it to 0 in the slab cache constructor and when vma is freed we already ensure it's 0. So, even when reused it will be in the correct 0 state. > > Perhaps instead write: VM_WARN_ON(refcount_read(&vma->vm_refcnt)); Yes, with the above approach that should work. > > > @@ -813,25 +849,42 @@ static inline void vma_assert_write_locked(struct vm_area_struct *vma) > > > > static inline void vma_assert_locked(struct vm_area_struct *vma) > > { > > - if (!rwsem_is_locked(&vma->vm_lock.lock)) > > + if (refcount_read(&vma->vm_refcnt) <= VMA_STATE_ATTACHED) > if (is_vma_detached(vma)) > > > vma_assert_write_locked(vma); > > } > > > > -static inline void vma_mark_attached(struct vm_area_struct *vma) > > +/* > > + * WARNING: to avoid racing with vma_mark_attached(), should be called either > > + * under mmap_write_lock or when the object has been isolated under > > + * mmap_write_lock, ensuring no competing writers. > > + */ > > +static inline bool is_vma_detached(struct vm_area_struct *vma) > > { > > - vma->detached = false; > > + return refcount_read(&vma->vm_refcnt) == VMA_STATE_DETACHED; > return !refcount_read(&vma->vm_refcnt); > > } > > > > -static inline void vma_mark_detached(struct vm_area_struct *vma) > > +static inline void vma_mark_attached(struct vm_area_struct *vma) > > { > > - /* When detaching vma should be write-locked */ > > vma_assert_write_locked(vma); > > - vma->detached = true; > > + > > + if (is_vma_detached(vma)) > > + refcount_set(&vma->vm_refcnt, VMA_STATE_ATTACHED); > > Urgh, so it would be really good to not call this at all them not 0. > I've not tried to untangle the mess, but this is really awkward. Surely > you don't add it to the mas multiple times either. The issue is that when we merge/split/shrink/grow vmas, we skip on marking them detached while modifying them. Therefore from vma_mark_attached() POV it will look like we are attaching an already attached vma. I can try to clean that up if this is really a concern. > > Also: > > refcount_set(&vma->vm_refcnt, 1); > > is so much clearer. Ok, IIUC you are in favour of dropping VMA_STATE_ATTACHED/VMA_STATE_DETACHED. > > That is, should this not live in vma_iter_store*(), right before > mas_store_gfp() ? Currently it's done right *after* mas_store_gfp() but I was debating with myself if it indeed should be *before* insertion into the tree... > > Also, ISTR having to set vm_lock_seq right before it? Yes, vma_mark_attached() requires vma to be write-locked beforehand, hence the above vma_assert_write_locked(). But oftentimes it's locked not right before vma_mark_attached() because some other modification functions also require vma to be write-locked. > > > } > > > > -static inline bool is_vma_detached(struct vm_area_struct *vma) > > +static inline void vma_mark_detached(struct vm_area_struct *vma) > > { > > - return vma->detached; > > + vma_assert_write_locked(vma); > > + > > + if (is_vma_detached(vma)) > > + return; > > Again, this just reads like confusion :/ Surely you don't have the same > with mas_detach? I'll double-check if we ever double-mark vma as detached. Thanks for the review! > > > + > > + /* We are the only writer, so no need to use vma_refcount_put(). */ > > + if (!refcount_dec_and_test(&vma->vm_refcnt)) { > > + /* > > + * Reader must have temporarily raised vm_refcnt but it will > > + * drop it without using the vma since vma is write-locked. > > + */ > > + } > > }