On Thu, Dec 12, 2024 at 6:17 AM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote: > > On Thu, Dec 12, 2024 at 1:17 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote: > > > > On Wed, Dec 11, 2024 at 07:01:16PM -0800, Suren Baghdasaryan wrote: > > > > > > > > I think your proposal should work. Let me try to code it and see if > > > > > > something breaks. > > > > > > Ok, I tried it out and things are a bit more complex: > > > 1. We should allow write-locking a detached VMA, IOW vma_start_write() > > > can be called when vm_refcnt is 0. > > > > This sounds dodgy, refcnt being zero basically means the object is dead > > and you shouldn't be touching it no more. Where does this happen and > > why? > > > > Notably, it being 0 means it is no longer in the mas tree and can't be > > found anymore. > > It happens when a newly created vma that was not yet attached > (vma->vm_refcnt = 0) is write-locked before being added into the vma > tree. For example: > mmap() > mmap_write_lock() > vma = vm_area_alloc() // vma->vm_refcnt = 0 (detached) > //vma attributes are initialized > vma_start_write() // write 0x8000 0001 into vma->vm_refcnt > mas_store_gfp() > vma_mark_attached() > mmap_write_lock() // vma_end_write_all() s/mmap_write_lock()/mmap_write_unlock() > > In this scenario, we write-lock the VMA before adding it into the tree > to prevent readers (pagefaults) from using it until we drop the > mmap_write_lock(). In your proposal, the first thing vma_start_write() > does is add(0x8000'0001) and that will trigger a warning. > For now instead of add(0x8000'0001) I can play this game to avoid the warning: > > if (refcount_inc_not_zero(&vma->vm_refcnt)) > refcount_add(0x80000000, &vma->vm_refcnt); > else > refcount_set(&vma->vm_refcnt, 0x80000001); > > this refcount_set() works because vma with vm_refcnt==0 could not be > found by readers. I'm not sure this will still work when we add > TYPESAFE_BY_RCU and introduce vma reuse possibility. > > > > > > 2. Adding 0x80000000 saturates refcnt, so I have to use a lower bit > > > 0x40000000 to denote writers. > > > > I'm confused, what? We're talking about atomic_t, right? > > I thought you suggested using refcount_t. According to > https://elixir.bootlin.com/linux/v6.13-rc2/source/include/linux/refcount.h#L22 > valid values would be [0..0x7fff_ffff] and 0x80000000 is outside of > that range. What am I missing? > > > > > > 3. Currently vma_mark_attached() can be called on an already attached > > > VMA. With vma->detached being a separate attribute that works fine but > > > when we combine it with the vm_lock things break (extra attach would > > > leak into lock count). I'll see if I can catch all the cases when we > > > do this and clean them up (not call vma_mark_attached() when not > > > necessary). > > > > Right, I hadn't looked at that thing in detail, that sounds like it > > needs a wee cleanup like you suggest. > > Yes, I'll embark on that today. Will see how much of a problem that is.