Re: [PATCH 3/4] mm: replace rw_semaphore with atomic_t in vma_lock

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Dec 12, 2024 at 6:17 AM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
>
> On Thu, Dec 12, 2024 at 1:17 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > On Wed, Dec 11, 2024 at 07:01:16PM -0800, Suren Baghdasaryan wrote:
> >
> > > > > > I think your proposal should work. Let me try to code it and see if
> > > > > > something breaks.
> > >
> > > Ok, I tried it out and things are a bit more complex:
> > > 1. We should allow write-locking a detached VMA, IOW vma_start_write()
> > > can be called when vm_refcnt is 0.
> >
> > This sounds dodgy, refcnt being zero basically means the object is dead
> > and you shouldn't be touching it no more. Where does this happen and
> > why?
> >
> > Notably, it being 0 means it is no longer in the mas tree and can't be
> > found anymore.
>
> It happens when a newly created vma that was not yet attached
> (vma->vm_refcnt = 0) is write-locked before being added into the vma
> tree. For example:
> mmap()
>   mmap_write_lock()
>   vma = vm_area_alloc() // vma->vm_refcnt = 0 (detached)
>   //vma attributes are initialized
>   vma_start_write() // write 0x8000 0001 into vma->vm_refcnt
>   mas_store_gfp()
>   vma_mark_attached()
>   mmap_write_lock() // vma_end_write_all()

s/mmap_write_lock()/mmap_write_unlock()
>
> In this scenario, we write-lock the VMA before adding it into the tree
> to prevent readers (pagefaults) from using it until we drop the
> mmap_write_lock(). In your proposal, the first thing vma_start_write()
> does is add(0x8000'0001) and that will trigger a warning.
> For now instead of add(0x8000'0001) I can play this game to avoid the warning:
>
> if (refcount_inc_not_zero(&vma->vm_refcnt))
>     refcount_add(0x80000000, &vma->vm_refcnt);
> else
>     refcount_set(&vma->vm_refcnt, 0x80000001);
>
> this refcount_set() works because vma with vm_refcnt==0 could not be
> found by readers. I'm not sure this will still work when we add
> TYPESAFE_BY_RCU and introduce vma reuse possibility.
>
> >
> > > 2. Adding 0x80000000 saturates refcnt, so I have to use a lower bit
> > > 0x40000000 to denote writers.
> >
> > I'm confused, what? We're talking about atomic_t, right?
>
> I thought you suggested using refcount_t. According to
> https://elixir.bootlin.com/linux/v6.13-rc2/source/include/linux/refcount.h#L22
> valid values would be [0..0x7fff_ffff] and 0x80000000 is outside of
> that range. What am I missing?
>
> >
> > > 3. Currently vma_mark_attached() can be called on an already attached
> > > VMA. With vma->detached being a separate attribute that works fine but
> > > when we combine it with the vm_lock things break (extra attach would
> > > leak into lock count). I'll see if I can catch all the cases when we
> > > do this and clean them up (not call vma_mark_attached() when not
> > > necessary).
> >
> > Right, I hadn't looked at that thing in detail, that sounds like it
> > needs a wee cleanup like you suggest.
>
> Yes, I'll embark on that today. Will see how much of a problem that is.





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux