Re: [RFC PATCH v5 06/15] KVM: guest_memfd: Handle final folio_put() of guestmem pages

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Feb 4, 2025 at 8:31 PM Ackerley Tng <ackerleytng@xxxxxxxxxx> wrote:
>
> Vishal Annapurve <vannapurve@xxxxxxxxxx> writes:
>
> > On Thu, Jan 23, 2025 at 1:51 AM Fuad Tabba <tabba@xxxxxxxxxx> wrote:
> >>
> >> On Wed, 22 Jan 2025 at 22:16, Ackerley Tng <ackerleytng@xxxxxxxxxx> wrote:
> >> >
> >> > Fuad Tabba <tabba@xxxxxxxxxx> writes:
> >> >
> >> > Hey Fuad, I'm still working on verifying all this but for now this is
> >> > one issue. I think this can be fixed by checking if the folio->mapping
> >> > is NULL. If it's NULL, then the folio has been disassociated from the
> >> > inode, and during the dissociation (removal from filemap), the
> >> > mappability can also either
> >> >
> >> > 1. Be unset so that the default mappability can be set up based on
> >> >    GUEST_MEMFD_FLAG_INIT_MAPPABLE, or
> >> > 2. Be directly restored based on GUEST_MEMFD_FLAG_INIT_MAPPABLE
> >>
> >> Thanks for pointing this out. I hadn't considered this case. I'll fix
> >> in the respin.
> >>
> >
> > Can the below scenario cause trouble?
> > 1) Userspace converts a certain range of guest memfd as shared and
> > grabs some refcounts on shared memory pages through existing kernel
> > exposed mechanisms.
> > 2) Userspace converts the same range to private which would cause the
> > corresponding mappability attributes to be *MAPPABILITY_NONE.
> > 3) Userspace truncates the range which will remove the page from pagecache.
> > 4) Userspace does the fallocate again, leading to a new page getting
> > allocated without freeing the older page which is still refcounted
> > (step 1).
> >
> > Effectively this could allow userspace to keep allocating multiple
> > pages for the same guest_memfd range.
>
> I'm still verifying this but for now here's the flow Vishal described in
> greater detail:
>
> + guest_memfd starts without GUEST_MEMFD_FLAG_INIT_MAPPABLE
>     + All new pages will start with mappability = GUEST
> + guest uses a page
>     + Get new page
>     + Add page to filemap
> + guest converts page to shared
>     + Mappability is now ALL
> + host uses page
> + host takes transient refcounts on page
>     + Refcount on the page is now (a) filemap's refcount (b) vma's refcount
>       (c) transient refcount
> + guest converts page to private
>     + Page is unmapped
>         + Refcount on the page is now (a) filemap's refcount (b) transient
>           refcount
>     + Since refcount is elevated, the mappabilities are left as NONE
>     + Filemap's refcounts are removed from the page
>         + Refcount on the page is now (a) transient refcount
> + host punches hole to deallocate page
>     + Since mappability was NONE, restore filemap's refcount
>         + Refcount on the page is now (a) transient refcount (b) filemap's
>           refcount
>     + Mappabilities are reset to GUEST for truncated range
>     + Folio is removed from filemap
>         + Refcount on the page is now (a) transient refcount
>     + Callback remains registered so that when the transient refcounts are
>       dropped, cleanup can happen - this is where merging will happen
>       with 1G page support
> + host fallocate()s in the same address range
>     + will get a new page
>
> Though the host does manage to get a new page while the old one stays
> around, I think this is working as intended, since the transient
> refcounts are truly holding the old folio around. When the transient
> refcounts go away, the old folio will still get cleaned up (with 1G page
> support: merged and returned) to as expected. The new page will also be
> freed at some point later.
>
> If the userspace program decides to keep taking transient refcounts to hold
> pages around, then the userspace program is truly leaking memory and it
> shouldn't be guest_memfd's bug.

I wouldn't call such references transient. But a similar scenario is
applicable for shmem files so it makes sense to call out this behavior
as WAI.





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [Linux for Sparc]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux