Re: [RFC PATCH v5 06/15] KVM: guest_memfd: Handle final folio_put() of guestmem pages

Vishal Annapurve <vannapurve@xxxxxxxxxx> writes:

> On Thu, Jan 23, 2025 at 1:51 AM Fuad Tabba <tabba@xxxxxxxxxx> wrote:
>>
>> On Wed, 22 Jan 2025 at 22:16, Ackerley Tng <ackerleytng@xxxxxxxxxx> wrote:
>> >
>> > Fuad Tabba <tabba@xxxxxxxxxx> writes:
>> >
>> > Hey Fuad, I'm still working on verifying all this but for now this is
>> > one issue. I think this can be fixed by checking if the folio->mapping
>> > is NULL. If it's NULL, then the folio has been disassociated from the
>> > inode, and during the dissociation (removal from filemap), the
>> > mappability can also either
>> >
>> > 1. Be unset so that the default mappability can be set up based on
>> >    GUEST_MEMFD_FLAG_INIT_MAPPABLE, or
>> > 2. Be directly restored based on GUEST_MEMFD_FLAG_INIT_MAPPABLE
>>
>> Thanks for pointing this out. I hadn't considered this case. I'll fix
>> in the respin.
>>
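
For concreteness, the shape of check I have in mind is something like the
sketch below. This is only a sketch: kvm_gmem_handle_folio_put() and
kvm_gmem_restore_mappability() are stand-in names, not actual code from
this series.

static void kvm_gmem_handle_folio_put(struct folio *folio)
{
	/*
	 * folio->mapping != NULL: the folio is still in the filemap, so
	 * the last extra reference going away means mappability can be
	 * restored now (e.g. based on GUEST_MEMFD_FLAG_INIT_MAPPABLE).
	 *
	 * folio->mapping == NULL: the folio was already removed from the
	 * filemap (the range was truncated), and mappability was (or will
	 * be) reset at removal time, so don't touch it here.
	 */
	if (folio->mapping)
		kvm_gmem_restore_mappability(folio);

	/* Freeing/cleanup (and, with 1G support, merging) happens after this. */
}
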
>
> Can the below scenario cause trouble?
> 1) Userspace converts a certain range of guest memfd as shared and
> grabs some refcounts on shared memory pages through existing kernel
> exposed mechanisms.
> 2) Userspace converts the same range to private which would cause the
> corresponding mappability attributes to be *MAPPABILITY_NONE.
> 3) Userspace truncates the range which will remove the page from pagecache.
> 4) Userspace does the fallocate again, leading to a new page getting
> allocated without freeing the older page which is still refcounted
> (step 1).
>
> Effectively this could allow userspace to keep allocating multiple
> pages for the same guest_memfd range.

I'm still verifying this, but for now here's the flow Vishal described, in
greater detail (a rough userspace-side sketch of the same sequence follows
the list):

+ guest_memfd starts without GUEST_MEMFD_FLAG_INIT_MAPPABLE
    + All new pages will start with mappability = GUEST
+ guest uses a page
    + Get new page
    + Add page to filemap
+ guest converts page to shared
    + Mappability is now ALL
+ host uses page
+ host takes transient refcounts on page
    + Refcount on the page is now (a) filemap's refcount (b) vma's refcount
      (c) transient refcount
+ guest converts page to private
    + Page is unmapped
        + Refcount on the page is now (a) filemap's refcount (b) transient
          refcount
    + Since the refcount is still elevated, mappability is left at NONE
      (not yet restored to GUEST)
    + Filemap's refcounts are removed from the page
        + Refcount on the page is now (a) transient refcount
+ host punches hole to deallocate page
    + Since mappability was NONE, restore filemap's refcount
        + Refcount on the page is now (a) transient refcount (b) filemap's
          refcount
    + Mappabilities are reset to GUEST for truncated range
    + Folio is removed from filemap
        + Refcount on the page is now (a) transient refcount
    + Callback remains registered so that when the transient refcounts are
      dropped, cleanup can happen - this is where merging will happen
      with 1G page support
+ host fallocate()s in the same address range
    + will get a new page
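
To make that concrete, here's a rough userspace-side sketch of the same
sequence. It is purely illustrative: it assumes the conversion is driven by
the upstream KVM_SET_MEMORY_ATTRIBUTES ioctl (the conversion path in this
series may differ), that the guest_memfd range is bound to GPA 0 by a
memslot set up elsewhere, that the fd is created with
GUEST_MEMFD_FLAG_INIT_MAPPABLE so the range starts out shared (rather than
the guest converting it, as in the flow above), that shared ranges can be
mmap()ed as this series proposes, and that vmsplice() is the "existing
kernel exposed mechanism" used to grab the extra refcount. Error handling
is omitted.

/* Illustrative only, not the actual selftest. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/uio.h>
#include <linux/kvm.h>

#define GMEM_SIZE	(2UL << 20)

static void double_allocation(int vm_fd)
{
	struct kvm_create_guest_memfd gmem = {
		.size = GMEM_SIZE,
		.flags = GUEST_MEMFD_FLAG_INIT_MAPPABLE,
	};
	struct kvm_memory_attributes attrs = {
		.address = 0,
		.size = GMEM_SIZE,
		.attributes = KVM_MEMORY_ATTRIBUTE_PRIVATE,
	};
	struct iovec iov;
	int gmem_fd, pipefd[2];
	char *shared;

	gmem_fd = ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &gmem);

	/* Allocate backing pages; the range starts out shared here. */
	fallocate(gmem_fd, 0, 0, GMEM_SIZE);

	/*
	 * (1) Map the shared range and take an extra, long-lived
	 * reference on a page, e.g. by vmsplice()ing it into a pipe
	 * that is never drained.
	 */
	shared = mmap(NULL, GMEM_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED,
		      gmem_fd, 0);
	pipe(pipefd);
	iov.iov_base = shared;
	iov.iov_len = 4096;
	vmsplice(pipefd[1], &iov, 1, 0);

	/* (2) Convert the range back to private. */
	ioctl(vm_fd, KVM_SET_MEMORY_ATTRIBUTES, &attrs);

	/*
	 * (3) Punch a hole: the old folio leaves the filemap but stays
	 * alive because of the vmsplice() reference.
	 */
	fallocate(gmem_fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
		  0, GMEM_SIZE);

	/*
	 * (4) Allocate again: a new folio now backs the same range
	 * while the old one lingers until the pipe reference is
	 * dropped.
	 */
	fallocate(gmem_fd, 0, 0, GMEM_SIZE);
}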

Though the host does manage to get a new page while the old one stays
around, I think this is working as intended, since the transient
refcounts are truly holding the old folio around. When the transient
refcounts go away, the old folio will still get cleaned up (with 1G page
support: merged and returned) as expected. The new page will also be
freed at some later point.

If the userspace program decides to keep taking transient refcounts to
hold pages around, then the userspace program is truly leaking memory,
and that is not a guest_memfd bug.




