Re: [RFC PATCH 0/5] madvise MADV_DOEXEC

Steven Sistare <steven.sistare@xxxxxxxxxx> · Thu, 30 Jul 2020 11:59:42 -0400

On 7/30/2020 11:22 AM, Matthew Wilcox wrote:
> On Mon, Jul 27, 2020 at 10:11:22AM -0700, Anthony Yznaga wrote:
>> This patchset adds support for preserving an anonymous memory range across
>> exec(3) using a new madvise MADV_DOEXEC argument.  The primary benefit for
>> sharing memory in this manner, as opposed to re-attaching to a named shared
>> memory segment, is to ensure it is mapped at the same virtual address in
>> the new process as it was in the old one.  An intended use for this is to
>> preserve guest memory for guests using vfio while qemu exec's an updated
>> version of itself.  By ensuring the memory is preserved at a fixed address,
>> vfio mappings and their associated kernel data structures can remain valid.
>> In addition, for the qemu use case, qemu instances that back guest RAM with
>> anonymous memory can be updated.
> 
> I just realised that something else I'm working on might be a suitable
> alternative to this.  Apologies for not realising it sooner.
> 
> http://www.wil.cx/~willy/linux/sileby.html
> 
> To use this, you'd mshare() the anonymous memory range, essentially
> detaching the VMA from the current process's mm_struct and reparenting
> it to this new mm_struct, which has an fd referencing it.
> 
> Then you call exec(), and the exec'ed task gets to call mmap() on that
> new fd to attach the memory range to its own address space.
> 
> Presto!

To be suitable for the qemu use case, we need a guarantee that the same VA range
is available in the new process, with nothing else mapped there.  From your spec,
it sounds like the new process could do a series of unrelated mmaps that overlap
the desired VA range before the sileby mmap(fd) is performed?
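For reference, here is a sketch of how a process can check whether a given VA range is still free, and optionally hold it with a PROT_NONE placeholder, using Linux's MAP_FIXED_NOREPLACE flag. It assumes kernel 4.17+ and glibc 2.28+ headers. The helper name reserve_va_range() is hypothetical. This is not part of the sileby proposal or of the MADV_DOEXEC patches.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (illustration only).
 * Returns 1 if [addr, addr + len) was free and got mapped, 0 if something
 * already occupies (part of) the range, -1 on any other mmap error.
 * If keep is nonzero, the PROT_NONE placeholder is left in place so later
 * unrelated mmaps cannot land there; otherwise it is unmapped again.
 */
static int reserve_va_range(void *addr, size_t len, int keep)
{
    assert(len > 0);

    void *p = mmap(addr, len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE |
                   MAP_FIXED_NOREPLACE,
                   -1, 0);
    if (p == MAP_FAILED)
        return errno == EEXIST ? 0 : -1;   /* EEXIST: range is occupied */

    if (p != addr) {
        /* Kernels older than 4.17 ignore the flag and treat addr as a
         * hint, so a different address also means the range is taken. */
        munmap(p, len);
        return 0;
    }

    if (!keep)
        munmap(p, len);
    return 1;
}
```

A placeholder reserved this way could later be replaced with MAP_FIXED by the mapping that actually needs that address. Whether the exec'ed process can do that early enough, before anything else is mapped there, is the open question above.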

Also, we need to support updating legacy processes that have already created anon segments.
We inject code that calls MADV_DOEXEC for such segments.

- Steve