On Thu, Jul 30, 2020 at 05:27:05PM +0200, Christian Brauner wrote:
> On Thu, Jul 30, 2020 at 04:22:50PM +0100, Matthew Wilcox wrote:
> > On Mon, Jul 27, 2020 at 10:11:22AM -0700, Anthony Yznaga wrote:
> > > This patchset adds support for preserving an anonymous memory range across
> > > exec(3) using a new madvise MADV_DOEXEC argument. The primary benefit for
> > > sharing memory in this manner, as opposed to re-attaching to a named shared
> > > memory segment, is to ensure it is mapped at the same virtual address in
> > > the new process as it was in the old one. An intended use for this is to
> > > preserve guest memory for guests using vfio while qemu exec's an updated
> > > version of itself. By ensuring the memory is preserved at a fixed address,
> > > vfio mappings and their associated kernel data structures can remain valid.
> > > In addition, for the qemu use case, qemu instances that back guest RAM with
> > > anonymous memory can be updated.
> >
> > I just realised that something else I'm working on might be a suitable
> > alternative to this. Apologies for not realising it sooner.
> >
> > http://www.wil.cx/~willy/linux/sileby.html
>
> Just skimming: make it O_CLOEXEC by default. ;)

I appreciate the suggestion, and it makes sense for many 'return an fd'
interfaces, but the point of mshare() is to, well, share. So sharing the
fd with a child is a common usecase, unlike say sharing a timerfd. The
only other reason to use mshare() is to pass the fd over a unix socket to
a non-child, and I submit that is far less common than wanting to share
with a child.
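
To make the trade-off concrete, here is a rough sketch of what the
close-on-exec flag does to a descriptor handed to a child. It does not use
mshare() itself; a plain pipe stands in for the fd, and survives_exec() and
the choice of fd 9 are made up for illustration. It assumes a POSIX
/bin/sh is available.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 * Returns 1 if the descriptor is still open in the child after exec,
 * 0 if exec closed it, and -1 if the experiment could not be set up.
 * A pipe is a stand-in for whatever fd an interface would return.
 */
int survives_exec(int cloexec)
{
    int p[2];
    int status;
    pid_t pid;

    if (pipe(p) == -1)
        return -1;

    pid = fork();
    if (pid == -1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    if (pid == 0) {
        /* Park the fd on a fixed number so the shell can name it.
         * dup2() leaves FD_CLOEXEC clear on the new descriptor. */
        if (dup2(p[0], 9) == -1)
            _exit(127);
        /* Optionally mark it close-on-exec, as O_CLOEXEC would. */
        if (cloexec && fcntl(9, F_SETFD, FD_CLOEXEC) == -1)
            _exit(127);
        /* ": <&9" succeeds only if fd 9 is open in the new image. */
        execl("/bin/sh", "sh", "-c", "exec 2>/dev/null; : <&9",
              (char *)NULL);
        _exit(127); /* exec failed */
    }

    close(p[0]);
    close(p[1]);

    if (waitpid(pid, &status, 0) == -1)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) == 127)
        return -1;
    return WEXITSTATUS(status) == 0;
}
```

With the flag clear, the exec'd child can still use the descriptor, which
is the sharing-with-a-child case argued for above; with the flag set, the
child sees it closed and the caller would first have to clear FD_CLOEXEC
with fcntl() before exec.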