Re: [RFC PATCH 0/5] Remote mapping

CC+= Mihai, Mircea

On Thu,  3 Sep 2020 20:47:25 +0300, Adalbert Lazăr <alazar@xxxxxxxxxxxxxxx> wrote:
> This patchset adds support for the remote mapping feature.
> Remote mapping, as its name suggests, is a means for transparent and
> zero-copy access of a remote process' address space.
> 
> The feature was designed according to a specification suggested by Paolo Bonzini:
> >> The proposed API is a new pidfd system call, through which the parent
> >> can map portions of its virtual address space into a file descriptor
> >> and then pass that file descriptor to a child.
> >>
> >> This should be:
> >>
> >> - upstreamable, pidfd is the new cool thing and we could sell it as a
> >> better way to do PTRACE_{PEEK,POKE}DATA
> >>
> >> - relatively easy to do based on the bitdefender remote process
> >> mapping patches at.
> >>
> >> - pidfd_mem() takes a pidfd and some flags (which are 0) and returns
> >> two file descriptors for respectively the control plane and the memory access.
> >>
> >> - the control plane accepts three ioctls
> >>
> >> PIDFD_MEM_MAP takes a struct like
> >>
> >>     struct pidfd_mem_map {
> >>          uint64_t address;
> >>          off_t offset;
> >>          off_t size;
> >>          int flags;
> >>          int padding[7];
> >>     }
> >>
> >> After this is done, the memory access fd can be mmap-ed at range
> >> [offset, offset+size), and it will read memory from range [address,
> >> address+size) of the target descriptor.
> >>
> >> PIDFD_MEM_UNMAP takes a struct like
> >>
> >>     struct pidfd_mem_unmap {
> >>          off_t offset;
> >>          off_t size;
> >>     }
> >>
> >> and unmaps the corresponding range of course.
> >>
> >> Finally PIDFD_MEM_LOCK forbids subsequent PIDFD_MEM_MAP or
> >> PIDFD_MEM_UNMAP.  For now I think it should just check that the
> >> argument is zero, bells and whistles can be added later.
> >>
> >> - the memory access fd can be mmap-ed as in the bitdefender patches
> >> but also accessed with read/write/pread/pwrite/...  As in the
> >> BitDefender patches, MMU notifiers can be used to adjust any mmap-ed
> >> regions when the source address space changes.  In this case,
> >> PIDFD_MEM_UNMAP could also cause a pre-existing mmap to "disappear".
> (it currently doesn't support read/write/pread/pwrite/...)
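
To make the offset arithmetic of PIDFD_MEM_MAP concrete, here is a minimal
user-space sketch. The struct layout is copied from the proposal quoted above
and may not match the final uapi header. The helper remote_addr_for() is a
made-up name used only for illustration; it is not part of the patchset.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>

/* Layout as quoted in the proposal; not taken from the real uapi header. */
struct pidfd_mem_map {
	uint64_t address;	/* start of the range in the target address space */
	off_t offset;		/* start of the window inside the memory access fd */
	off_t size;		/* length of both ranges, in bytes */
	int flags;
	int padding[7];
};

/*
 * Hypothetical helper: translate a position in the memory access fd into
 * the address it mirrors in the target process.
 * Returns false when fd_off lies outside [offset, offset + size).
 */
static bool remote_addr_for(const struct pidfd_mem_map *m, off_t fd_off,
			    uint64_t *remote)
{
	if (fd_off < m->offset || fd_off - m->offset >= m->size)
		return false;

	*remote = m->address + (uint64_t)(fd_off - m->offset);
	return true;
}
```

With a mapping of address 0x7f0000000000, offset 0x1000 and size 0x2000, a
position of 0x1000 in the access fd corresponds to 0x7f0000000000 in the
target, and anything at or past 0x3000 falls outside the window.
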
> 
> The main remote mapping patch also contains the legacy implementation which
> creates a region the size of the whole process address space by means of the
> REMOTE_PROC_MAP ioctl. The user is then free to mmap() any region of the
> address space it wishes.
> 
> VMAs obtained by mmap()ing memory access fds mirror the contents of the remote
> process address space within the specified range. Pages are installed in the
> current process page tables at fault time and removed by the mmu_interval_notifier
> invalidate callback. No further memory management is involved.
> On attempts to access a hole, or if a mapping was removed by PIDFD_MEM_UNMAP,
> or if the remote process address space was reaped by OOM, the remote mapping
> fault handler returns VM_FAULT_SIGBUS.
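
The fault rules in the paragraph above can be summarized as a small decision
function. This is only a user-space model of the described behaviour, not the
kernel code from mm/remote_mapping.c; struct fault_ctx, the enum and
remote_fault_outcome() are names invented for the sketch.

```c
#include <stdbool.h>

/* Outcomes described in the cover letter; FAULT_SIGBUS stands in for
 * VM_FAULT_SIGBUS. */
enum fault_result { FAULT_INSTALL_PAGE, FAULT_SIGBUS };

/* Hypothetical snapshot of the state the fault handler would look at. */
struct fault_ctx {
	bool range_mapped;	/* still covered by a PIDFD_MEM_MAP range */
	bool remote_present;	/* remote address is backed, i.e. not a hole */
	bool remote_reaped;	/* remote address space was reaped by OOM */
};

static enum fault_result remote_fault_outcome(const struct fault_ctx *c)
{
	/* Any of the three failure conditions ends in SIGBUS. */
	if (!c->range_mapped || !c->remote_present || c->remote_reaped)
		return FAULT_SIGBUS;

	/* Otherwise the page is installed in the local page tables. */
	return FAULT_INSTALL_PAGE;
}
```
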
> 
> At Bitdefender we are using remote mapping for virtual machine introspection:
> - the QEMU running the introspected machine creates the pair of file descriptors,
> passes the access fd to the introspector QEMU, and uses the control fd to allow
> access to the memslots it creates for its machine
> - the QEMU running the introspector machine receives the access fd and mmap()s
> the regions made available, then hotplugs the obtained memory in its machine
> Having this setup creates nested invalidate_range_start/end MMU notifier calls.
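
The cover letter does not say how the access fd travels between the two QEMU
processes. One common way to hand a file descriptor to an unrelated process is
SCM_RIGHTS over an AF_UNIX socket; the sketch below assumes that mechanism.
send_fd() and recv_fd() are illustrative helper names, not code from the
patchset or from QEMU.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one file descriptor over a connected AF_UNIX socket. 0 on success. */
static int send_fd(int sock, int fd)
{
	char byte = 0;	/* at least one byte of payload must accompany the fd */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;	/* forces correct alignment of buf */
	} u;
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;

	memset(&u, 0, sizeof(u));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor; returns the new fd or -1 on failure. */
static int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u;
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;
	int fd;

	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

In the setup described above, the QEMU running the introspected machine would
call something like send_fd() with the access fd, and the introspector QEMU
would mmap() the descriptor returned by recv_fd().
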
> 
> Patch organization:
> - patch 1 allows unmap_page_range() to run without rescheduling
>   Needed for remote mapping to zap current process page tables when OOM calls
>   mmu_notifier_invalidate_range_start_nonblock(&range)
> 
> - patch 2 creates VMA-specific zapping behavior
>   A remote mapping VMA does not own the pages it maps, so all it has to do is
>   clear the PTEs.
> 
> - patch 3 removes the MMU notifier lockdep map
>   It was just incompatible with our use case.
> 
> - patch 4 is the remote mapping implementation
> 
> - patch 5 adds the suggested pidfd_mem system call
> 
> Mircea Cirjaliu (5):
>   mm: add atomic capability to zap_details
>   mm: let the VMA decide how zap_pte_range() acts on mapped pages
>   mm/mmu_notifier: remove lockdep map, allow mmu notifier to be used in
>     nested scenarios
>   mm/remote_mapping: use a pidfd to access memory belonging to unrelated
>     process
>   pidfd_mem: implemented remote memory mapping system call
> 
>  arch/x86/entry/syscalls/syscall_32.tbl |    1 +
>  arch/x86/entry/syscalls/syscall_64.tbl |    1 +
>  include/linux/mm.h                     |   22 +
>  include/linux/mmu_notifier.h           |    5 +-
>  include/linux/pid.h                    |    1 +
>  include/linux/remote_mapping.h         |   22 +
>  include/linux/syscalls.h               |    1 +
>  include/uapi/asm-generic/unistd.h      |    2 +
>  include/uapi/linux/remote_mapping.h    |   36 +
>  kernel/exit.c                          |    2 +-
>  kernel/pid.c                           |   55 +
>  mm/Kconfig                             |   11 +
>  mm/Makefile                            |    1 +
>  mm/memory.c                            |  193 ++--
>  mm/mmu_notifier.c                      |   19 -
>  mm/remote_mapping.c                    | 1273 ++++++++++++++++++++++++
>  16 files changed, 1535 insertions(+), 110 deletions(-)
>  create mode 100644 include/linux/remote_mapping.h
>  create mode 100644 include/uapi/linux/remote_mapping.h
>  create mode 100644 mm/remote_mapping.c
> 
> 
> CC:Christian Brauner <christian@xxxxxxxxxx>
> base-commit: ae83d0b416db002fe95601e7f97f64b59514d936




