On Thu, Aug 15, 2019 at 03:19:29PM -0400, Jerome Glisse wrote:
> On Tue, Aug 13, 2019 at 02:01:35PM +0300, Adalbert Lazăr wrote:
> > On Fri, 9 Aug 2019 09:24:44 -0700, Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> > > On Fri, Aug 09, 2019 at 07:00:26PM +0300, Adalbert Lazăr wrote:
> > > > +++ b/include/linux/page-flags.h
> > > > @@ -417,8 +417,10 @@ PAGEFLAG(Idle, idle, PF_ANY)
> > > >  */
> > > >  #define PAGE_MAPPING_ANON    0x1
> > > >  #define PAGE_MAPPING_MOVABLE 0x2
> > > > +#define PAGE_MAPPING_REMOTE  0x4
> > >
> > > Uh. How do you know page->mapping would otherwise have bit 2 clear?
> > > Who's guaranteeing that?
> > >
> > > This is an awfully big patch to the memory management code, buried in
> > > the middle of a gigantic series which almost guarantees nobody would
> > > look at it. I call shenanigans.
> > >
> > > > @@ -1021,7 +1022,7 @@ void page_move_anon_rmap(struct page *page, struct vm_area_struct *vma)
> > > >  * __page_set_anon_rmap - set up new anonymous rmap
> > > >  * @page:    Page or Hugepage to add to rmap
> > > >  * @vma:     VM area to add page to.
> > > > - * @address: User virtual address of the mapping
> > > > + * @address: User virtual address of the mapping
> > >
> > > And mixing in fluff changes like this is a real no-no. Try again.
> >
> > No bad intentions, just overzealous. I didn't want to hide anything
> > from our patches. Once we advance with the introspection patches
> > related to KVM, we'll be back with the remote mapping patch, split
> > and cleaned.
>
> There are no bits left in struct page! Looking at the patch, it seems
> you want to have your own pin count just for KVM. This is bad; we are
> already trying to solve the GUP pinning problem (see the various
> patchsets about GUP posted recently).
>
> You need to rethink how you want to achieve this. Why not simply do a
> remote read()/write() into the process memory, i.e. KVMI would call an
> ioctl that allows reading or writing a remote process's memory, like
> ptrace() but on steroids...
>
> Adding this whole big, complex infrastructure without justifying why
> we need to avoid the round trip is just too much, really.

Thinking a bit more about this, you can achieve the same thing without
adding a single line to any mm code. Instead of doing an mmap with
PROT_NONE | MAP_LOCKED, have userspace mmap some kvm device file (I am
assuming this is something you already have, so you can control the
mmap callback).

Kernel side you then have a vma with a vm_operations_struct under your
control, which means that everything you want to block, mm-wise, from
within the inspector process can be blocked through those callbacks
(find_special_page() specifically, for which you have to return NULL
all the time).

To mirror the target process memory you can use hmm_mirror; when you
populate the inspector process page table you use insert_pfn() (the
mmap of the kvm device file must mark this vma as PFNMAP).

By following the hmm_mirror API, any time the target process has a
change in its page table (i.e. virtual address -> page) you will get a
callback, and all you have to do is clear the page table within the
inspector process and flush the TLB (use zap_page_range()).

On a page fault within the inspector process, the fault callback of
vm_ops will get called, and from there you call hmm_mirror following
its API.

Oh, also mark the vma with VM_WIPEONFORK to avoid any issue if the
inspector process uses fork() (you could support fork, but then you
would need to mark the vma as SHARED and use unmap_mapping_pages()
instead of zap_page_range()).
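
To make that concrete, here is a very rough, completely untested sketch
of the glue described above. The kvmi_mem_* names are made up, and the
hmm_mirror callback signatures are only approximate (that API has been
changing from release to release), so take it as an illustration of the
idea, not as something that builds as-is:

/*
 * Untested sketch only: kvmi_mem_* names are invented, and the hmm_mirror
 * callback signatures are approximate (they have changed between releases).
 */
#include <linux/kernel.h>
#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/hmm.h>
#include <linux/mmu_notifier.h>

struct kvmi_mem_map {
	struct hmm_mirror mirror;	/* mirrors the target (introspected) mm */
	struct mm_struct *target_mm;	/* mm of the introspected process */
	struct vm_area_struct *vma;	/* inspector-side vma (device mmap) */
	unsigned long target_start;	/* target va mapped at vma->vm_start */
};

/*
 * Placeholder: a real implementation would use hmm_range_fault() on
 * map->target_mm to get the pfn behind the target virtual address that
 * corresponds to @addr in the inspector vma.
 */
static int kvmi_mem_resolve_target_pfn(struct kvmi_mem_map *map,
				       unsigned long addr, unsigned long *pfn)
{
	return -ENOENT;
}

/* Never hand out a struct page for this mapping. */
static struct page *kvmi_mem_find_special_page(struct vm_area_struct *vma,
					       unsigned long addr)
{
	return NULL;
}

/* Inspector touched the mirrored range: map the target pfn in. */
static vm_fault_t kvmi_mem_fault(struct vm_fault *vmf)
{
	struct kvmi_mem_map *map = vmf->vma->vm_private_data;
	unsigned long pfn;

	if (kvmi_mem_resolve_target_pfn(map, vmf->address, &pfn))
		return VM_FAULT_SIGBUS;

	return vmf_insert_pfn(vmf->vma, vmf->address, pfn);
}

static const struct vm_operations_struct kvmi_mem_vm_ops = {
	.fault			= kvmi_mem_fault,
	.find_special_page	= kvmi_mem_find_special_page,
};

/* hmm_mirror callback: target page tables changed, drop the inspector copy. */
static int kvmi_mem_sync_cpu_device_pagetables(struct hmm_mirror *mirror,
				const struct mmu_notifier_range *update)
{
	struct kvmi_mem_map *map =
		container_of(mirror, struct kvmi_mem_map, mirror);
	unsigned long start, end;

	/* Clamp the invalidated target range to what this vma mirrors. */
	start = max(update->start, map->target_start);
	end   = min(update->end, map->target_start +
				 (map->vma->vm_end - map->vma->vm_start));
	if (start >= end)
		return 0;

	/* Clear the inspector's page table entries and flush the TLB. */
	start = map->vma->vm_start + (start - map->target_start);
	zap_page_range(map->vma, start, end - start);
	return 0;
}

static const struct hmm_mirror_ops kvmi_mem_mirror_ops = {
	.sync_cpu_device_pagetables = kvmi_mem_sync_cpu_device_pagetables,
};

/* mmap() callback of the kvm/kvmi device file. */
static int kvmi_mem_mmap(struct file *file, struct vm_area_struct *vma)
{
	struct kvmi_mem_map *map = file->private_data;

	/* PFNMAP: no struct page refcounting; WIPEONFORK: fork() safety. */
	vma->vm_flags |= VM_PFNMAP | VM_WIPEONFORK | VM_DONTEXPAND | VM_DONTDUMP;
	vma->vm_ops = &kvmi_mem_vm_ops;
	vma->vm_private_data = map;
	map->vma = vma;

	map->mirror.ops = &kvmi_mem_mirror_ops;
	return hmm_mirror_register(&map->mirror, map->target_mm);
}

The fault path is where you would plug in hmm_range_fault() against the
target mm; everything else is ordinary vm_ops plumbing.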
There you have everything you want to do, with already-upstream mm code.

Cheers,
Jérôme