Re: [PATCH v6 3/6] mm: introduce memfd_secret system call to create "secret" memory areas

Mike Rapoport <rppt@xxxxxxxxxx> · Wed, 30 Sep 2020 13:35:07 +0300

On Tue, Sep 29, 2020 at 08:06:03PM +0000, Edgecombe, Rick P wrote:
> On Tue, 2020-09-29 at 16:06 +0300, Mike Rapoport wrote:
> > On Tue, Sep 29, 2020 at 04:58:44AM +0000, Edgecombe, Rick P wrote:
> > > On Thu, 2020-09-24 at 16:29 +0300, Mike Rapoport wrote:
> > > > Introduce "memfd_secret" system call with the ability to create
> > > > memory
> > > > areas visible only in the context of the owning process and not
> > > > mapped not
> > > > only to other processes but in the kernel page tables as well.
> > > > 
> > > > The user will create a file descriptor using the memfd_secret()
> > > > system call
> > > > where flags supplied as a parameter to this system call will
> > > > define
> > > > the
> > > > desired protection mode for the memory associated with that file
> > > > descriptor.
> > > > 
> > > >   Currently there are two protection modes:
> > > > 
> > > > * exclusive - the memory area is unmapped from the kernel direct
> > > > map
> > > > and it
> > > >                is present only in the page tables of the owning
> > > > mm.
> > > 
> > > Seems like there were some concerns raised around direct map
> > > efficiency, but in case you are going to rework this...how does
> > > this
> > > memory work for the existing kernel functionality that does things
> > > like
> > > this?
> > > 
> > > get_user_pages(, &page);
> > > ptr = kmap(page);
> > > foo = *ptr;
> > > 
> > > Not sure if I'm missing something, but I think apps could cause the
> > > kernel to access a not-present page and oops.
> > 
> > The idea is that this memory should not be accessible by the kernel,
> > so
> > the sequence you describe should indeed fail.
> > 
> > Probably oops would be to noisy and in this case the report needs to
> > be
> > less verbose.
> 
> I was more concerned that it could cause kernel instabilities.

I think kernel recovers nicely from such sort of page fault, at least on
x86.

> I see, so it should not be accessed even at the userspace address? I
> wonder if it should be prevented somehow then. At least
> get_user_pages() should be prevented I think. Blocking copy_*_user()
> access might not be simple.
> 
> I'm also not so sure that a user would never have any possible reason
> to copy data from this memory into the kernel, even if it's just
> convenience. In which case a user setup could break if a specific
> kernel implementation switched to get_user_pages()/kmap() from using
> copy_*_user(). So seems maybe a bit thorny without fully blocking
> access from the kernel, or deprecating that pattern.
> 
> You should probably call out these "no passing data to/from the kernel"
> expectations, unless I missed them somewhere.

You are right, I should have been more explicit in the description of
the expected behavoir. 

Our thinking was that copy_*user() would work in the context of the
process that "owns" the secretmem and gup() would not allow access in
general, unless requested with certail (yet another) FOLL_ flag.

-- 
Sincerely yours,
Mike.