On Tue, Sep 29, 2020 at 08:06:03PM +0000, Edgecombe, Rick P wrote: > On Tue, 2020-09-29 at 16:06 +0300, Mike Rapoport wrote: > > On Tue, Sep 29, 2020 at 04:58:44AM +0000, Edgecombe, Rick P wrote: > > > On Thu, 2020-09-24 at 16:29 +0300, Mike Rapoport wrote: > > > > Introduce "memfd_secret" system call with the ability to create > > > > memory > > > > areas visible only in the context of the owning process and not > > > > mapped not > > > > only to other processes but in the kernel page tables as well. > > > > > > > > The user will create a file descriptor using the memfd_secret() > > > > system call > > > > where flags supplied as a parameter to this system call will > > > > define > > > > the > > > > desired protection mode for the memory associated with that file > > > > descriptor. > > > > > > > > Currently there are two protection modes: > > > > > > > > * exclusive - the memory area is unmapped from the kernel direct > > > > map > > > > and it > > > > is present only in the page tables of the owning > > > > mm. > > > > > > Seems like there were some concerns raised around direct map > > > efficiency, but in case you are going to rework this...how does > > > this > > > memory work for the existing kernel functionality that does things > > > like > > > this? > > > > > > get_user_pages(, &page); > > > ptr = kmap(page); > > > foo = *ptr; > > > > > > Not sure if I'm missing something, but I think apps could cause the > > > kernel to access a not-present page and oops. > > > > The idea is that this memory should not be accessible by the kernel, > > so > > the sequence you describe should indeed fail. > > > > Probably oops would be to noisy and in this case the report needs to > > be > > less verbose. > > I was more concerned that it could cause kernel instabilities. I think kernel recovers nicely from such sort of page fault, at least on x86. > I see, so it should not be accessed even at the userspace address? I > wonder if it should be prevented somehow then. At least > get_user_pages() should be prevented I think. Blocking copy_*_user() > access might not be simple. > > I'm also not so sure that a user would never have any possible reason > to copy data from this memory into the kernel, even if it's just > convenience. In which case a user setup could break if a specific > kernel implementation switched to get_user_pages()/kmap() from using > copy_*_user(). So seems maybe a bit thorny without fully blocking > access from the kernel, or deprecating that pattern. > > You should probably call out these "no passing data to/from the kernel" > expectations, unless I missed them somewhere. You are right, I should have been more explicit in the description of the expected behavoir. Our thinking was that copy_*user() would work in the context of the process that "owns" the secretmem and gup() would not allow access in general, unless requested with certail (yet another) FOLL_ flag. -- Sincerely yours, Mike.