Re: [PATCH, RFC 45/62] mm: Add the encrypt_mprotect() system call for MKTME

Dave Hansen <dave.hansen@xxxxxxxxx> · Mon, 17 Jun 2019 08:28:27 -0700

On 6/17/19 8:07 AM, Andy Lutomirski wrote:
> I still find it bizarre that this is conflated with mprotect().

This needs to be in the changelog.  But, for better or worse, it's
following the mprotect_pkey() pattern.

Other than the obvious "set the key on this memory", we're looking for
two other properties: atomicity (ensuring there is no transient state
where the memory is usable without the desired properties) and that it
is usable on existing allocations.

For atomicity, we have a model where we can allocate things with
PROT_NONE, then do mprotect_pkey() and mprotect_encrypt() (plus any
future features), then the last mprotect_*() call takes us from
PROT_NONE to the desired end permissions.  We could just require a plain
old mprotect() to do that instead of embedding mprotect()-like behavior
in these, of course, but that isn't the path we're on at the moment with
mprotect_pkey().

So, for this series it's just a matter of whether we do this:

	ptr = mmap(..., PROT_NONE);
	mprotect_pkey(protect_key, ptr, PROT_NONE);
	mprotect_encrypt(encr_key, ptr, PROT_READ|PROT_WRITE);
	// good to go

or this:

	ptr = mmap(..., PROT_NONE);
	mprotect_pkey(protect_key, ptr, PROT_NONE);
	sys_encrypt(key, ptr);
	mprotect(ptr, PROT_READ|PROT_WRITE);
	// good to go

I actually don't care all that much which one we end up with.  It's not
like the extra syscall in the second option means much.
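
To make the ordering concrete, here is a minimal userspace sketch of the
second sequence, using only mmap() and mprotect() as they exist today.
The key-assignment step is a do-nothing stub: set_encrypt_key_stub() and
alloc_with_key() are hypothetical names made up for illustration, not
anything in this series, and the protection-key step is left out
entirely.  The only point is that the mapping stays PROT_NONE until every
property has been applied.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical stand-in for the "set the key on this memory" step
 * (sys_encrypt() in the second sequence).  Not a real syscall; it
 * does nothing and always succeeds.
 */
static int set_encrypt_key_stub(void *addr, size_t len, int key)
{
	(void)addr;
	(void)len;
	(void)key;
	return 0;
}

/*
 * Map len bytes with no access, apply the key, and only then open the
 * mapping up for read/write.  Returns NULL on any failure.
 */
static void *alloc_with_key(size_t len, int key)
{
	void *ptr = mmap(NULL, len, PROT_NONE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (ptr == MAP_FAILED)
		return NULL;

	/* The memory is still PROT_NONE here, so nothing can touch it. */
	if (set_encrypt_key_stub(ptr, len, key) != 0 ||
	    mprotect(ptr, len, PROT_READ | PROT_WRITE) != 0) {
		munmap(ptr, len);
		return NULL;
	}

	return ptr;	/* good to go */
}
```

A caller would do something like alloc_with_key(4096, key) and munmap()
the result when done.  In the first sequence, the stub and the final
mprotect() would collapse into a single call.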

> This is part of why I much prefer the idea of making this style of
> MKTME a driver or some other non-intrusive interface.  Then, once
> everyone gets tired of it, the driver can just get turned off with no
> side effects.

I like the concept, but not where it leads.  I'd call it the "hugetlbfs
approach". :)  Hugetlbfs certainly got us huge pages, but it's continued
to be a parallel set of code with parallel bugs and parallel
implementations of many VM features.  It's not that you can't implement
new things on hugetlbfs, it's that you *need* to.  You never get them
for free.

For instance, if we do a driver, how do we get large pages?  How do we
swap/reclaim the pages?  How do we do NUMA affinity?  How do we
eventually stack it on top of persistent memory filesystems or Device
DAX?  With a driver approach, I think we're stuck basically
reimplementing things or gluing them back together.  Nothing comes for free.

With this approach, we basically start with our normal, full feature set
(modulo weirdo interactions like with KSM).