On 15.11.24 17:59, Patrick Roy wrote:
> On Tue, 2024-11-12 at 14:52 +0000, David Hildenbrand wrote:
>> On 12.11.24 15:40, Patrick Roy wrote:
>>> I remember talking to someone at some point about whether we could reuse
>>> the proc-local stuff for guest memory, but I cannot remember the outcome
>>> of that discussion... (or maybe I just wanted to have a discussion about
>>> it, but forgot to follow up on that thought?). I guess we wouldn't use
>>> proc-local _allocations_, but rather just set up proc-local mappings of
>>> the gmem allocations that have been removed from the direct map.
>>
>> Yes. And likely only for memory we really access / try to access, if
>> possible.
>
> Well, if we start on-demand mm-local mapping the things we want to
> access, we're back in TLB flush hell, no?

At least the on-demand mapping shouldn't require a TLB flush? Only
"unmapping" would, if we want to keep the "mapped pool" at a restricted
size.

Anyhow, this would be a pure optimization, to avoid the expense of
mapping everything when in practice you'd likely not access most of it.
(my theory, happy to be told I'm wrong :) )
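
Roughly what I have in mind (completely untested sketch; all the
mmlocal_* helpers are made up, this is not an existing API):

/*
 * Untested sketch, made-up helpers: map a gmem page into a per-mm
 * ("mm-local") kernel region on demand.
 */
static void *mmlocal_map_page(struct mm_struct *mm, struct page *page)
{
	unsigned long vaddr = mmlocal_alloc_va(mm);	/* hypothetical */

	/*
	 * The VA was non-present until now, so no CPU can hold a stale
	 * TLB entry for it -> installing the PTE needs no flush.
	 */
	set_pte_at(mm, vaddr, mmlocal_pte(mm, vaddr),
		   mk_pte(page, PAGE_KERNEL));
	return (void *)vaddr;
}

static void mmlocal_unmap_page(struct mm_struct *mm, unsigned long vaddr)
{
	pte_clear(mm, vaddr, mmlocal_pte(mm, vaddr));
	/*
	 * Unmapping is the expensive part: other CPUs running this mm
	 * may still cache the translation, so the VA cannot be reused
	 * without a (potentially remote) flush.
	 */
	flush_tlb_mm_range(mm, vaddr, vaddr + PAGE_SIZE, PAGE_SHIFT, false);
}

IOW, as long as we only ever add mappings we never flush; the flushes
only show up once we start recycling VAs to bound the pool size.
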
> And we can't know
> ahead-of-time what needs to be mapped, so everything would need to be
> mapped (unless we do something like mm-local mapping a page on first
> access, and then just never unmapping it again, under the assumption
> that establishing the mapping won't be expensive)

Right, the whole problem is that we don't know that upfront.
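
If we went with the map-on-first-access-and-never-unmap idea you
describe above, it could be as simple as the following (again
hand-wavy; the xarray cache and gmem_gfn_to_page() are made up):

/*
 * Hand-wavy sketch: map each gmem page on first access and cache the
 * VA forever, trading address-space consumption for zero TLB flushes.
 */
static void *mmlocal_get_mapping(struct kvm *kvm, gfn_t gfn)
{
	void *va = xa_load(&kvm->mmlocal_va_cache, gfn);	/* made up */

	if (!va) {
		struct page *page = gmem_gfn_to_page(kvm, gfn); /* made up */

		va = mmlocal_map_page(kvm->mm, page); /* no flush, see above */
		xa_store(&kvm->mmlocal_va_cache, gfn, va, GFP_KERNEL);
	}
	return va;
}

The downside is that a guest that touches all of its memory once ends
up with everything mapped anyway, which is exactly the "we don't know
upfront" problem.
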
>>> I'm wondering where, conceptually, the differences to Sean's idea
>>> about messing with the CR3 register inside KVM to temporarily install
>>> page tables that contain all the gmem stuff would be. Wouldn't we
>>> run into the same interrupt problems that Sean foresaw for the CR3
>>> stuff? (which, admittedly, I still don't quite understand :( )
>>
>> I'd need some more details on that. If anything would rely on the direct
>> mapping (from IRQ context?) then ... we obviously cannot remove the
>> direct mapping :)
>
> I've talked to Fares internally, and it seems that generally doing
> mm-local mappings of guest memory would work for us. We also figured out
> what the "interrupt problem" is, namely that if we receive an interrupt
> while executing in a context that has mm-local mappings available, those
> mappings will continue to be available while the interrupt is being
> handled.

Isn't that likely also the case with secretmem, where we removed the
direct map entries but have an effective per-mm mapping in the
(user-space portion of the) page table?
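
For comparison, this is roughly the secretmem pattern I mean (abridged
from memory, please double-check mm/secretmem.c): the fault handler
drops the page from the direct map, but the per-mm user PTE that gets
installed stays valid, including while an interrupt is handled in that
process context.

static vm_fault_t secretmem_fault_sketch(struct vm_fault *vmf)
{
	struct page *page = alloc_page(GFP_USER | __GFP_ZERO);
	unsigned long addr;

	if (!page)
		return VM_FAULT_OOM;

	/* Remove the kernel direct-map alias of this page ... */
	if (set_direct_map_invalid_noflush(page)) {
		__free_page(page);
		return VM_FAULT_SIGBUS;
	}
	addr = (unsigned long)page_address(page);
	flush_tlb_kernel_range(addr, addr + PAGE_SIZE);

	/* ... but the per-mm user mapping is installed as usual. */
	vmf->page = page;
	return 0;
}
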
> I'm talking to my security folks to see how much of a concern
> this is for the speculation hardening we're trying to achieve. Will keep
> you in the loop there :)

Thanks!
--
Cheers,
David / dhildenb