Re: [RFC] KVM: mm: fd-based approach for supporting KVM guest private memory

David Hildenbrand <david@xxxxxxxxxx> · Wed, 15 Sep 2021 16:59:46 +0200

I don't think we are; it still feels like we are in the early prototype
phase (even way before a PoC). I'd be happy to see something "cleaner", so to
say. It still feels kind of hacky to me, especially as there seem to be many
pieces of the big puzzle missing so far. Unfortunately, this series hasn't
caught the attention of many -MM people so far, maybe because others are
missing the big picture as well and are waiting for a complete design proposal.

For example, what's unclear to me: we'll be allocating pages with
GFP_HIGHUSER_MOVABLE, making them land on MIGRATE_CMA or ZONE_MOVABLE; then
we silently turn them unmovable, which breaks these concepts. Who would migrate
these pages away first, just as is done for long-term pinning, or how is that
supposed to work?

That's a fair point. We can fix it by changing mapping->gfp_mask.

That's essentially what secretmem does when setting up a file.

Also unclear to me is how refcount and mapcount will be handled to prevent
swapping,

refcount and mapcount are unchanged. The pages are not pinned per se. Swapping
is prevented by the change in shmem_writepage().

So when mapping into the guest, we'd increment the refcount but not the
mapcount, I assume?

who will actually do the gfn-epfn (etc.) mapping, and how we'll forbid
access to this memory, e.g., via /proc/kcore or when dumping memory

It's not aimed at preventing root from shooting itself in the foot. Root is root.

IMHO, being root is no excuse for reading some random file (one that is
actually used in production environments) and thereby crashing the machine.
That is not acceptable for distributions.

I'm still missing the whole discussion of the gfn-epfn 1:1 mapping that we
identified as a requirement. Is that supposed to be done by KVM? How?

... and how it would ever work with migration/swapping/rmap (it's clearly
future work, but it's been raised that this would be the way to make it
work; I don't quite see how it would all come together).

Given hardware support, migration and swapping can be implemented by
providing new callbacks in guest_ops. For instance, ->migrate_page would
transfer encrypted data between pages, and ->swapout would provide an
encrypted blob that can be put on disk or handed back to ->swapin to
bring it back into memory.

Again, I'm missing the complete picture. To make swapping decisions, the
vmscan code needs to track and handle dirty and referenced information. How
would we be able to track references? Does the hardware allow for temporary
unmapping of encrypted memory and faulting on it? How would
page_referenced() continue working? "We can add callbacks" is not a
satisfying answer, at least for me, especially when it comes to
potential locking problems and races.

Maybe saying "migration+swap is not supported" is clearer than "we can
add callbacks" while missing details of the bigger picture.

Again, a complete design proposal would be highly valuable, especially
to get some more review from other -MM folks. Otherwise, there is a high
chance that this will be rejected late in the upstreaming process, once -MM
people stumble over it (unfortunately, we've had something similar happen
just recently ...).

--
Thanks,

David / dhildenb