RE: [RFC PATCH 1/1] x86/mm: Mark CoCo VM pages invalid while moving between private and shared

"Michael Kelley (LINUX)" <mikelley@xxxxxxxxxxxxx> · Fri, 1 Sep 2023 14:44:57 +0000

From: Edgecombe, Rick P <rick.p.edgecombe@xxxxxxxxx> Sent: Thursday, August 31, 2023 10:30 AM
> 
> On Wed, 2023-08-30 at 16:40 -0700, Rick Edgecombe wrote:
> > This is a bit of an existing problem, but the failure cases of these
> > set_memory_en/decrypted() operations does not look to be in great
> > shape. It could fail halfway through if it needs to split the direct
> > map under memory pressure, in which case some of the callers will see
> > the error and free the unmapped pages to the direct map. (I was
> > looking at dma_direct_alloc()) Other's just leak the pages.

See further comments below.

> >
> > But the situation before the patch is not much better, since the direct
> > map change or enc_status_change_prepare/finish() could fail and leave
> > the pages in an inconsistent state, like this patch is trying to address.
> >
> > This lack of rollback on failure for CPA calls needs particular odd
> > handling in all the set_memory() callers. The way is to make a CPA call
> > to restore it to the previous permission, regardless of the error code
> > returned in the initial call that failed. The callers depend on any PTE
> > change successfully made having any needed splits already done for
> > those PTEs, so the restore can succeed at least as far as the failed
> > CPA call got.
> 
> Wait, since this does set_memory_np() as the first step for both
> set_memory_encrypted() and set_memory_decrypted(), that pattern in the
> callers wouldn't work. I wonder if it should try to rollback itself if
> set_memory_np() fails (call set_memory_p() before returning the error).
> At least that will handle failures that happen on the guest side.

Yes, I agree the error handling is very limited.  I'll try to make my
patch cleanup properly if set_memory_np() fails as step 1.  In general,
complete error cleanup on private <-> shared transitions looks to be
pretty hard, and the original implementation obviously didn't deal
with it.  For most of the steps in the sequence, a failure indicates
something is pretty seriously broken with the CoCo aspects of the
VM, and it's not clear that trying to clean up is likely to succeed or
will make things any better.  

> 
> >
> > In this COCO case apparently the enc_status_change_prepare/finish()
> > could fail too (and maybe not have the same forward progress
> > behavior?). So I'm not sure what you can do in that case.

Exactly.

> >
> > I'm also not sure how bad it is to free encryption mismatched pages.  Is
> > it the same as freeing unmapped pages? (likely oops or panic)

It's bad to free mismatched pages.  You're just setting a time bomb
for some other code to encounter later.  It will either cause an
oops/panic, or will allow the VM to unknowingly put data in a page
that is visible to the hypervisor and violate the tenets of being a
CoCo VM.  In the code I've worked with, we don't free any memory
where set_memory_encrypted() or set_memory_decrypted() has failed.
We assume there hasn't been a cleanup and hence the state and
consistency of the memory is unknown.

I think there's a decent argument for not investing too much effort
in the cleanup paths for private <-> shared transitions.  A failure
indicates that something is seriously wrong, from which full recovery
is unlikely.  Callers of set_memory_encrypted()/decrypted() should
assume that a failure leaves the memory in an inconsistent state,
and the memory should just be leaked until the VM can be shutdown.
Some strong comments along these lines could be added to
set_memory_encrypted()/decrypted().

I'm going to be out on vacation until mid-September, so it will be
a couple of weeks before I get back to doing a full patch set that
responds to these and other comments.

Michael