From: Catalin Marinas <catalin.marinas@xxxxxxx> Sent: Monday, May 20, 2024 9:53 AM
>
> On Wed, May 15, 2024 at 11:47:02AM +0100, Suzuki K Poulose wrote:
> > On 14/05/2024 19:00, Catalin Marinas wrote:
> > > On Fri, Apr 12, 2024 at 09:42:08AM +0100, Steven Price wrote:
> > > > static int change_page_range(pte_t *ptep, unsigned long addr, void *data)
> > > > @@ -41,6 +45,7 @@ static int change_page_range(pte_t *ptep, unsigned long addr, void *data)
> > > >  	pte = clear_pte_bit(pte, cdata->clear_mask);
> > > >  	pte = set_pte_bit(pte, cdata->set_mask);
> > > > +	/* TODO: Break before make for PROT_NS_SHARED updates */
> > > >  	__set_pte(ptep, pte);
> > > >  	return 0;
> > >
> > > Oh, this TODO is problematic, not sure we can do it safely. There are
> > > some patches on the list to trap faults from other CPUs if they happen
> > > to access the page when broken, but so far we pushed back as complex
> > > and at risk of getting the logic wrong.
> > >
> > > From an architecture perspective, you are changing the output address
> > > and D8.16.1 requires a break-before-make sequence (FEAT_BBM doesn't
> > > help). So we either come up with a way to do BBM safely (stop_machine()
> > > maybe, if it's not too expensive, or some way to guarantee no accesses
> > > to this page while it is being changed) or we get the architecture
> > > clarified on the possible side-effects here ("unpredictable" doesn't
> > > help).
> >
> > Thanks, we need to sort this out.
>
> Thanks for the clarification on RIPAS states and behaviour in one of
> your replies. Thinking about this, since the page is marked as
> RIPAS_EMPTY prior to changing the PTE, the address is going to fault
> anyway as an SEA if accessed. So actually breaking the PTE, doing the
> TLBI, and setting the new PTE would not add any new behaviour. Of
> course, this assumes that set_memory_decrypted() is never called on
> memory that is currently being accessed (can we guarantee this?).

While I worked on CoCo VM support on Hyper-V for x86 (both AMD SEV-SNP
and Intel TDX), I haven't ramped up on the ARM64 CoCo VM architecture
yet. With that caveat in mind: the assumption is that callers of
set_memory_decrypted() and set_memory_encrypted() ensure that the
target memory isn't being accessed while the transition is in progress.

But there's a big exception: load_unaligned_zeropad() can generate
accesses that the caller can't control. If load_unaligned_zeropad()
touches a page that is in transition between decrypted and encrypted,
a SEV-SNP or TDX architectural fault can occur. On x86, those fault
handlers detect this case and fix things up. The Hyper-V case requires
a different approach: it marks the PTEs "not present" before initiating
a transition between decrypted and encrypted, and marks them "present"
again after the transition completes. A reference generated by
load_unaligned_zeropad() then takes the normal page fault path and uses
the existing page-fault-based fixup for load_unaligned_zeropad(). See
commit 0f34d11234868 for the Hyper-V case.

>
> So, we need to make sure that there is no access to the linear map
> between set_memory_range_shared() and the actual pte update with
> __change_memory_common() in set_memory_decrypted(). At a quick look,
> this seems to be the case (ignoring memory scrubbers, though dummy ones
> just accessing memory are not safe anyway and unlikely to run in a
> guest).
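To make the Hyper-V approach described above concrete, here's a rough
sketch of the pattern. This is hypothetical code, not the actual
implementation: hv_style_enc_dec() and transition_fn are made-up names,
while set_memory_np() and set_memory_p() are the existing x86 helpers.

#include <linux/set_memory.h>

static int hv_style_enc_dec(unsigned long addr, int numpages,
			    int (*transition_fn)(unsigned long, int))
{
	int ret;

	/*
	 * Mark the PTEs "not present" first. Any concurrent access,
	 * including one generated by load_unaligned_zeropad(), now
	 * takes the normal page fault path instead of raising a
	 * SEV-SNP or TDX architectural fault.
	 */
	ret = set_memory_np(addr, numpages);
	if (ret)
		return ret;

	/* Flip the pages between encrypted and decrypted states. */
	ret = transition_fn(addr, numpages);
	if (ret)
		return ret;

	/* Make the PTEs "present" again once the transition is done. */
	return set_memory_p(addr, numpages);
}

The key property is that between set_memory_np() and set_memory_p() a
stray access shows up as an ordinary page fault, where the
load_unaligned_zeropad() fixup already works, rather than as a SEV-SNP
or TDX architectural fault.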
>
> (I did come across the hv_uio_probe() which, if I read correctly, it
> ends up calling set_memory_decrypted() with a vmalloc() address; let's
> pretend this code doesn't exist ;))

While the Hyper-V UIO driver is perhaps a bit of an outlier, the
Hyper-V netvsc driver also does set_memory_decrypted() on 16 Mbyte
vmalloc() allocations, and there's not really a viable way to avoid
this. The SEV-SNP and TDX code handles this case. Support for this
case will probably also be needed for CoCo guests on Hyper-V on ARM64.
The pattern is roughly as in the sketch below.
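This is hypothetical code, not the actual netvsc source: RECV_BUF_SIZE
and alloc_host_visible_buf() are made-up names, and the real work
happens in the netvsc/vmbus buffer setup path.

#include <linux/mm.h>
#include <linux/set_memory.h>
#include <linux/vmalloc.h>

#define RECV_BUF_SIZE	(16 * 1024 * 1024)

static void *alloc_host_visible_buf(void)
{
	void *buf = vzalloc(RECV_BUF_SIZE);

	if (!buf)
		return NULL;

	/*
	 * Note that the address passed to set_memory_decrypted() is a
	 * vmalloc() address, not a linear map address.
	 */
	if (set_memory_decrypted((unsigned long)buf,
				 RECV_BUF_SIZE >> PAGE_SHIFT)) {
		/*
		 * On failure, the encryption state of the pages is
		 * uncertain, so it isn't safe to hand them back to
		 * the allocator; leak them instead.
		 */
		return NULL;
	}

	return buf;
}

Michael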