From: Catalin Marinas <catalin.marinas@xxxxxxx> Sent: Monday, May 20, 2024 9:53 AM
>
> On Wed, May 15, 2024 at 11:47:02AM +0100, Suzuki K Poulose wrote:
> > On 14/05/2024 19:00, Catalin Marinas wrote:
> > > On Fri, Apr 12, 2024 at 09:42:08AM +0100, Steven Price wrote:
> > > > static int change_page_range(pte_t *ptep, unsigned long addr, void *data)
> > > > @@ -41,6 +45,7 @@ static int change_page_range(pte_t *ptep, unsigned long addr, void *data)
> > > >  	pte = clear_pte_bit(pte, cdata->clear_mask);
> > > >  	pte = set_pte_bit(pte, cdata->set_mask);
> > > > +	/* TODO: Break before make for PROT_NS_SHARED updates */
> > > >  	__set_pte(ptep, pte);
> > > >  	return 0;
> > >
> > > Oh, this TODO is problematic, not sure we can do it safely. There are
> > > some patches on the list to trap faults from other CPUs if they happen
> > > to access the page when broken, but so far we pushed back as complex
> > > and at risk of getting the logic wrong.
> > >
> > > From an architecture perspective, you are changing the output address
> > > and D8.16.1 requires a break-before-make sequence (FEAT_BBM doesn't
> > > help). So we either come up with a way to do BBM safely (stop_machine()
> > > maybe, if it's not too expensive, or some way to guarantee no accesses
> > > to this page while it is being changed) or we get the architecture
> > > clarified on the possible side-effects here ("unpredictable" doesn't
> > > help).
> >
> > Thanks, we need to sort this out.
>
> Thanks for the clarification on RIPAS states and behaviour in one of
> your replies. Thinking about this, since the page is marked as
> RIPAS_EMPTY prior to changing the PTE, the address is going to fault
> anyway as an SEA if accessed. So actually breaking the PTE, doing the
> TLBI, and setting the new PTE would not add any new behaviour. Of
> course, this assumes that set_memory_decrypted() is never called on
> memory that is currently being accessed (can we guarantee this?).

While I worked on CoCo VM support on Hyper-V for x86 (both AMD SEV-SNP
and Intel TDX), I haven't ramped up on the ARM64 CoCo VM architecture
yet. With that caveat in mind: the assumption is that callers of
set_memory_decrypted() and set_memory_encrypted() ensure that the
target memory isn't being accessed while the transition is in progress.

But there's a big exception: load_unaligned_zeropad() can generate
accesses that the caller can't control. If load_unaligned_zeropad()
touches a page that is in transition between decrypted and encrypted,
a SEV-SNP or TDX architectural fault can occur. On x86, those fault
handlers detect this case and fix things up. The Hyper-V case requires
a different approach: it marks the PTEs "not present" before initiating
a transition between decrypted and encrypted, and marks them "present"
again after the transition completes. A reference generated by
load_unaligned_zeropad() then takes the normal page fault path and uses
the existing page-fault-based fixup for load_unaligned_zeropad(). See
commit 0f34d11234868 for the Hyper-V case.

>
> So, we need to make sure that there is no access to the linear map
> between set_memory_range_shared() and the actual pte update with
> __change_memory_common() in set_memory_decrypted(). At a quick look,
> this seems to be the case (ignoring memory scrubbers, though dummy ones
> just accessing memory are not safe anyway and unlikely to run in a
> guest).
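To make the Hyper-V approach described above concrete, here's a rough
sketch of the pattern. This is hypothetical code, not the actual
implementation: hv_style_enc_dec() and transition_fn are made-up names,
while set_memory_np() and set_memory_p() are the existing x86 helpers.

#include <linux/set_memory.h>

static int hv_style_enc_dec(unsigned long addr, int numpages,
			    int (*transition_fn)(unsigned long, int))
{
	int ret;

	/*
	 * Mark the PTEs "not present" first. Any concurrent access,
	 * including one generated by load_unaligned_zeropad(), now
	 * takes the normal page fault path instead of raising a
	 * SEV-SNP or TDX architectural fault.
	 */
	ret = set_memory_np(addr, numpages);
	if (ret)
		return ret;

	/* Flip the pages between encrypted and decrypted states. */
	ret = transition_fn(addr, numpages);
	if (ret)
		return ret;

	/* Make the PTEs "present" again once the transition is done. */
	return set_memory_p(addr, numpages);
}

The key property is that between set_memory_np() and set_memory_p() a
stray access shows up as an ordinary page fault, where the
load_unaligned_zeropad() fixup already works, rather than as a SEV-SNP
or TDX architectural fault.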
>
> (I did come across the hv_uio_probe() which, if I read correctly, it
> ends up calling set_memory_decrypted() with a vmalloc() address; let's
> pretend this code doesn't exist ;))

While the Hyper-V UIO driver is perhaps a bit of an outlier, the
Hyper-V netvsc driver also does set_memory_decrypted() on 16 Mbyte
vmalloc() allocations, and there's not really a viable way to avoid
this. The SEV-SNP and TDX code handles this case. Support for this
case will probably also be needed for CoCo guests on Hyper-V on ARM64.
The pattern is roughly as in the sketch below.
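This is hypothetical code, not the actual netvsc source: RECV_BUF_SIZE
and alloc_host_visible_buf() are made-up names, and the real work
happens in the netvsc/vmbus buffer setup path.

#include <linux/mm.h>
#include <linux/set_memory.h>
#include <linux/vmalloc.h>

#define RECV_BUF_SIZE	(16 * 1024 * 1024)

static void *alloc_host_visible_buf(void)
{
	void *buf = vzalloc(RECV_BUF_SIZE);

	if (!buf)
		return NULL;

	/*
	 * Note that the address passed to set_memory_decrypted() is a
	 * vmalloc() address, not a linear map address.
	 */
	if (set_memory_decrypted((unsigned long)buf,
				 RECV_BUF_SIZE >> PAGE_SHIFT)) {
		/*
		 * On failure, the encryption state of the pages is
		 * uncertain, so it isn't safe to hand them back to
		 * the allocator; leak them instead.
		 */
		return NULL;
	}

	return buf;
}

Michael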