On Thu, 22 Jul 2021 14:25:15 +0100,
Andrew Jones <drjones@xxxxxxxxxx> wrote:
>
> On Thu, Jul 22, 2021 at 11:00:26AM +0100, Marc Zyngier wrote:
> > On Wed, 21 Jul 2021 22:42:43 +0100,
> > Andrew Jones <drjones@xxxxxxxxxx> wrote:
> > >
> > > On Thu, Jul 15, 2021 at 05:31:43PM +0100, Marc Zyngier wrote:
> > > > KVM/arm64 currently considers that any memory access outside of a
> > > > memslot is a MMIO access. This so far has served us very well, but
> > > > obviously relies on the guest trusting the host, and especially
> > > > userspace to do the right thing.
> > > >
> > > > As we keep on hacking away at pKVM, it becomes obvious that this trust
> > > > model is not really fit for a confidential computing environment, and
> > > > that the guest would require some guarantees that emulation only
> > > > occurs on portions of the address space that have clearly been
> > > > identified for this purpose.
> > >
> > > This trust model is hard for me to reason about. userspace is trusted to
> > > control the life cycle of the VM, to prepare the memslots for the VM,
> > > and [presumably] identify what MMIO ranges are valid, yet it's not
> > > trusted to handle invalid MMIO accesses. I'd like to learn more about
> > > this model and the userspace involved.
> >
> > Imagine the following scenario:
> >
> > On top of the normal memory described as memslots (which pKVM will
> > ensure that userspace cannot access),
>
> Ah, I didn't know that part.

Yeah, that's the crucial bit. By default, pKVM guests do not share any
memory with anyone, so the memslots are made inaccessible from both the
VMM and the host kernel. The guest has to explicitly change the state of
the memory it wants to share back with the host for things like IO.

>
> > a malicious userspace describes
> > to the guest another memory region in a firmware table and does not
> > back it with a memslot.
> >
> > The hypervisor cannot validate this firmware description (imagine
> > doing ACPI and DT parsing at EL2...), so the guest starts using this
> > "memory" for something, and data slowly trickles all the way to EL0.
> > Not what you wanted.
>
> Yes, I see that now, in light of the above.
>
> >
> > To ensure that this doesn't happen, we reverse the problem: userspace
> > (and ultimately the EL1 kernel) doesn't get involved on a translation
> > fault outside of a memslot *unless* the guest has explicitly asked for
> > that page to be handled as a MMIO. With that, we have a full
> > description of the IPA space contained in the S2 page tables:
> >
> > - memory described via a memslot,
> > - directly mapped device (GICv2, for example),
> > - MMIO exposed for emulation
> >
> > and anything else is an invalid access that results in an abort.
> >
> > Does this make sense to you?
>
> Now I understand better, but if we're worried about malicious userspaces,
> then how do we protect the guest from "bad" MMIO devices that have been
> described to it? The guest can request access to those using this new
> mechanism.

We don't try to do anything about a malicious IO device. Any IO should
be considered as malicious, and you don't want to give it anything in
clear-text if it is supposed to be secret.

Eventually, you'd probably want directly assigned devices that can
attest to the guest that they are what they pretend to be, but that's a
long way away. For now, we only want to enable virtio with a reduced
level of trust (bounce buffering via shared pages for DMA, and reduced
MMIO exposure).

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.