On Thu, May 19, 2022 at 8:05 AM Will Deacon <will@xxxxxxxxxx> wrote:
>
> Add some initial documentation for the Protected KVM (pKVM) feature on
> arm64, describing the user ABI for creating protected VMs as well as
> their limitations.
>
> Signed-off-by: Will Deacon <will@xxxxxxxxxx>
> ---
>  .../admin-guide/kernel-parameters.txt | 4 +-
>  Documentation/virt/kvm/arm/index.rst | 1 +
>  Documentation/virt/kvm/arm/pkvm.rst | 96 +++++++++++++++++++
>  3 files changed, 100 insertions(+), 1 deletion(-)
>  create mode 100644 Documentation/virt/kvm/arm/pkvm.rst
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 63a764ec7fec..b8841a969f59 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -2437,7 +2437,9 @@
>             protected guests.
>
>    protected: nVHE-based mode with support for guests whose
> -           state is kept private from the host.
> +           state is kept private from the host. See
> +           Documentation/virt/kvm/arm/pkvm.rst for more
> +           information about this mode of operation.
>
>    Defaults to VHE/nVHE based on hardware support. Setting
>    mode to "protected" will disable kexec and hibernation
> diff --git a/Documentation/virt/kvm/arm/index.rst b/Documentation/virt/kvm/arm/index.rst
> index b4067da3fcb6..49c388df662a 100644
> --- a/Documentation/virt/kvm/arm/index.rst
> +++ b/Documentation/virt/kvm/arm/index.rst
> @@ -9,6 +9,7 @@ ARM
>
>     hyp-abi
>     hypercalls
> +   pkvm
>     psci
>     pvtime
>     ptp_kvm
> diff --git a/Documentation/virt/kvm/arm/pkvm.rst b/Documentation/virt/kvm/arm/pkvm.rst
> new file mode 100644
> index 000000000000..64f099a5ac2e
> --- /dev/null
> +++ b/Documentation/virt/kvm/arm/pkvm.rst
> @@ -0,0 +1,96 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +Protected virtual machines (pKVM)
> +=================================
> +
> +Introduction
> +------------
> +
> +Protected KVM (pKVM) is a KVM/arm64 extension which uses the two-stage
> +translation capability of the Armv8 MMU to isolate guest memory from the host
> +system. This allows for the creation of a confidential computing environment
> +without relying on whizz-bang features in hardware, but still allowing room for
> +complementary technologies such as memory encryption and hardware-backed
> +attestation.
> +
> +The major implementation change brought about by pKVM is that the hypervisor
> +code running at EL2 is now largely independent of (and isolated from) the rest
> +of the host kernel running at EL1 and therefore additional hypercalls are
> +introduced to manage manipulation of guest stage-2 page tables, creation of VM
> +data structures and reclamation of memory on teardown. An immediate consequence
> +of this change is that the host itself runs with an identity mapping enabled
> +at stage-2, providing the hypervisor code with a mechanism to restrict host
> +access to an arbitrary physical page.
> +
> +Enabling pKVM
> +-------------
> +
> +The pKVM hypervisor is enabled by booting the host kernel at EL2 with
> +"``kvm-arm.mode=protected``" on the command-line. Once enabled, VMs can be spawned
> +in either protected or non-protected state, although the hypervisor is still
> +responsible for managing most of the VM metadata in either case.
> +
> +Limitations
> +-----------
> +
> +Enabling pKVM places some significant limitations on KVM guests, regardless of
> +whether they are spawned in protected state. It is therefore recommended only
> +to enable pKVM if protected VMs are required, with non-protected state acting
> +primarily as a debug and development aid.
> +
> +If you're still keen, then here is an incomplete list of caveats that apply
> +to all VMs running under pKVM:
> +
> +- Guest memory cannot be file-backed (with the exception of shmem/memfd) and is
> +  pinned as it is mapped into the guest. This prevents the host from
> +  swapping-out, migrating, merging or generally doing anything useful with the
> +  guest pages. It also requires that the VMM has either ``CAP_IPC_LOCK`` or
> +  sufficient ``RLIMIT_MEMLOCK`` to account for this pinned memory.

I think it would be useful to also add a note to
Documentation/virt/kvm/api.rst saying that ioctl(KVM_RUN) can return
ENOMEM if the VMM does not have CAP_IPC_LOCK or sufficient
RLIMIT_MEMLOCK, since that's where people are going to look when they
see that return value.

Peter
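
To illustrate the failure mode discussed above, here is a minimal sketch of a preflight check a VMM could run before starting a guest, so that a too-small RLIMIT_MEMLOCK is reported up front instead of surfacing later as ENOMEM. The helper names (`has_cap_ipc_lock`, `can_pin`, `preflight`) are hypothetical and not part of the patch. The comparison is also only an approximation: the patch does not spell out how the kernel accounts pinned guest pages, and other locked memory in the process counts against the same limit.

```python
import resource

# Bit number of CAP_IPC_LOCK in the capability masks (<linux/capability.h>).
CAP_IPC_LOCK_BIT = 14


def has_cap_ipc_lock(status_text: str) -> bool:
    """Return True if the CapEff mask in /proc/<pid>/status text has CAP_IPC_LOCK."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            mask = int(line.split()[1], 16)
            return bool(mask & (1 << CAP_IPC_LOCK_BIT))
    return False


def can_pin(guest_bytes: int, soft_limit: int, cap_ipc_lock: bool) -> bool:
    """Approximate check: could this process pin guest_bytes of memory?

    Assumption: a process with CAP_IPC_LOCK, or with an unlimited
    RLIMIT_MEMLOCK, is not constrained; otherwise the guest size must fit
    under the soft limit. Memory that is already locked is ignored here.
    """
    if cap_ipc_lock or soft_limit == resource.RLIM_INFINITY:
        return True
    return guest_bytes <= soft_limit


def preflight(guest_bytes: int) -> bool:
    """Run the check against the current process (Linux only)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    with open("/proc/self/status") as f:
        cap = has_cap_ipc_lock(f.read())
    return can_pin(guest_bytes, soft, cap)
```

A VMM could call `preflight(guest_ram_size)` before creating the VM and print a hint about `ulimit -l` or CAP_IPC_LOCK when it returns False.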