Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Brijesh,,

One high level discussion I'd like to have on these SNP KVM patches.

In these patches (V5) if a host userspace process writes a guest
private page a SIGBUS is issued to that process. If the kernel writes
a guest private page then the kernel panics due to the unhandled RMP
fault page fault. This is an issue because not all writes into guest
memory may come from a bug in the host. For instance a malicious or
even buggy guest could easily point the host to writing a private page
during the emulation of many virtual devices (virtio, NVMe, etc). For
example if a well behaved guests behavior is to: start up a driver,
select some pages to share with the guest, ask the host to convert
them to shared, then use those pages for virtual device DMA, if a
buggy guest forget the step to request the pages be converted to
shared its easy to see how the host could rightfully write to private
memory. I think we can better guarantee host reliability when running
SNP guests without changing SNP’s security properties.

Here is an alternative to the current approach: On RMP violation (host
or userspace) the page fault handler converts the page from private to
shared to allow the write to continue. This pulls from s390’s error
handling which does exactly this. See ‘arch_make_page_accessible()’.
Additionally it adds less complexity to the SNP kernel patches, and
requires no new ABI.

In the current (V5) KVM implementation if a userspace process
generates an RMP violation (writes to guest private memory) the
process receives a SIGBUS. At first glance, it would appear that
user-space shouldn’t write to private memory. However, guaranteeing
this in a generic fashion requires locking the RMP entries (via locks
external to the RMP). Otherwise, a user-space process emulating a
guest device IO may be vulnerable to having the guest memory
(maliciously or by guest bug) converted to private while user-space
emulation is happening. This results in a well behaved userspace
process receiving a SIGBUS.

This proposal allows buggy and malicious guests to run under SNP
without jeopardizing the reliability / safety of host processes. This
is very important to a cloud service provider (CSP) since it’s common
to have host wide daemons that write/read all guests, i.e. a single
process could manage the networking for all VMs on the host. Crashing
that singleton process kills networking for all VMs on the system.

This proposal also allows for minimal changes to the kexec flow and
kdump. The new kexec kernel can simply update private pages to shared
as it encounters them during their boot. This avoids needing to
propagate the RMP state from kernel to kernel. Of course this doesn’t
preserve any running VMs but is still useful for kdump crash dumps or
quicker rekerneling for development with kexec.

This proposal does cause guest memory corruption for some bugs but one
of SEV-SNP’s goals extended from SEV-ES’s goals is for guest’s to be
able to detect when its memory has been corrupted / replayed by the
host. So SNP already has features for allowing guests to detect this
kind of memory corruption. Additionally this is very similar to a page
of memory generating a machine check because of 2-bit memory
corruption. In other words SNP guests must be enlightened and ready
for these kinds of errors.

For an SNP guest running under this proposal the flow would look like this:
* Host gets a #PF because its trying to write to a private page.
* Host #PF handler updates the page to shared.
* Write continues normally.
* Guest accesses memory (r/w).
* Guest gets a #VC error because the page is not PVALIDATED
* Guest is now in control. Guest can terminate because its memory has
been corrupted. Guest could try and continue to log the error to its
owner.

A similar approach was introduced in the SNP patches V1 and V2 for
kernel page fault handling. The pushback around this convert to shared
approach was largely focused around the idea that the kernel has all
the information about which pages are shared vs private so it should
be able to check shared status before write to pages. After V2 the
patches were updated to not have a kernel page fault handler for RMP
violations (other than dumping state during a panic). The current
patches protect the host with new post_{map,unmap}_gfn() function that
checks if a page is shared before mapping it, then locks the page
shared until unmapped. Given the discussions on ‘[Part2,v5,39/45] KVM:
SVM: Introduce ops for the post gfn map and unmap’ building a solution
to do this is non trivial and adds new overheads to KVM. Additionally
the current solution is local to the kernel. So a new ABI just now be
created to allow the userspace VMM to access the kernel-side locks for
this to work generically for the whole host. This is more complicated
than this proposal and adding more lock holders seems like it could
reduce performance further.

There are a couple corner cases with this approach. Under SNP guests
can request their memory be changed into a VMSA. This VMSA page cannot
be changed to shared while the vCPU associated with it is running. So
KVM + the #PF handler will need something to kick vCPUs from running.
Joerg believes that a possible fix for this could be a new MMU
notifier in the kernel, then on the #PF we can go through the rmp and
execute this vCPU kick callback.

Another corner case is the RMPUPDATE instruction is not guaranteed to
succeed on first iteration. As noted above if the page is a VMSA it
cannot be updated while the vCPU is running. Another issue is if the
guest is running a RMPADJUST on a page it cannot be RMPUPDATED at that
time. There is a lock for each RMP Entry so there is a race for these
instructions. The vCPU kicking can solve this issue to be kicking all
guest vCPUs which removes the chance for the race.

Since this proposal probably results in SNP guests terminating due to
a page unexpectedly needing PVALIDATE. The approach could be
simplified to just the KVM killing the guest. I think it's nicer to
users to instead of unilaterally killing the guest allowing the
unvalidated #VC exception to allow users to collect some additional
debug information and any additional clean up work they would like to
perform.

Thanks
Peter

On Fri, Aug 20, 2021 at 9:59 AM Brijesh Singh <brijesh.singh@xxxxxxx> wrote:
>
> This part of the Secure Encrypted Paging (SEV-SNP) series focuses on the
> changes required in a host OS for SEV-SNP support. The series builds upon
> SEV-SNP Part-1.
>
> This series provides the basic building blocks to support booting the SEV-SNP
> VMs, it does not cover all the security enhancement introduced by the SEV-SNP
> such as interrupt protection.
>
> The CCP driver is enhanced to provide new APIs that use the SEV-SNP
> specific commands defined in the SEV-SNP firmware specification. The KVM
> driver uses those APIs to create and managed the SEV-SNP guests.
>
> The GHCB specification version 2 introduces new set of NAE's that is
> used by the SEV-SNP guest to communicate with the hypervisor. The series
> provides support to handle the following new NAE events:
> - Register GHCB GPA
> - Page State Change Request
> - Hypevisor feature
> - Guest message request
>
> The RMP check is enforced as soon as SEV-SNP is enabled. Not every memory
> access requires an RMP check. In particular, the read accesses from the
> hypervisor do not require RMP checks because the data confidentiality is
> already protected via memory encryption. When hardware encounters an RMP
> checks failure, it raises a page-fault exception. If RMP check failure
> is due to the page-size mismatch, then split the large page to resolve
> the fault.
>
> The series does not provide support for the interrupt security and migration
> and those feature will be added after the base support.
>
> The series is based on the commit:
>  SNP part1 commit and
>  fa7a549d321a (kvm/next, next) KVM: x86: accept userspace interrupt only if no event is injected
>
> TODO:
>   * Add support for command to ratelimit the guest message request.
>
> Changes since v4:
>  * Move the RMP entry definition to x86 specific header file.
>  * Move the dump RMP entry function to SEV specific file.
>  * Use BIT_ULL while defining the #PF bit fields.
>  * Add helper function to check the IOMMU support for SEV-SNP feature.
>  * Add helper functions for the page state transition.
>  * Map and unmap the pages from the direct map after page is added or
>    removed in RMP table.
>  * Enforce the minimum SEV-SNP firmware version.
>  * Extend the LAUNCH_UPDATE to accept the base_gfn and remove the
>    logic to calculate the gfn from the hva.
>  * Add a check in LAUNCH_UPDATE to ensure that all the pages are
>    shared before calling the PSP.
>  * Mark the memory failure when failing to remove the page from the
>    RMP table or clearing the immutable bit.
>  * Exclude the encrypted hva range from the KSM.
>  * Remove the gfn tracking during the kvm_gfn_map() and use SRCU to
>    syncronize the PSC and gfn mapping.
>  * Allow PSC on the registered hva range only.
>  * Add support for the Preferred GPA VMGEXIT.
>  * Simplify the PSC handling routines.
>  * Use the static_call() for the newly added kvm_x86_ops.
>  * Remove the long-lived GHCB map.
>  * Move the snp enable module parameter to the end of the file.
>  * Remove the kvm_x86_op for the RMP fault handling. Call the
>    fault handler directly from the #NPF interception.
>
> Changes since v3:
>  * Add support for extended guest message request.
>  * Add ioctl to query the SNP Platform status.
>  * Add ioctl to get and set the SNP config.
>  * Add check to verify that memory reserved for the RMP covers the full system RAM.
>  * Start the SNP specific commands from 256 instead of 255.
>  * Multiple cleanup and fixes based on the review feedback.
>
> Changes since v2:
>  * Add AP creation support.
>  * Drop the patch to handle the RMP fault for the kernel address.
>  * Add functions to track the write access from the hypervisor.
>  * Do not enable the SNP feature when IOMMU is disabled or is in passthrough mode.
>  * Dump the RMP entry on RMP violation for the debug.
>  * Shorten the GHCB macro names.
>  * Start the SNP_INIT command id from 255 to give some gap for the legacy SEV.
>  * Sync the header with the latest 0.9 SNP spec.
>
> Changes since v1:
>  * Add AP reset MSR protocol VMGEXIT NAE.
>  * Add Hypervisor features VMGEXIT NAE.
>  * Move the RMP table initialization and RMPUPDATE/PSMASH helper in
>    arch/x86/kernel/sev.c.
>  * Add support to map/unmap SEV legacy command buffer to firmware state when
>    SNP is active.
>  * Enhance PSP driver to provide helper to allocate/free memory used for the
>    firmware context page.
>  * Add support to handle RMP fault for the kernel address.
>  * Add support to handle GUEST_REQUEST NAE event for attestation.
>  * Rename RMP table lookup helper.
>  * Drop typedef from rmpentry struct definition.
>  * Drop SNP static key and use cpu_feature_enabled() to check whether SEV-SNP
>    is active.
>  * Multiple cleanup/fixes to address Boris review feedback.
>
> Brijesh Singh (40):
>   x86/cpufeatures: Add SEV-SNP CPU feature
>   iommu/amd: Introduce function to check SEV-SNP support
>   x86/sev: Add the host SEV-SNP initialization support
>   x86/sev: Add RMP entry lookup helpers
>   x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction
>   x86/sev: Invalid pages from direct map when adding it to RMP table
>   x86/traps: Define RMP violation #PF error code
>   x86/fault: Add support to handle the RMP fault for user address
>   x86/fault: Add support to dump RMP entry on fault
>   crypto: ccp: shutdown SEV firmware on kexec
>   crypto:ccp: Define the SEV-SNP commands
>   crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP
>   crypto:ccp: Provide APIs to issue SEV-SNP commands
>   crypto: ccp: Handle the legacy TMR allocation when SNP is enabled
>   crypto: ccp: Handle the legacy SEV command when SNP is enabled
>   crypto: ccp: Add the SNP_PLATFORM_STATUS command
>   crypto: ccp: Add the SNP_{SET,GET}_EXT_CONFIG command
>   crypto: ccp: Provide APIs to query extended attestation report
>   KVM: SVM: Provide the Hypervisor Feature support VMGEXIT
>   KVM: SVM: Make AVIC backing, VMSA and VMCB memory allocation SNP safe
>   KVM: SVM: Add initial SEV-SNP support
>   KVM: SVM: Add KVM_SNP_INIT command
>   KVM: SVM: Add KVM_SEV_SNP_LAUNCH_START command
>   KVM: SVM: Add KVM_SEV_SNP_LAUNCH_UPDATE command
>   KVM: SVM: Mark the private vma unmerable for SEV-SNP guests
>   KVM: SVM: Add KVM_SEV_SNP_LAUNCH_FINISH command
>   KVM: X86: Keep the NPT and RMP page level in sync
>   KVM: x86: Introduce kvm_mmu_get_tdp_walk() for SEV-SNP use
>   KVM: x86: Define RMP page fault error bits for #NPF
>   KVM: x86: Update page-fault trace to log full 64-bit error code
>   KVM: SVM: Do not use long-lived GHCB map while setting scratch area
>   KVM: SVM: Remove the long-lived GHCB host map
>   KVM: SVM: Add support to handle GHCB GPA register VMGEXIT
>   KVM: SVM: Add support to handle MSR based Page State Change VMGEXIT
>   KVM: SVM: Add support to handle Page State Change VMGEXIT
>   KVM: SVM: Introduce ops for the post gfn map and unmap
>   KVM: x86: Export the kvm_zap_gfn_range() for the SNP use
>   KVM: SVM: Add support to handle the RMP nested page fault
>   KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event
>   KVM: SVM: Add module parameter to enable the SEV-SNP
>
> Sean Christopherson (2):
>   KVM: x86/mmu: Move 'pfn' variable to caller of direct_page_fault()
>   KVM: x86/mmu: Introduce kvm_mmu_map_tdp_page() for use by TDX and SNP
>
> Tom Lendacky (3):
>   KVM: SVM: Add support to handle AP reset MSR protocol
>   KVM: SVM: Use a VMSA physical address variable for populating VMCB
>   KVM: SVM: Support SEV-SNP AP Creation NAE event
>
>  Documentation/virt/coco/sevguest.rst          |   55 +
>  .../virt/kvm/amd-memory-encryption.rst        |  102 +
>  arch/x86/include/asm/cpufeatures.h            |    1 +
>  arch/x86/include/asm/disabled-features.h      |    8 +-
>  arch/x86/include/asm/kvm-x86-ops.h            |    5 +
>  arch/x86/include/asm/kvm_host.h               |   20 +
>  arch/x86/include/asm/msr-index.h              |    6 +
>  arch/x86/include/asm/sev-common.h             |   28 +
>  arch/x86/include/asm/sev.h                    |   45 +
>  arch/x86/include/asm/svm.h                    |    7 +
>  arch/x86/include/asm/trap_pf.h                |   18 +-
>  arch/x86/kernel/cpu/amd.c                     |    3 +-
>  arch/x86/kernel/sev.c                         |  361 ++++
>  arch/x86/kvm/lapic.c                          |    5 +-
>  arch/x86/kvm/mmu.h                            |    7 +-
>  arch/x86/kvm/mmu/mmu.c                        |   84 +-
>  arch/x86/kvm/svm/sev.c                        | 1676 ++++++++++++++++-
>  arch/x86/kvm/svm/svm.c                        |   62 +-
>  arch/x86/kvm/svm/svm.h                        |   74 +-
>  arch/x86/kvm/trace.h                          |   40 +-
>  arch/x86/kvm/x86.c                            |   92 +-
>  arch/x86/mm/fault.c                           |   84 +-
>  drivers/crypto/ccp/sev-dev.c                  |  924 ++++++++-
>  drivers/crypto/ccp/sev-dev.h                  |   17 +
>  drivers/crypto/ccp/sp-pci.c                   |   12 +
>  drivers/iommu/amd/init.c                      |   30 +
>  include/linux/iommu.h                         |    9 +
>  include/linux/mm.h                            |    6 +-
>  include/linux/psp-sev.h                       |  346 ++++
>  include/linux/sev.h                           |   32 +
>  include/uapi/linux/kvm.h                      |   56 +
>  include/uapi/linux/psp-sev.h                  |   60 +
>  mm/memory.c                                   |   13 +
>  tools/arch/x86/include/asm/cpufeatures.h      |    1 +
>  34 files changed, 4088 insertions(+), 201 deletions(-)
>  create mode 100644 include/linux/sev.h
>
> --
> 2.17.1
>




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux