On 29/8/24 06:43, Dan Williams wrote:
Alexey Kardashevskiy wrote:
Hi everyone,
Here are some patches to enable SEV-TIO (aka TDISP, aka secure VFIO)
on AMD Turin.
The basic idea is to allow DMA to/from encrypted memory of SNP VMs and
secure MMIO in SNP VMs (i.e. with Cbit set) as well.
These include both guest and host support. QEMU also requires
some patches, links below.
The patches are organized as:
01..06 - preparing the host OS;
07 - new TSM module;
08 - add PSP SEV TIO ABI (IDE should start working at this point);
09..14 - add KVM support (TDI binding, MMIO faulting, etc);
15..19 - guest changes (the rest of SEV TIO ABI, DMA, secure MMIO);
20, 21 - some helpers for the guest OS to use encrypted MMIO.
This is based on a merge of
ee3248f9f8d6 Lukas Wunner spdm: Allow control of next requester nonce through sysfs
85ef1ac03941 (AMDESE/snp-host-latest) 4 days ago Michael Roth [TEMP] KVM: guest_memfd: Update gmem_prepare hook to handle partially-allocated folios
Please comment. Thanks.
This cover letter is something I can read after having been in and
around this space for a while, but I wonder how much of it makes sense
to casual reviewers?
Thanks,
SEV TIO tree prototype
======================
[..]
Code
----
Written with AMD SEV SNP in mind, TSM is the PSP and therefore not much
of IDE/TDISP is left for the host or guest OS.
Add a common module to expose various data objects in the same way in
host and guest OS.
Provide a knob on the host to enable IDE encryption.
Add another version of Guest Request for secure guest<->PSP communication.
Enable secure DMA by:
- configuring vTOM in a secure DTE via the PSP to cover the entire guest RAM;
- mapping all private memory pages in the IOMMU just as if they were shared
(requires hacking iommufd);
What kind of hack are we talking about here? An upstream suitable
change, or something that needs quite a bit more work to be done
properly?
Right now it hacks IOMMUFD to go to KVM for the private_gfn->host_pfn
translation. As I am being told in this thread, VFIO DMA map/unmap needs
to be taught to accept {memfd, offset}.
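For illustration only, a {memfd, offset}-based map request might carry roughly the fields below. This is a minimal sketch under stated assumptions: struct memfd_dma_map, MAP_PAGE_SIZE and memfd_dma_map_valid() are hypothetical names, not an existing VFIO or IOMMUFD uAPI, and 4K pages are assumed. It only shows the kind of checks such a request would need before the range is resolved through the memfd rather than through a user virtual address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAP_PAGE_SIZE 4096ULL /* assumption: 4K pages */

/* Hypothetical map request keyed by {memfd, offset} instead of a user VA */
struct memfd_dma_map {
	int memfd;       /* guest_memfd file descriptor */
	uint64_t offset; /* byte offset of the range within the memfd */
	uint64_t iova;   /* I/O virtual address the device will use */
	uint64_t length; /* size of the range in bytes */
};

static bool page_aligned(uint64_t v)
{
	return (v & (MAP_PAGE_SIZE - 1)) == 0;
}

/* Basic checks before the range would be resolved to host PFNs */
bool memfd_dma_map_valid(const struct memfd_dma_map *m)
{
	assert(m);
	if (m->memfd < 0 || m->length == 0)
		return false;
	if (!page_aligned(m->offset) || !page_aligned(m->iova) ||
	    !page_aligned(m->length))
		return false;
	/* Reject ranges that wrap around the 64-bit address space */
	if (m->offset + m->length < m->offset || m->iova + m->length < m->iova)
		return false;
	return true;
}
```

The point of keying by {memfd, offset} is that private guest pages need not be mapped into the VMM's address space at all.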
I jumped ahead to read Jason's reaction, but please do at least provide a
map of the controversy in the cover letter, something like "see patch 12
for details".
Yeah, noticed that, thanks, appreciated!
- skipping various enforcements of non-SME or
SWIOTLB in the guest;
Is this based on some concept of private vs shared mode devices?
No mixed shared+private DMA is supported within the
same IOMMU.
What does this mean? A device may not have mixed mappings (makes sense),
Currently devices do not have an idea about private host memory (but it
is being worked on afaik).
or an IOMMU cannot host devices that do not all agree on whether DMA is
private or shared?
The hardware allows that via a hardware-assisted vIOMMU and I/O page
tables in the guest, with the C-bit taken into account by the IOMMU, but
the software support is missing right now. So for this initial drop, vTOM
is used for DMA - this thing says "everything below <addr> is private,
above <addr> - shared" so nothing needs to bother with the C-bit, and in
my exercise I set the <addr> to the allowed maximum.
So each IOMMUFD instance in the VM is either "all private mappings" or
"all shared". Could be half/half by moving that <addr> :)
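A toy model of the vTOM split described above: a single boundary decides whether a DMA address is treated as private or shared, which is also why one IOMMUFD instance ends up being all private or all shared. struct vtom_cfg, dma_addr_is_private() and dma_range_is_uniform() are hypothetical helpers for illustration, not kernel or PSP interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-device config: one vTOM boundary for all DMA */
struct vtom_cfg {
	uint64_t vtom; /* guest physical addresses below this are private */
};

/* Below vTOM: private (encrypted); at or above vTOM: shared */
bool dma_addr_is_private(const struct vtom_cfg *cfg, uint64_t gpa)
{
	return gpa < cfg->vtom;
}

/*
 * True if [gpa, gpa + len) sits entirely on one side of vTOM, i.e. the
 * buffer is fully private or fully shared and never mixed.
 */
bool dma_range_is_uniform(const struct vtom_cfg *cfg, uint64_t gpa,
			  uint64_t len)
{
	assert(len > 0);
	uint64_t last = gpa + len - 1;

	assert(last >= gpa); /* no wrap-around */
	return dma_addr_is_private(cfg, gpa) == dma_addr_is_private(cfg, last);
}
```

Setting vtom to the highest allowed address makes every DMA private, as in the experiment described; moving it lower is the "half/half" option.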
Enable secure MMIO by:
- configuring RMP entries via the PSP;
- adding necessary helpers for mapping MMIO with
the Cbit set;
- hacking the KVM #PF handler to allow private
MMIO faults.
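As a rough sketch of what "mapping with the Cbit set" means at the page table level: on AMD SEV-capable CPUs, CPUID Fn8000_001F EBX[5:0] reports the C-bit position, and a mapping is private when that bit is set in the entry. The helpers below are hypothetical and work on raw 64-bit values; in the kernel this is done with existing helpers such as pgprot_encrypted() rather than by hand.

```c
#include <assert.h>
#include <stdint.h>

/* CPUID Fn8000_001F EBX[5:0]: position of the C-bit in page table entries */
unsigned int sev_cbit_position(uint32_t cpuid_8000001f_ebx)
{
	return cpuid_8000001f_ebx & 0x3f;
}

/* Mark an entry as encrypted (private): set the C-bit */
uint64_t pte_set_cbit(uint64_t pte, unsigned int cbit)
{
	assert(cbit < 64);
	return pte | (1ULL << cbit);
}

/* Mark an entry as shared: clear the C-bit */
uint64_t pte_clear_cbit(uint64_t pte, unsigned int cbit)
{
	assert(cbit < 64);
	return pte & ~(1ULL << cbit);
}
```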
Based on the latest upstream KVM (at the
moment it is kvm-coco-queue).
Here is where I lament that kvm-coco-queue is not run like akpm/mm where
it is possible to try out "yesterday's mm". Perhaps this is an area to
collaborate on kvm-coco-queue snapshots to help with testing.
Yeah, this is more an idea of what it is based on; I normally push a
tested branch somewhere on GitHub, just to eliminate uncertainty.
Workflow
--------
1. Boot host OS.
2. "Connect" the physical device.
3. Bind a VF to VFIO-PCI.
4. Run QEMU _without_ the device yet.
5. Hotplug the VF to the VM.
6. (if not already) Load the device driver.
7. Right after the BusMaster is enabled,
tsm.ko performs secure DMA and MMIO setup.
8. Run tests, for example:
sudo ./pcimem/pcimem /sys/bus/pci/devices/0000\:01\:00.0/resource4_enc 0 w*4 0xabcd
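pcimem accesses a BAR by mmap()ing the sysfs resource file and reading or writing through the mapping. Below is a minimal sketch of a single 32-bit write done the same way. mmio_write32() is a hypothetical helper (not part of pcimem), and the resourceN_enc file name assumes the sysfs interface added by this series. Pointed at resource4_enc it would presumably go through the encrypted mapping; pointed at a regular file it simply writes into the file, which is handy for testing.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: write one 32-bit value at byte 'offset' of a
 * mappable file, e.g. /sys/bus/pci/devices/<BDF>/resourceN.
 * Returns 0 on success, -1 on error.
 */
int mmio_write32(const char *path, off_t offset, uint32_t value)
{
	assert(path);
	long page = sysconf(_SC_PAGESIZE);

	/* Registers are accessed as naturally aligned 32-bit words */
	if (page <= 0 || offset < 0 || offset % 4 != 0)
		return -1;

	int fd = open(path, O_RDWR | O_SYNC);
	if (fd < 0)
		return -1;

	/* mmap() needs a page-aligned offset: map the page holding 'offset' */
	off_t base = offset & ~((off_t)page - 1);
	size_t delta = (size_t)(offset - base);
	size_t len = delta + sizeof(uint32_t);
	void *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, base);

	close(fd); /* the mapping stays valid after the fd is closed */
	if (map == MAP_FAILED)
		return -1;

	/* volatile: emit exactly one 32-bit store, do not optimize it away */
	volatile uint32_t *reg = (volatile uint32_t *)((char *)map + delta);
	*reg = value;

	munmap(map, len);
	return 0;
}
```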
Assumptions
-----------
This requires hotplugging into the VM vs passing the device via the
command line, as VFIO maps all guest memory as a device init step, which
is too soon: SNP LAUNCH UPDATE happens later and will fail if VFIO maps
private memory before that.
Would the device not just launch in "shared" mode until it is later
converted to private? I am missing the detail of why passing the device
on the command line requires that private memory be mapped early.
A sequencing problem.
When QEMU "realizes" a VFIO device, it creates an iommufd instance which
creates a domain and writes to a DTE (an IOMMU descriptor for a PCI BDFn).
And the DTE is not updated after that. For secure stuff, the DTE needs to
be slightly different. So right then I tell IOMMUFD that it will handle
private memory.
Then, the same VFIO "realize" handler maps the guest memory in iommufd.
I use the same flag (well, pointer to kvm) in the iommufd pinning code,
so private memory is pinned and mapped (and the related page state change
happens as the guest memory is made guest-owned in RMP).
QEMU goes to machine_reset() and calls "SNP LAUNCH UPDATE" (the actual
place changed recently, huh) and the latter will measure the guest and
try making all guest memory private but it already happened => error.
I think I have to decouple the pinning and the IOMMU/DTE setting.
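The ordering problem can be reduced to a toy model. If the VFIO realize path makes guest pages private before SNP LAUNCH UPDATE runs, the launch step finds them already private and fails. Hotplugging after launch avoids the conflict. Everything below (struct guest, vfio_realize_private(), snp_launch_update()) is a hypothetical simplification, not QEMU, KVM or PSP code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: each guest page is shared or private (guest-owned in RMP) */
enum page_state { PAGE_SHARED, PAGE_PRIVATE };

#define GUEST_NPAGES 8

struct guest {
	enum page_state pages[GUEST_NPAGES];
	bool launched;
};

/* Models VFIO "realize" with x-tio: pins guest memory, makes it private */
void vfio_realize_private(struct guest *g)
{
	assert(g);
	for (size_t i = 0; i < GUEST_NPAGES; i++)
		g->pages[i] = PAGE_PRIVATE;
}

/*
 * Models SNP LAUNCH UPDATE (measurement omitted): converts guest memory
 * to private, failing if some page has already been made private.
 */
int snp_launch_update(struct guest *g)
{
	assert(g);
	for (size_t i = 0; i < GUEST_NPAGES; i++)
		if (g->pages[i] == PAGE_PRIVATE)
			return -1;
	for (size_t i = 0; i < GUEST_NPAGES; i++)
		g->pages[i] = PAGE_PRIVATE;
	g->launched = true;
	return 0;
}
```

The command-line case is "realize, then launch" and fails; the hotplug case is "launch, then realize" and succeeds. Decoupling pinning from the IOMMU/DTE setup amounts to keeping the realize step from touching page state.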
That said, the implication that private device assignment requires
hotplug events is a useful property. This matches nicely with initial
thoughts that device conversion events are violent and might as well be
unplug/replug events to match all the assumptions around what needs to
be updated.
For the initial drop, I tell QEMU via "-device vfio-pci,x-tio=true" that
it is going to be private so there should be no massive conversion.
This requires the BME hack as MMIO and
Not sure what the "BME hack" is; I guess this is foreshadowing for later
in this story.
BusMaster enable bits cannot be 0 after MMIO
validation is done
It would be useful to call out what is a TDISP requirement, vs
device-specific DSM vs host-specific TSM requirement. In this case I
assume you are referring to PCI 6.2 11.2.6 where it notes that TDIs must
Oh there is 6.2 already.
enter the TDISP ERROR state if BME is cleared after the device is
locked?
...but this begs the question of whether it needs to be avoided outright
Well, besides a couple of avoidable places (like testing INTx support,
which we know is not going to work on VFs anyway), a standard driver
enables MSE first (and the value written to the command register does not
have BME set) and only then BME. TBH I do not think writing BME=0 when
BME=0 already is "clearing", but my test device disagrees.
or handled as an error recovery case depending on policy.
Avoiding seems more straightforward unless we actually want enlightened
device drivers which want to examine the interface report before
enabling the device. Not sure.
the guest OS booting process when this
happens.
SVSM could help address these (not
implemented at the moment).
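Here is a simplified model of a Command register write once a TDI is locked, to make the BME discussion concrete. The bit values match PCI_COMMAND_MEMORY and PCI_COMMAND_MASTER in include/uapi/linux/pci_regs.h, and the states follow TDISP (CONFIG_UNLOCKED, CONFIG_LOCKED, RUN, ERROR). Everything else is an assumption made for illustration: struct tdi, tdi_write_command(), treating both MSE and BME as guarded (per the cover letter), and the zero_write_clears switch that mimics the test device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Command register bits, same values as include/uapi/linux/pci_regs.h */
#define PCI_COMMAND_MEMORY 0x2 /* Memory Space Enable (MSE) */
#define PCI_COMMAND_MASTER 0x4 /* Bus Master Enable (BME) */

/* Assumption: both MSE and BME must stay set once the TDI is locked */
#define TDI_GUARDED_BITS (PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER)

enum tdisp_state {
	TDI_CONFIG_UNLOCKED,
	TDI_CONFIG_LOCKED,
	TDI_RUN,
	TDI_ERROR,
};

struct tdi {
	enum tdisp_state state;
	uint16_t command; /* last value written to the Command register */
};

/*
 * Model a Command register write. If zero_write_clears is false, only a
 * 1 -> 0 transition of a guarded bit counts as clearing it; if true, any
 * write with a guarded bit at 0 counts, even if it was already 0.
 */
void tdi_write_command(struct tdi *t, uint16_t val, bool zero_write_clears)
{
	assert(t);
	bool locked = t->state == TDI_CONFIG_LOCKED || t->state == TDI_RUN;
	uint16_t zero_bits = (uint16_t)(~val & TDI_GUARDED_BITS);
	uint16_t cleared = zero_write_clears ? zero_bits
					     : (uint16_t)(t->command & zero_bits);

	if (locked && cleared)
		t->state = TDI_ERROR;
	t->command = val;
}
```

With zero_write_clears false, only a 1->0 transition of a guarded bit counts, so rewriting MSE while BME is already 0 is harmless. With it true, any locked-state write that leaves MSE or BME at 0 moves the TDI to ERROR, which is closer to what the test device does.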
At first thought, avoiding SVSM entanglements where the kernel can be
enlightened should be the policy. I would only expect SVSM hacks to cover
for legacy OSes that will never be TDISP enlightened, but in that case
we are likely talking about a fully unaware L2. Let's assume a fully
enlightened L1 for now.
Well, I could also tweak OVMF to make necessary calls to the PSP and
hack QEMU to postpone the command register updates to get this going,
just a matter of ugliness.
QEMU advertises TEE-IO capability to the VM.
An additional x-tio flag is added to
vfio-pci.
TODOs
-----
Deal with PCI reset. Hot unplug+plug? Power
states too.
Do better generalization, the current code
heavily uses SEV TIO defined
structures in supposedly generic code.
Fix the documentation comments of SEV TIO structures.
Hey, it's a start. I appreciate the "release early" aspect of this
posting.
:)
Thanks,
Git trees
---------
https://github.com/AMDESE/linux-kvm/tree/tio
https://github.com/AMDESE/qemu/tree/tio
[..]
Alexey Kardashevskiy (21):
tsm-report: Rename module to reflect what it does
pci/doe: Define protocol types and make those public
pci: Define TEE-IO bit in PCIe device capabilities
PCI/IDE: Define Integrity and Data Encryption (IDE) extended
capability
crypto/ccp: Make some SEV helpers public
crypto: ccp: Enable SEV-TIO feature in the PSP when supported
pci/tdisp: Introduce tsm module
crypto/ccp: Implement SEV TIO firmware interface
kvm: Export kvm_vm_set_mem_attributes
vfio: Export helper to get vfio_device from fd
KVM: SEV: Add TIO VMGEXIT and bind TDI
KVM: IOMMUFD: MEMFD: Map private pages
KVM: X86: Handle private MMIO as shared
RFC: iommu/iommufd/amd: Add IOMMU_HWPT_TRUSTED flag, tweak DTE's
DomainID, IOTLB
coco/sev-guest: Allow multiple source files in the driver
coco/sev-guest: Make SEV-to-PSP request helpers public
coco/sev-guest: Implement the guest side of things
RFC: pci: Add BUS_NOTIFY_PCI_BUS_MASTER event
sev-guest: Stop changing encrypted page state for TDISP devices
pci: Allow encrypted MMIO mapping via sysfs
pci: Define pci_iomap_range_encrypted
drivers/crypto/ccp/Makefile | 2 +
drivers/pci/Makefile | 1 +
drivers/virt/coco/Makefile | 3 +-
drivers/virt/coco/sev-guest/Makefile | 1 +
arch/x86/include/asm/kvm-x86-ops.h | 2 +
arch/x86/include/asm/kvm_host.h | 2 +
arch/x86/include/asm/sev.h | 23 +
arch/x86/include/uapi/asm/svm.h | 2 +
arch/x86/kvm/svm/svm.h | 2 +
drivers/crypto/ccp/sev-dev-tio.h | 105 ++
drivers/crypto/ccp/sev-dev.h | 4 +
drivers/iommu/amd/amd_iommu_types.h | 2 +
drivers/iommu/iommufd/io_pagetable.h | 3 +
drivers/iommu/iommufd/iommufd_private.h | 4 +
drivers/virt/coco/sev-guest/sev-guest.h | 56 +
include/asm-generic/pci_iomap.h | 4 +
include/linux/device.h | 5 +
include/linux/device/bus.h | 3 +
include/linux/dma-direct.h | 4 +
include/linux/iommufd.h | 6 +
include/linux/kvm_host.h | 70 +
include/linux/pci-doe.h | 4 +
include/linux/pci-ide.h | 18 +
include/linux/pci.h | 2 +-
include/linux/psp-sev.h | 116 +-
include/linux/swiotlb.h | 4 +
include/linux/tsm-report.h | 113 ++
include/linux/tsm.h | 337 +++--
include/linux/vfio.h | 1 +
include/uapi/linux/iommufd.h | 1 +
include/uapi/linux/kvm.h | 29 +
include/uapi/linux/pci_regs.h | 77 +-
include/uapi/linux/psp-sev.h | 4 +-
arch/x86/coco/sev/core.c | 11 +
arch/x86/kvm/mmu/mmu.c | 6 +-
arch/x86/kvm/svm/sev.c | 217 +++
arch/x86/kvm/svm/svm.c | 3 +
arch/x86/kvm/x86.c | 12 +
arch/x86/mm/mem_encrypt.c | 5 +
arch/x86/virt/svm/sev.c | 23 +-
drivers/crypto/ccp/sev-dev-tio.c | 1565 ++++++++++++++++++++
drivers/crypto/ccp/sev-dev-tsm.c | 397 +++++
drivers/crypto/ccp/sev-dev.c | 87 +-
drivers/iommu/amd/iommu.c | 20 +-
drivers/iommu/iommufd/hw_pagetable.c | 4 +
drivers/iommu/iommufd/io_pagetable.c | 2 +
drivers/iommu/iommufd/main.c | 21 +
drivers/iommu/iommufd/pages.c | 94 +-
drivers/pci/doe.c | 2 -
drivers/pci/ide.c | 186 +++
drivers/pci/iomap.c | 24 +
drivers/pci/mmap.c | 11 +-
drivers/pci/pci-sysfs.c | 27 +-
drivers/pci/pci.c | 3 +
drivers/pci/proc.c | 2 +-
drivers/vfio/vfio_main.c | 13 +
drivers/virt/coco/sev-guest/{sev-guest.c => sev_guest.c} | 68 +-
drivers/virt/coco/sev-guest/sev_guest_tio.c | 513 +++++++
drivers/virt/coco/tdx-guest/tdx-guest.c | 8 +-
drivers/virt/coco/tsm-report.c | 512 +++++++
drivers/virt/coco/tsm.c | 1542 ++++++++++++++-----
virt/kvm/guest_memfd.c | 40 +
virt/kvm/kvm_main.c | 4 +-
virt/kvm/vfio.c | 197 ++-
Documentation/virt/coco/tsm.rst | 62 +
MAINTAINERS | 4 +-
arch/x86/kvm/Kconfig | 1 +
drivers/pci/Kconfig | 4 +
drivers/virt/coco/Kconfig | 11 +
69 files changed, 6163 insertions(+), 548 deletions(-)
create mode 100644 drivers/crypto/ccp/sev-dev-tio.h
create mode 100644 drivers/virt/coco/sev-guest/sev-guest.h
create mode 100644 include/linux/pci-ide.h
create mode 100644 include/linux/tsm-report.h
create mode 100644 drivers/crypto/ccp/sev-dev-tio.c
create mode 100644 drivers/crypto/ccp/sev-dev-tsm.c
create mode 100644 drivers/pci/ide.c
rename drivers/virt/coco/sev-guest/{sev-guest.c => sev_guest.c} (96%)
create mode 100644 drivers/virt/coco/sev-guest/sev_guest_tio.c
create mode 100644 drivers/virt/coco/tsm-report.c
create mode 100644 Documentation/virt/coco/tsm.rst
--
Alexey