On Wed, Feb 19, 2025 at 09:35:16AM -0400, Jason Gunthorpe wrote:
> On Wed, Feb 19, 2025 at 11:43:46AM +1100, Alexey Kardashevskiy wrote:
> > On 19/2/25 10:51, Jason Gunthorpe wrote:
> > > On Wed, Feb 19, 2025 at 10:35:28AM +1100, Alexey Kardashevskiy wrote:
> > >
> > > > With in-place conversion, we could map the entire guest once in the HV IOMMU
> > > > and control the Cbit via the guest's IOMMU table (when available). Thanks,
> > >
> > > Isn't it more complicated than that? I understood you need to have a
> > > IOPTE boundary in the hypervisor at any point where the guest Cbit
> > > changes - so you can't just dump 1G hypervisor pages to cover the
> > > whole VM, you have to actively resize ioptes?
> >
> > When the guest Cbit changes, only AMD RMP table requires update but not
> > necessaryly NPT or IOPTEs.
> > (I may have misunderstood the question, what meaning does "dump 1G pages"
> > have?).
>
> AFAIK that is not true, if there are mismatches in page size, ie the
> RMP is 2M and the IOPTE is 1G then things do not work properly.

Just for clarity: at least for the normal/nested page tables (though I'm
assuming the same applies to IOMMU mappings), 1GB mappings are handled
similarly to 2MB mappings as far as RMP table checks are concerned: each
2MB range is checked individually as if it were a separate 2MB mapping:

  AMD Architecture Programmer's Manual Volume 2, 15.36.10,
  "RMP and VMPL Access Checks":

    "Accesses to 1GB pages only install 2MB TLB entries when SEV-SNP is
    enabled, therefore this check treats 1GB accesses as 2MB accesses for
    purposes of this check."

So a 1GB mapping doesn't really impose more restrictions than a 2MB
mapping (unless there's something different about how RMP checks are done
for the IOMMU).

But the point still stands for 4K RMP entries and 2MB mappings: a 2MB
mapping either requires the private pages' RMP entries to be 2MB, or, in
the case of a 2MB mapping of shared pages, every page in the range must
be shared according to the corresponding RMP entries.
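To spell those two rules out, here's a rough sketch of the check for a
single 2MB-aligned range as I read it (purely illustrative: struct
rmpentry, rmp_entry() and the field names are made-up stand-ins for this
email, not the real RMP entry format or any existing kernel/hardware
interface):

  #include <stdbool.h>
  #include <stdint.h>

  #define RMP_PG_SIZE_4K  0
  #define RMP_PG_SIZE_2M  1

  struct rmpentry {              /* hypothetical, heavily simplified */
          bool assigned;         /* page is private (assigned to a guest) */
          int  pagesize;         /* RMP_PG_SIZE_4K or RMP_PG_SIZE_2M */
  };

  /* toy RMP covering 1GB of memory, one entry per 4K frame */
  static struct rmpentry toy_rmp[512 * 512];

  static struct rmpentry *rmp_entry(uint64_t pfn)
  {
          return &toy_rmp[pfn];
  }

  /* check one 2MB-aligned, 2MB-sized range against the rules above */
  static bool rmp_check_2m(uint64_t pfn, bool mapping_is_private)
  {
          if (mapping_is_private)
                  /* a private 2MB mapping wants a single 2MB RMP entry */
                  return rmp_entry(pfn)->assigned &&
                         rmp_entry(pfn)->pagesize == RMP_PG_SIZE_2M;

          /* a shared 2MB mapping needs every 4K page in it to be shared */
          for (int i = 0; i < 512; i++)
                  if (rmp_entry(pfn + i)->assigned)
                          return false;
          return true;
  }

  /* 1GB accesses are treated as 2MB accesses, so a 1GB mapping is just
   * 512 independent 2MB checks */
  static bool rmp_check_1g(uint64_t pfn, bool mapping_is_private)
  {
          for (int i = 0; i < 512; i++)
                  if (!rmp_check_2m(pfn + i * 512, mapping_is_private))
                          return false;
          return true;
  }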
>
> It is why we had to do this:

I think, for the non-SEV-TIO use-case, it had more to do with the
inability to unmap a 4K range once a particular 4K page has been converted
from shared to private, if it was originally installed via a 2MB IOPTE:
the guest could actively be DMA'ing to other shared pages in that 2MB
range (though we can be assured it is not DMA'ing to the particular 4K
page it converted to private), and the IOMMU doesn't (AFAIK) have a way to
atomically split an existing 2MB IOPTE to avoid this. So forcing
everything to 4K ends up being necessary, since we don't know in advance
which ranges might contain 4K pages that the guest will convert to private
in the future.

SEV-TIO might relax this restriction by making use of TMPM and the
PSMASH_IO command to split/"smash" RMP entries and IOMMU mappings to 4K
after the fact, but I'm not too familiar with the architecture/plans, so
Alexey can correct me on that.

-Mike

>
> > > This was the whole motivation to adding the page size override kernel
> > > command line.
>
> commit f0295913c4b4f377c454e06f50c1a04f2f80d9df
> Author: Joerg Roedel <jroedel@xxxxxxx>
> Date:   Thu Sep 5 09:22:40 2024 +0200
>
>     iommu/amd: Add kernel parameters to limit V1 page-sizes
>
>     Add two new kernel command line parameters to limit the page-sizes
>     used for v1 page-tables:
>
>     nohugepages     - Limits page-sizes to 4KiB
>
>     v2_pgsizes_only - Limits page-sizes to 4Kib/2Mib/1GiB; The
>                       same as the sizes used with v2 page-tables
>
>     This is needed for multiple scenarios. When assigning devices to
>     SEV-SNP guests the IOMMU page-sizes need to match the sizes in the
>     RMP table, otherwise the device will not be able to access all
>     shared memory.
>
>     Also, some ATS devices do not work properly with arbitrary IO
>     page-sizes as supported by AMD-Vi, so limiting the sizes used by the
>     driver is a suitable workaround.
>
>     All-in-all, these parameters are only workarounds until the IOMMU
>     core and related APIs gather the ability to negotiate the page-sizes
>     in a better way.
>
>     Signed-off-by: Joerg Roedel <jroedel@xxxxxxx>
>     Reviewed-by: Vasant Hegde <vasant.hegde@xxxxxxx>
>     Link: https://lore.kernel.org/r/20240905072240.253313-1-joro@xxxxxxxxxx
>
> Jason
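(Side note: if I'm reading that patch right, the two new options are
parsed as part of the existing amd_iommu= parameter, i.e. usage would be
something like:

  amd_iommu=nohugepages
  amd_iommu=v2_pgsizes_only

on the kernel command line.)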