On Wed, Jun 12, 2024 at 02:45:32PM -0700, Nicolin Chen wrote: > From: Nate Watterson <nwatterson@xxxxxxxxxx> > > NVIDIA's Tegra241 Soc has a CMDQ-Virtualization (CMDQV) hardware, extending > the standard ARM SMMU v3 IP to support multiple VCMDQs with virtualization > capabilities. In terms of command queue, they are very like a standard SMMU > CMDQ (or ECMDQs), but only support CS_NONE in the CS field of CMD_SYNC. > > Add a new tegra241-cmdqv driver, and insert its structure pointer into the > existing arm_smmu_device, and then add related function calls in the SMMUv3 > driver to interact with the CMDQV driver. > > In the CMDQV driver, add a minimal part for the in-kernel support: reserve > VINTF0 for in-kernel use, and assign some of the VCMDQs to the VINTF0, and > select one VCMDQ based on the current CPU ID to execute supported commands. > This multi-queue design for in-kernel use gives some limited improvements: > up to 20% reduction of invalidation time was measured by a multi-threaded > DMA unmap benchmark, compared to a single queue. > > The other part of the CMDQV driver will be user-space support that gives a > hypervisor running on the host OS to talk to the driver for virtualization > use cases, allowing VMs to use VCMDQs without trappings, i.e. no VM Exits. > This is designed based on IOMMUFD, and its RFC series is also under review. > It will provide a guest OS a bigger improvement: 70% to 90% reductions of > TLB invalidation time were measured by DMA unmap tests running in a guest, > compared to nested SMMU CMDQ (with trappings). > > However, it is very important for this in-kernel support to get merged and > installed to VMs running on Grace-powered servers as soon as possible. So, > later those servers would only need to upgrade their host kernels for the > user-space support. ^^^ This is a weird paragraph to put in the commit message. > > As the initial version, the CMDQV driver only supports ACPI configurations. > > Signed-off-by: Nate Watterson <nwatterson@xxxxxxxxxx> > Reviewed-by: Jason Gunthorpe <jgg@xxxxxxxxxx> > Co-developed-by: Nicolin Chen <nicolinc@xxxxxxxxxx> > Signed-off-by: Nicolin Chen <nicolinc@xxxxxxxxxx> > --- > MAINTAINERS | 1 + > drivers/iommu/Kconfig | 11 + > drivers/iommu/arm/arm-smmu-v3/Makefile | 1 + > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 52 +- > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 50 ++ > .../iommu/arm/arm-smmu-v3/tegra241-cmdqv.c | 842 ++++++++++++++++++ > 6 files changed, 945 insertions(+), 12 deletions(-) > create mode 100644 drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c > > diff --git a/MAINTAINERS b/MAINTAINERS > index aacccb376c28..ecf7af1b2df8 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -22078,6 +22078,7 @@ M: Thierry Reding <thierry.reding@xxxxxxxxx> > R: Krishna Reddy <vdumpa@xxxxxxxxxx> > L: linux-tegra@xxxxxxxxxxxxxxx > S: Supported > +F: drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c > F: drivers/iommu/arm/arm-smmu/arm-smmu-nvidia.c > F: drivers/iommu/tegra* > > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig > index c04584be3089..e009387d3cba 100644 > --- a/drivers/iommu/Kconfig > +++ b/drivers/iommu/Kconfig > @@ -423,6 +423,17 @@ config ARM_SMMU_V3_KUNIT_TEST > Enable this option to unit-test arm-smmu-v3 driver functions. > > If unsure, say N. > + > +config TEGRA241_CMDQV > + bool "NVIDIA Tegra241 CMDQ-V extension support for ARM SMMUv3" > + depends on ACPI > + help > + Support for NVIDIA CMDQ-Virtualization extension for ARM SMMUv3. The > + CMDQ-V extension is similar to v3.3 ECMDQ for multi command queues > + support, except with virtualization capabilities. > + > + Say Y here if your system is NVIDIA Tegra241 (Grace) or it has the same > + CMDQ-V extension. > endif > > config S390_IOMMU > diff --git a/drivers/iommu/arm/arm-smmu-v3/Makefile b/drivers/iommu/arm/arm-smmu-v3/Makefile > index 014a997753a8..55201fdd7007 100644 > --- a/drivers/iommu/arm/arm-smmu-v3/Makefile > +++ b/drivers/iommu/arm/arm-smmu-v3/Makefile > @@ -2,6 +2,7 @@ > obj-$(CONFIG_ARM_SMMU_V3) += arm_smmu_v3.o > arm_smmu_v3-objs-y += arm-smmu-v3.o > arm_smmu_v3-objs-$(CONFIG_ARM_SMMU_V3_SVA) += arm-smmu-v3-sva.o > +arm_smmu_v3-objs-$(CONFIG_TEGRA241_CMDQV) += tegra241-cmdqv.o > arm_smmu_v3-objs := $(arm_smmu_v3-objs-y) > > obj-$(CONFIG_ARM_SMMU_V3_KUNIT_TEST) += arm-smmu-v3-test.o > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > index ba0e24d5ffbf..430e84fe3679 100644 > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > @@ -334,6 +334,9 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent) > > static struct arm_smmu_cmdq *arm_smmu_get_cmdq(struct arm_smmu_device *smmu) > { > + if (arm_smmu_has_tegra241_cmdqv(smmu)) > + return tegra241_cmdqv_get_cmdq(smmu); > + > return &smmu->cmdq; Hardcoding all these tegra-specific checks in the core driver is pretty horrible :/ Instead, please can we do something similar to the SMMUv2 driver? That is, tweak the probe routine to call something akin to the arm_smmu_impl_init() function, which looks at the 'model' field pulled out of the IORT and can then dispatch directly to a tegra-specific init function (see, e.g. nvidia_smmu_impl_init() for SMMUv2).