Hi Robin,

Thank you for the reply!

On Mon, Dec 20, 2021 at 06:42:26PM +0000, Robin Murphy wrote:
> On 2021-11-19 07:19, Nicolin Chen wrote:
> > From: Nate Watterson <nwatterson@xxxxxxxxxx>
> >
> > NVIDIA's Grace Soc has a CMDQ-Virtualization (CMDQV) hardware,
> > which extends the standard ARM SMMU v3 IP to support multiple
> > VCMDQs with virtualization capabilities. In-kernel of host OS,
> > they're used to reduce contention on a single queue. In terms
> > of command queue, they are very like the standard CMDQ/ECMDQs,
> > but only support CS_NONE in the CS field of CMD_SYNC command.
> >
> > This patch adds a new nvidia-grace-cmdqv file and inserts its
> > structure pointer into the existing arm_smmu_device, and then
> > adds related function calls in the arm-smmu-v3 driver.
> >
> > In the CMDQV driver itself, this patch only adds minimal part
> > for host kernel support. Upon probe(), VINTF0 is reserved for
> > in-kernel use. And some of the VCMDQs are assigned to VINTF0.
> > Then the driver will select one of VCMDQs in the VINTF0 based
> > on the CPU currently executing, to issue commands.
>
> Is there a tangible difference to DMA API or VFIO performance?

Our testing environment is currently running on a single-core
CPU, so unfortunately we don't have any perf data at this point.

> [...]
> > +struct arm_smmu_cmdq *nvidia_grace_cmdqv_get_cmdq(struct arm_smmu_device *smmu)
> > +{
> > +	struct nvidia_grace_cmdqv *cmdqv = smmu->nvidia_grace_cmdqv;
> > +	struct nvidia_grace_cmdqv_vintf *vintf0 = &cmdqv->vintf0;
> > +	u16 qidx;
> > +
> > +	/* Check error status of vintf0 */
> > +	if (!FIELD_GET(VINTF_STATUS, vintf0->status))
> > +		return &smmu->cmdq;
> > +
> > +	/*
> > +	 * Select a vcmdq to use. Here we use a temporal solution to
> > +	 * balance out traffic on cmdq issuing: each cmdq has its own
> > +	 * lock, if all cpus issue cmdlist using the same cmdq, only
> > +	 * one CPU at a time can enter the process, while the others
> > +	 * will be spinning at the same lock.
> > +	 */
> > +	qidx = smp_processor_id() % cmdqv->num_vcmdqs_per_vintf;
>
> How does ordering work between queues? Do they follow a global order
> such that a sync on any queue is guaranteed to complete all prior
> commands on all queues?

The CMDQV internal scheduler would insert a SYNC when (for example)
switching from VCMDQ0 to VCMDQ1 while the last command in VCMDQ0 is
not a SYNC. HW has a configuration bit in the register to disable
this feature, which is enabled by default.

> The challenge to make ECMDQ useful to Linux is how to make sure that all
> the commands expected to be within scope of a future CMND_SYNC plus that
> sync itself all get issued on the same queue, so I'd be mildly surprised
> if you didn't have the same problem.

PATCH-3 in this series actually helps align the command queues,
between issued commands and SYNC, if bool sync == true. Yet, if
doing something like issue->issue->issue_with_sync, it could be
trickier.

Thanks
Nic
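
P.S. To make the queue selection and the issue->issue->issue_with_sync
case a bit more concrete, here is a minimal userspace sketch. It is not
the driver code and not what PATCH-3 does: every sketch_* name is
hypothetical, the bool stands in for the VINTF_STATUS check, and pinning
the queue once per batch is only one possible way to keep a trailing
SYNC on the same queue as the commands before it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures, not the kernel types. */
struct sketch_cmdq {
	int id;
};

struct sketch_cmdqv {
	bool vintf0_enabled;               /* models FIELD_GET(VINTF_STATUS, ...) */
	unsigned int num_vcmdqs_per_vintf; /* VCMDQs assigned to VINTF0 */
	struct sketch_cmdq *vcmdqs;        /* array of those VCMDQs */
	struct sketch_cmdq *fallback;      /* models &smmu->cmdq */
};

/* Pick a queue by CPU number, or fall back to the standard CMDQ. */
static struct sketch_cmdq *sketch_get_cmdq(const struct sketch_cmdqv *cmdqv,
					   unsigned int cpu)
{
	assert(cmdqv != NULL && cmdqv->fallback != NULL);

	/* VINTF0 unusable or no VCMDQs assigned: use the standard queue. */
	if (!cmdqv->vintf0_enabled || cmdqv->num_vcmdqs_per_vintf == 0)
		return cmdqv->fallback;

	return &cmdqv->vcmdqs[cpu % cmdqv->num_vcmdqs_per_vintf];
}

/* A batch remembers its queue so later issues and the sync reuse it. */
struct sketch_batch {
	struct sketch_cmdq *q;
	unsigned int issued;
	bool synced;
};

/* Select the queue exactly once, at the start of the batch. */
static void sketch_batch_begin(struct sketch_batch *b,
			       const struct sketch_cmdqv *cmdqv,
			       unsigned int cpu)
{
	b->q = sketch_get_cmdq(cmdqv, cpu);
	b->issued = 0;
	b->synced = false;
}

/* Issue one command: returns the queue it would be written to. */
static struct sketch_cmdq *sketch_batch_issue(struct sketch_batch *b)
{
	b->issued++;
	return b->q;
}

/* Issue the trailing sync on the same queue as the batch's commands. */
static struct sketch_cmdq *sketch_batch_sync(struct sketch_batch *b)
{
	b->synced = true;
	return b->q;
}
```

With two VCMDQs, a batch begun on CPU 1 keeps using VCMDQ1 for every
issue and for the final sync, even if the caller were to migrate to
another CPU in between; with VINTF0 disabled, everything falls back to
the standard queue.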