On 30/06/2020 15:53, Robin Murphy wrote:
> On 2020-06-30 09:19, Jon Hunter wrote:
>>
>> On 30/06/2020 01:10, Krishna Reddy wrote:
>>> NVIDIA's Tegra194 SoC uses two ARM MMU-500s together to interleave
>>> IOVA accesses across them.
>>> Add NVIDIA implementation for dual ARM MMU-500s and add new compatible
>>> string for Tegra194 SoC SMMU topology.
>>
>> There is no description here of the 3rd SMMU that you mention below.
>> I think that we should describe the full picture here.
>>
>>> Signed-off-by: Krishna Reddy <vdumpa@xxxxxxxxxx>

...

>>> +static void nvidia_smmu_tlb_sync(struct arm_smmu_device *smmu, int page,
>>> +                                 int sync, int status)
>>> +{
>>> +        unsigned int delay;
>>> +
>>> +        arm_smmu_writel(smmu, page, sync, 0);
>>> +
>>> +        for (delay = 1; delay < TLB_LOOP_TIMEOUT_IN_US; delay *= 2) {
>>
>> So we are doubling the delay every time? Is this better than just using
>> the same delay on each loop?
>
> This is the same logic as the main driver (see 8513c8930069) - the sync
> is expected to complete relatively quickly, hence why we have the inner
> spin loop to avoid the delay entirely in the typical case, and the
> longer it's taking, the more likely it is that something's wrong and it
> will never complete anyway. Realistically, a heavily loaded SMMU at a
> modest clock rate might take us through a couple of iterations of the
> outer loop, but beyond that we're pretty much just killing time until
> we declare it wedged and give up, and by then there's not much point in
> burning power frantically hammering on the interconnect.

Ah OK. Then maybe we should move the definitions for TLB_LOOP_TIMEOUT
and TLB_SPIN_COUNT into arm-smmu.h so that we can use them directly in
this file instead of redefining them. Then it may be clearer that these
are part of the main driver.

>>> +struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu)
>>> +{
>>> +        unsigned int i;
>>> +        struct nvidia_smmu *nvidia_smmu;
>>> +        struct platform_device *pdev = to_platform_device(smmu->dev);
>>> +
>>> +        nvidia_smmu = devm_kzalloc(smmu->dev, sizeof(*nvidia_smmu), GFP_KERNEL);
>>> +        if (!nvidia_smmu)
>>> +                return ERR_PTR(-ENOMEM);
>>> +
>>> +        nvidia_smmu->smmu = *smmu;
>>> +        /* Instance 0 is ioremapped by arm-smmu.c after this function returns */
>>> +        nvidia_smmu->num_inst = 1;
>>> +
>>> +        for (i = 1; i < MAX_SMMU_INSTANCES; i++) {
>>> +                struct resource *res;
>>> +
>>> +                res = platform_get_resource(pdev, IORESOURCE_MEM, i);
>>> +                if (!res)
>>> +                        break;
>>> +
>>> +                nvidia_smmu->bases[i] = devm_ioremap_resource(smmu->dev, res);
>>> +                if (IS_ERR(nvidia_smmu->bases[i]))
>>> +                        return ERR_CAST(nvidia_smmu->bases[i]);
>>> +
>>> +                nvidia_smmu->num_inst++;
>>> +        }
>>> +
>>> +        nvidia_smmu->smmu.impl = &nvidia_smmu_impl;
>>> +        /*
>>> +         * Free the arm_smmu_device struct allocated in arm-smmu.c.
>>> +         * Once this function returns, arm-smmu.c would use arm_smmu_device
>>> +         * allocated as part of nvidia_smmu struct.
>>> +         */
>>> +        devm_kfree(smmu->dev, smmu);
>>
>> Why don't we just store the pointer to the smmu struct passed to this
>> function in the nvidia_smmu struct, and then we do not need to free it
>> here? In other words, make ...
>>
>> struct nvidia_smmu {
>>         struct arm_smmu_device *smmu;
>>         unsigned int num_inst;
>>         void __iomem *bases[MAX_SMMU_INSTANCES];
>> };
>>
>> This seems more appropriate than copying the struct and freeing memory
>> allocated elsewhere.
>
> But then how do you get back to struct nvidia_smmu given just a pointer
> to struct arm_smmu_device?
Ah yes, of course, that is what I was missing; I wondered what was going
on here. So I think we should add a nice comment in the above function
explaining why we are copying this and cannot simply store the pointer.

Cheers
Jon

-- 
nvpublic
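
P.S. To illustrate the point (this is my own sketch, not code from the
patch; it assumes the usual kernel headers and the MAX_SMMU_INSTANCES
define from the patch): because struct arm_smmu_device is embedded by
value, the implementation can always get from the pointer the core
driver hands it back to its own wrapper via container_of(), which a
pointer member would not allow:

struct nvidia_smmu {
        struct arm_smmu_device  smmu;   /* embedded by value, not a pointer */
        unsigned int            num_inst;
        void __iomem            *bases[MAX_SMMU_INSTANCES];
};

/*
 * The core arm-smmu code only ever passes around a struct
 * arm_smmu_device *, so container_of() is what recovers the wrapper.
 */
static inline struct nvidia_smmu *to_nvidia_smmu(struct arm_smmu_device *smmu)
{
        return container_of(smmu, struct nvidia_smmu, smmu);
}

With a pointer member there is nothing for container_of() to work with,
hence the copy into the embedded struct and the devm_kfree() of the
original allocation.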
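
And for completeness on the sync-loop discussion, a sketch (again mine,
assuming TLB_SPIN_COUNT and TLB_LOOP_TIMEOUT do end up shared via
arm-smmu.h, and that the usual io/delay headers are included) of the
spin-then-back-off pattern Robin describes:

static void example_tlb_sync(struct device *dev, void __iomem *reg_base,
                             int sync, int status)
{
        unsigned int spin_cnt, delay;

        /* Kick off the sync by writing the sync register */
        writel_relaxed(0, reg_base + sync);

        for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) {
                /* Fast path: the sync normally completes within a few reads */
                for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) {
                        /* GSACTIVE (bit 0 of sTLBGSTATUS) clears when done */
                        if (!(readl_relaxed(reg_base + status) & BIT(0)))
                                return;
                        cpu_relax();
                }
                /* Slow path: back off exponentially rather than hammering the bus */
                udelay(delay);
        }
        dev_err_ratelimited(dev, "TLB sync timed out -- SMMU may be deadlocked\n");
}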