On Wed, Jun 5, 2024 at 3:52 AM Bibek Kumar Patro <quic_bibekkum@xxxxxxxxxxx> wrote: > > On 6/5/2024 12:19 AM, Rob Clark wrote: > > On Thu, May 30, 2024 at 2:22 AM Bibek Kumar Patro > > <quic_bibekkum@xxxxxxxxxxx> wrote: > >> > >> > >> > >> On 5/28/2024 9:38 PM, Rob Clark wrote: > >>> On Tue, May 28, 2024 at 6:06 AM Dmitry Baryshkov > >>> <dmitry.baryshkov@xxxxxxxxxx> wrote: > >>>> > >>>> On Tue, May 28, 2024 at 02:59:51PM +0200, Konrad Dybcio wrote: > >>>>> > >>>>> > >>>>> On 5/15/24 15:59, Bibek Kumar Patro wrote: > >>>>>> > >>>>>> > >>>>>> On 5/10/2024 6:32 PM, Konrad Dybcio wrote: > >>>>>>> On 10.05.2024 2:52 PM, Bibek Kumar Patro wrote: > >>>>>>>> > >>>>>>>> > >>>>>>>> On 5/1/2024 12:30 AM, Rob Clark wrote: > >>>>>>>>> On Tue, Jan 23, 2024 at 7:00 AM Bibek Kumar Patro > >>>>>>>>> <quic_bibekkum@xxxxxxxxxxx> wrote: > >>>>>>>>>> > >>>>>>>>>> Currently in Qualcomm SoCs the default prefetch is set to 1 which allows > >>>>>>>>>> the TLB to fetch just the next page table. MMU-500 features ACTLR > >>>>>>>>>> register which is implementation defined and is used for Qualcomm SoCs > >>>>>>>>>> to have a custom prefetch setting enabling TLB to prefetch the next set > >>>>>>>>>> of page tables accordingly allowing for faster translations. > >>>>>>>>>> > >>>>>>>>>> ACTLR value is unique for each SMR (Stream matching register) and stored > >>>>>>>>>> in a pre-populated table. This value is set to the register during > >>>>>>>>>> context bank initialisation. > >>>>>>>>>> > >>>>>>>>>> Signed-off-by: Bibek Kumar Patro <quic_bibekkum@xxxxxxxxxxx> > >>>>>>>>>> --- > >>>>>>> > >>>>>>> [...] > >>>>>>> > >>>>>>>>>> + > >>>>>>>>>> + for_each_cfg_sme(cfg, fwspec, j, idx) { > >>>>>>>>>> + smr = &smmu->smrs[idx]; > >>>>>>>>>> + if (smr_is_subset(smr, id, mask)) { > >>>>>>>>>> + arm_smmu_cb_write(smmu, cbndx, ARM_SMMU_CB_ACTLR, > >>>>>>>>>> + actlrcfg[i].actlr); > >>>>>>>>> > >>>>>>>>> So, this makes ACTLR look like kind of a FIFO. But I'm looking at > >>>>>>>>> downstream kgsl's PRR thing (which we'll need to implement vulkan > >>>>>>>>> sparse residency), and it appears to be wanting to set BIT(5) in ACTLR > >>>>>>>>> to enable PRR. > >>>>>>>>> > >>>>>>>>> val = KGSL_IOMMU_GET_CTX_REG(ctx, KGSL_IOMMU_CTX_ACTLR); > >>>>>>>>> val |= FIELD_PREP(KGSL_IOMMU_ACTLR_PRR_ENABLE, 1); > >>>>>>>>> KGSL_IOMMU_SET_CTX_REG(ctx, KGSL_IOMMU_CTX_ACTLR, val); > >>>>>>>>> > >>>>>>>>> Any idea how this works? And does it need to be done before or after > >>>>>>>>> the ACTLR programming done in this patch? > >>>>>>>>> > >>>>>>>>> BR, > >>>>>>>>> -R > >>>>>>>>> > >>>>>>>> > >>>>>>>> Hi Rob, > >>>>>>>> > >>>>>>>> Can you please help provide some more clarification on the FIFO part? By FIFO are you referring to the storing of ACTLR data in the table? > >>>>>>>> > >>>>>>>> Thanks for pointing to the downstream implementation of kgsl driver for > >>>>>>>> the PRR bit. Since kgsl driver is already handling this PRR bit's > >>>>>>>> setting, this makes setting the PRR BIT(5) by SMMU driver redundant. > >>>>>>> > >>>>>>> The kgsl driver is not present upstream. > >>>>>>> > >>>>>> > >>>>>> Right kgsl is not present upstream, it would be better to avoid configuring the PRR bit and can be handled by kgsl directly in downstream. > >>>>> > >>>>> No! Upstream is not a dumping ground to reduce your technical debt. > >>>>> > >>>>> There is no kgsl driver upstream, so this ought to be handled here, in > >>>>> the iommu driver (as poking at hardware A from driver B is usually not good > >>>>> practice). > >>>> > >>>> I'd second the request here. If another driver has to control the > >>>> behaviour of another driver, please add corresponding API for that. > >>> > >>> We have adreno_smmu_priv for this purpose ;-) > >>> > >> > >> Thanks Rob for pointing to this private interface structure between smmu > >> and gpu. I think it's similar to what you're trying to implement here > >> https://lore.kernel.org/all/CAF6AEGtm-KweFdMFvahH1pWmpOq7dW_p0Xe_13aHGWt0jSbg8w@xxxxxxxxxxxxxx/#t > >> I can add an api "set_actlr_prr()" with smmu_domain cookie, page pointer > >> as two parameters. This api then can be used by drm/msm driver to carry > >> out the prr implementation by simply calling this. > >> Would this be okay Rob,Konrad,Dmitry? > >> Let me know if any other suggestions you have in mind as well regarding > >> parameters and placement. > > > > Hey Bibek, quick question.. is ACTLR preserved across a suspend/resume > > cycle? Or does it need to be reprogrammed on resume? And same > > question for these two PRR related regs: > > > > /* Global SMMU register offsets */ > > #define KGSL_IOMMU_PRR_CFG_LADDR 0x6008 > > #define KGSL_IOMMU_PRR_CFG_UADDR 0x600c > > > > (ie. high/low 32b of the PRR page) > > > > Hey Rob, In suspend/resume, the register space power rails are not in > disabled state, so it won't go back to reset values and should retain > it's value. Only in hibernation cycle the registers' value would get reset. > > So the hi/low address bit register for PRR page would also retain it's > value along with the ACTLR registers. > > > I was starting to type up a patch to add PRR configuration, but > > depending on whether it interacts with suspend/resume, it might be > > better form arm-smmu-qcom.c to just always enable and configure PRR > > (including allocating a page to have an address to program into > > PRR_CFG_LADDR/UADDR), and instead add an interface to return the PRR > > page? I think there is no harm in unconditionally configuring PRR for > > gpu smmu. > > Sounds okay though since this would not interact with suspend/resume path. > But I think, suppose in-case this page would have some other references > as well before configuring the address to the registers for PRR > configuration, then GPU would be dependent on arm-smmu-qcom for this page. > So Instead an endpoint api in arm-smmu-qcom.c can recieve the just the > page-address, and bit set status from drm/msm driver and can set/reset > the bit along with any page-address they want ? > It would mean the interface will be smmu's , but the choice of > configuration data to the registers' will be still with gpu. > > I wrote up a small patch with this implementation, would you like to > review that? > Will send it in this v11 series as new patch. I think if there is no suspend/resume interaction, we should go back to the original idea of page allocation in drm/msm. Basically, I think the pros and cons are: allocate in arm-smmu pro: easy to sequence programming with suspend/resume con: there isn't a convenient place to free the page on driver unload allocate in drm/msm: pro: easy place to free the page in teardown con: harder to sequence with s/r But if ACTLR and PRR_CFG_LADDR/UADDR are retained, then the con isn't actually an issue ;-) Anyways, I can type that patch.. the rest of drm/msm and userspace changes (vm_bind + sparse) to get to the point where I can use PRR are a somewhat bigger task so it will take me a while to get the point where I can test any smmu patches. BR, -R > Thanks & regards, > Bibek > > > > > BR, > > -R > > > >> Thanks & regards, > >> Bibek > >> > >>> BR, > >>> -R > >>> > >>>>> > >>>>>> > >>>>>>>> Thanks for bringing up this point. > >>>>>>>> I will send v10 patch series removing this BIT(5) setting from the ACTLR > >>>>>>>> table. > >>>>>>> > >>>>>>> I think it's generally saner to configure the SMMU from the SMMU driver.. > >>>>>> > >>>>>> Yes, agree on this. But since PRR bit is not directly related to SMMU > >>>>>> configuration so I think it would be better to remove this PRR bit > >>>>>> setting from SMMU driver based on my understanding. > >>>>> > >>>>> Why is it not related? We still don't know what it does. > >>>>> > >>>>> Konrad > >>>> > >>>> -- > >>>> With best wishes > >>>> Dmitry