Re: [PATCH v9 3/5] iommu/arm-smmu: introduction of ACTLR for custom prefetcher settings

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 6/6/2024 3:43 AM, Rob Clark wrote:
On Wed, Jun 5, 2024 at 3:52 AM Bibek Kumar Patro
<quic_bibekkum@xxxxxxxxxxx> wrote:

On 6/5/2024 12:19 AM, Rob Clark wrote:
On Thu, May 30, 2024 at 2:22 AM Bibek Kumar Patro
<quic_bibekkum@xxxxxxxxxxx> wrote:



On 5/28/2024 9:38 PM, Rob Clark wrote:
On Tue, May 28, 2024 at 6:06 AM Dmitry Baryshkov
<dmitry.baryshkov@xxxxxxxxxx> wrote:

On Tue, May 28, 2024 at 02:59:51PM +0200, Konrad Dybcio wrote:


On 5/15/24 15:59, Bibek Kumar Patro wrote:


On 5/10/2024 6:32 PM, Konrad Dybcio wrote:
On 10.05.2024 2:52 PM, Bibek Kumar Patro wrote:


On 5/1/2024 12:30 AM, Rob Clark wrote:
On Tue, Jan 23, 2024 at 7:00 AM Bibek Kumar Patro
<quic_bibekkum@xxxxxxxxxxx> wrote:

Currently in Qualcomm  SoCs the default prefetch is set to 1 which allows
the TLB to fetch just the next page table. MMU-500 features ACTLR
register which is implementation defined and is used for Qualcomm SoCs
to have a custom prefetch setting enabling TLB to prefetch the next set
of page tables accordingly allowing for faster translations.

ACTLR value is unique for each SMR (Stream matching register) and stored
in a pre-populated table. This value is set to the register during
context bank initialisation.

Signed-off-by: Bibek Kumar Patro <quic_bibekkum@xxxxxxxxxxx>
---

[...]

+
+               for_each_cfg_sme(cfg, fwspec, j, idx) {
+                       smr = &smmu->smrs[idx];
+                       if (smr_is_subset(smr, id, mask)) {
+                               arm_smmu_cb_write(smmu, cbndx, ARM_SMMU_CB_ACTLR,
+                                               actlrcfg[i].actlr);

So, this makes ACTLR look like kind of a FIFO.  But I'm looking at
downstream kgsl's PRR thing (which we'll need to implement vulkan
sparse residency), and it appears to be wanting to set BIT(5) in ACTLR
to enable PRR.

             val = KGSL_IOMMU_GET_CTX_REG(ctx, KGSL_IOMMU_CTX_ACTLR);
             val |= FIELD_PREP(KGSL_IOMMU_ACTLR_PRR_ENABLE, 1);
             KGSL_IOMMU_SET_CTX_REG(ctx, KGSL_IOMMU_CTX_ACTLR, val);

Any idea how this works?  And does it need to be done before or after
the ACTLR programming done in this patch?

BR,
-R


Hi Rob,

Can you please help provide some more clarification on the FIFO part? By FIFO are you referring to the storing of ACTLR data in the table?

Thanks for pointing to the downstream implementation of kgsl driver for
the PRR bit. Since kgsl driver is already handling this PRR bit's
setting, this makes setting the PRR BIT(5) by SMMU driver redundant.

The kgsl driver is not present upstream.


Right kgsl is not present upstream, it would be better to avoid configuring the PRR bit and can be handled by kgsl directly in downstream.

No! Upstream is not a dumping ground to reduce your technical debt.

There is no kgsl driver upstream, so this ought to be handled here, in
the iommu driver (as poking at hardware A from driver B is usually not good
practice).

I'd second the request here. If another driver has to control the
behaviour of another driver, please add corresponding API for that.

We have adreno_smmu_priv for this purpose ;-)


Thanks Rob for pointing to this private interface structure between smmu
and gpu. I think it's similar to what you're trying to implement here
https://lore.kernel.org/all/CAF6AEGtm-KweFdMFvahH1pWmpOq7dW_p0Xe_13aHGWt0jSbg8w@xxxxxxxxxxxxxx/#t
I can add an api "set_actlr_prr()" with smmu_domain cookie, page pointer
as two parameters. This api then can be used by drm/msm driver to carry
out the prr implementation by simply calling this.
Would this be okay Rob,Konrad,Dmitry?
Let me know if any other suggestions you have in mind as well regarding
parameters and placement.

Hey Bibek, quick question.. is ACTLR preserved across a suspend/resume
cycle?  Or does it need to be reprogrammed on resume?  And same
question for these two PRR related regs:

    /* Global SMMU register offsets */
    #define KGSL_IOMMU_PRR_CFG_LADDR        0x6008
    #define KGSL_IOMMU_PRR_CFG_UADDR        0x600c

(ie. high/low 32b of the PRR page)


Hey Rob, In suspend/resume, the register space power rails are not in
disabled state, so it won't go back to reset values and should retain
it's value. Only in hibernation cycle the registers' value would get reset.

So the hi/low address bit register for PRR page would also retain it's
value along with the ACTLR registers.

I was starting to type up a patch to add PRR configuration, but
depending on whether it interacts with suspend/resume, it might be
better form arm-smmu-qcom.c to just always enable and configure PRR
(including allocating a page to have an address to program into
PRR_CFG_LADDR/UADDR), and instead add an interface to return the PRR
page?  I think there is no harm in unconditionally configuring PRR for
gpu smmu.

Sounds okay though since this would not interact with suspend/resume path.
But I think, suppose in-case this page would have some other references
as well before configuring the address to the registers for PRR
configuration, then  GPU would be dependent on arm-smmu-qcom for this page.
So Instead an endpoint api in arm-smmu-qcom.c can recieve the just the
page-address, and bit set status from drm/msm driver and can set/reset
the bit along with any page-address they want ?
It would mean the interface will be smmu's , but the choice of
configuration data to the registers' will be still with gpu.

I wrote up a small patch with this implementation, would you like to
review that?
Will send it in this v11 series as new patch.

I think if there is no suspend/resume interaction, we should go back
to the original idea of page allocation in drm/msm.

Basically, I think the pros and cons are:

allocate in arm-smmu
   pro: easy to sequence programming with suspend/resume
   con: there isn't a convenient place to free the page on driver unload

allocate in drm/msm:
   pro: easy place to free the page in teardown
   con: harder to sequence with s/r

But if ACTLR and PRR_CFG_LADDR/UADDR are retained, then the con isn't
actually an issue ;-)


Sounds right, also in this case the ownership of the page stays with
drm/msm which might also make it easy to handle the page for them.

Anyways, I can type that patch..  the rest of drm/msm and userspace
changes (vm_bind + sparse) to get to the point where I can use PRR are
a somewhat bigger task so it will take me a while to get the point
where I can test any smmu patches.


Sure Rob get it. Previously in v11 I sent a patch adding a adreno-smmu-priv api with similar "page allocation in drm" design as you explained
above. Is that approach looking okay?
If it's okay can I add you in
suggested-by tag in that patch <https://lore.kernel.org/all/20240605121713.3596499-7-quic_bibekkum@xxxxxxxxxxx/> ?

Thanks & regards,
Bibek


BR,
-R


Thanks & regards,
Bibek


BR,
-R

Thanks & regards,
Bibek

BR,
-R



Thanks for bringing up this point.
I will send v10 patch series removing this BIT(5) setting from the ACTLR
table.

I think it's generally saner to configure the SMMU from the SMMU driver..

Yes, agree on this. But since PRR bit is not directly related to SMMU
configuration so I think it would be better to remove this PRR bit
setting from SMMU driver based on my understanding.

Why is it not related? We still don't know what it does.

Konrad

--
With best wishes
Dmitry




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [Linux for Sparc]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux