On Thu, 21 Dec 2023 at 12:02, Bibek Kumar Patro <quic_bibekkum@xxxxxxxxxxx> wrote: > > > > On 12/21/2023 6:06 AM, Dmitry Baryshkov wrote: > > On Wed, 20 Dec 2023 at 15:39, Bibek Kumar Patro > > <quic_bibekkum@xxxxxxxxxxx> wrote: > >> > >> Currently in Qualcomm SoCs the default prefetch is set to 1 which allows > >> the TLB to fetch just the next page table. MMU-500 features ACTLR > >> register which is implementation defined and is used for Qualcomm SoCs > >> to have a custom prefetch setting enabling TLB to prefetch the next set > >> of page tables accordingly allowing for faster translations. > >> > >> ACTLR value is unique for each SMR (Stream matching register) and stored > >> in a pre-populated table. This value is set to the register during > >> context bank initialisation. > >> > >> Signed-off-by: Bibek Kumar Patro <quic_bibekkum@xxxxxxxxxxx> > >> --- > >> drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 45 ++++++++++++++++++++++ > >> drivers/iommu/arm/arm-smmu/arm-smmu-qcom.h | 6 ++- > >> drivers/iommu/arm/arm-smmu/arm-smmu.c | 5 ++- > >> drivers/iommu/arm/arm-smmu/arm-smmu.h | 5 +++ > >> 4 files changed, 58 insertions(+), 3 deletions(-) > >> > >> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c > >> index 20c9836d859b..1cefdd0ca110 100644 > >> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c > >> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c > >> @@ -24,6 +24,12 @@ > >> #define CPRE (1 << 1) > >> #define CMTLB (1 << 0) > >> > >> +struct actlr_config { > >> + u16 sid; > >> + u16 mask; > >> + u32 actlr; > >> +}; > >> + > >> static struct qcom_smmu *to_qcom_smmu(struct arm_smmu_device *smmu) > >> { > >> return container_of(smmu, struct qcom_smmu, smmu); > >> @@ -215,9 +221,38 @@ static bool qcom_adreno_can_do_ttbr1(struct arm_smmu_device *smmu) > >> return true; > >> } > >> > >> +static void qcom_smmu_set_actlr(struct device *dev, struct arm_smmu_device *smmu, int cbndx, > >> + const struct actlr_config *actlrcfg) > >> +{ > >> + struct arm_smmu_master_cfg *cfg = dev_iommu_priv_get(dev); > >> + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev); > >> + struct arm_smmu_smr *smr; > >> + u16 mask; > >> + int idx; > >> + u16 id; > >> + int i; > >> + > >> + for (; actlrcfg->sid || actlrcfg->mask || actlrcfg->actlr; actlrcfg++) { > >> + id = actlrcfg->sid; > >> + mask = actlrcfg->mask; > >> + > >> + for_each_cfg_sme(cfg, fwspec, i, idx) { > >> + smr = &smmu->smrs[idx]; > >> + if (smr_is_subset(smr, id, mask)) { > >> + arm_smmu_cb_write(smmu, cbndx, ARM_SMMU_CB_ACTLR, > >> + actlrcfg->actlr); > >> + break; > >> + } > >> + } > >> + } > >> +} > >> + > >> static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain, > >> struct io_pgtable_cfg *pgtbl_cfg, struct device *dev) > >> { > >> + struct arm_smmu_device *smmu = smmu_domain->smmu; > >> + struct qcom_smmu *qsmmu = to_qcom_smmu(smmu); > >> + int cbndx = smmu_domain->cfg.cbndx; > >> struct adreno_smmu_priv *priv; > >> > >> smmu_domain->cfg.flush_walk_prefer_tlbiasid = true; > >> @@ -248,6 +283,9 @@ static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain, > >> priv->set_stall = qcom_adreno_smmu_set_stall; > >> priv->resume_translation = qcom_adreno_smmu_resume_translation; > >> > >> + if (qsmmu->data->actlrcfg_gfx) > >> + qcom_smmu_set_actlr(dev, smmu, cbndx, qsmmu->data->actlrcfg_gfx); > > > > There was a feedback point against v4 that there can be more than two > > (apps + gpu) SMMU devices. No, we can not use additional compat > > strings, the SMMU units are compatible with each other. > > Just to understand better, did you mean if in the below check > [inside arm-smmu-qcom.c file during qcom_smmu_create()], "else" has two > things? (Currently adreno_impl for gpu smmu, else for only > apps smmu) qcom,adreno-smmu is quite unique here, this is the only distinctive substring. We do not have such compat strings for any other of SMMU nodes. > > if (np && of_device_is_compatible(np, "qcom,adreno-smmu")) > impl = data->adreno_impl; > else > impl = data->impl; > > > Please add > > matching between the smmu and particular actlr table using the IO > > address of the SMMU block. > > > > The ACTLR table for each smmu will have A IO address attached, so based > on IO address we can apply ACTLR. > Is this your proposal((IMO hardcoding IO in driver won't be viable, > isn't it?), or in smmu DT we would need to set the IO? Unfortunately, I meant exactly that: hardcoding addresses of the SMMU register spaces. see drivers/gpu/drm/msm/dsi_cfg.c Then during device probe the driver can match the IO address to the list of the per-platform ACTLR tables and select the correct one. Then you don't even need a special actlrcfg_gfx. The GFX will fall into the main schema. > > > Thanks & regards, > Bibek > > >> + > >> return 0; > >> } > >> > >> @@ -274,6 +312,13 @@ static const struct of_device_id qcom_smmu_client_of_match[] __maybe_unused = { > >> static int qcom_smmu_init_context(struct arm_smmu_domain *smmu_domain, > >> struct io_pgtable_cfg *pgtbl_cfg, struct device *dev) > >> { > >> + struct arm_smmu_device *smmu = smmu_domain->smmu; > >> + struct qcom_smmu *qsmmu = to_qcom_smmu(smmu); > >> + int cbndx = smmu_domain->cfg.cbndx; > >> + > >> + if (qsmmu->data->actlrcfg) > >> + qcom_smmu_set_actlr(dev, smmu, cbndx, qsmmu->data->actlrcfg); > >> + > > > > One issue occured to me, while I was reviewing the patchset. The ACTLR > > settings are related to the whole SMMU setup, but we are applying them > > each time there is an SMMU context init (in other words, one per each > > domain). Is that correct? Or it's just that there is no better place > > for initialising the global register set? Would it be better to > > reprogram the ACTLR registers which are related just to this > > particular domain? > > > >> smmu_domain->cfg.flush_walk_prefer_tlbiasid = true; > >> > >> return 0; > >> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.h b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.h > >> index f3b91963e234..cb4cb402c202 100644 > >> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.h > >> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.h > >> @@ -1,6 +1,6 @@ > >> /* SPDX-License-Identifier: GPL-2.0-only */ > >> /* > >> - * Copyright (c) 2022, Qualcomm Innovation Center, Inc. All rights reserved. > >> + * Copyright (c) 2023, Qualcomm Innovation Center, Inc. All rights reserved. > >> */ > >> > >> #ifndef _ARM_SMMU_QCOM_H > >> @@ -24,7 +24,11 @@ struct qcom_smmu_config { > >> const u32 *reg_offset; > >> }; > >> > >> +struct actlr_config; > >> + > >> struct qcom_smmu_match_data { > >> + const struct actlr_config *actlrcfg; > >> + const struct actlr_config *actlrcfg_gfx; > >> const struct qcom_smmu_config *cfg; > >> const struct arm_smmu_impl *impl; > >> const struct arm_smmu_impl *adreno_impl; > >> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c > >> index d6d1a2a55cc0..0c7f700b27dd 100644 > >> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c > >> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c > >> @@ -990,9 +990,10 @@ static int arm_smmu_find_sme(struct arm_smmu_device *smmu, u16 id, u16 mask) > >> * expect simply identical entries for this case, but there's > >> * no harm in accommodating the generalisation. > >> */ > >> - if ((mask & smrs[i].mask) == mask && > >> - !((id ^ smrs[i].id) & ~smrs[i].mask)) > >> + > >> + if (smr_is_subset(&smrs[i], id, mask)) > >> return i; > >> + > >> /* > >> * If the new entry has any other overlap with an existing one, > >> * though, then there always exists at least one stream ID > >> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h > >> index 703fd5817ec1..2e4f65412c6b 100644 > >> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.h > >> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h > >> @@ -501,6 +501,11 @@ static inline void arm_smmu_writeq(struct arm_smmu_device *smmu, int page, > >> writeq_relaxed(val, arm_smmu_page(smmu, page) + offset); > >> } > >> > >> +static inline bool smr_is_subset(struct arm_smmu_smr *smrs, u16 id, u16 mask) > >> +{ > >> + return (mask & smrs->mask) == mask && !((id ^ smrs->id) & ~smrs->mask); > >> +} > >> + > >> #define ARM_SMMU_GR0 0 > >> #define ARM_SMMU_GR1 1 > >> #define ARM_SMMU_CB(s, n) ((s)->numpage + (n)) > >> -- > >> 2.17.1 > >> > > > > -- With best wishes Dmitry