On Wed, Dec 11, 2024 at 5:30 AM Bibek Kumar Patro <quic_bibekkum@xxxxxxxxxxx> wrote:
>
>
>
> On 12/6/2024 8:48 PM, Rob Clark wrote:
> > On Fri, Dec 6, 2024 at 4:36 AM Bibek Kumar Patro
> > <quic_bibekkum@xxxxxxxxxxx> wrote:
> >>
> >>
> >>
> >> On 12/4/2024 8:54 PM, Rob Clark wrote:
> >>> On Wed, Dec 4, 2024 at 3:28 AM Bibek Kumar Patro
> >>> <quic_bibekkum@xxxxxxxxxxx> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 11/22/2024 10:04 PM, Rob Clark wrote:
> >>>>> On Fri, Nov 22, 2024 at 8:20 AM Bibek Kumar Patro
> >>>>> <quic_bibekkum@xxxxxxxxxxx> wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On 11/21/2024 3:40 AM, Rob Clark wrote:
> >>>>>>> On Wed, Nov 20, 2024 at 9:17 AM Rob Clark <robdclark@xxxxxxxxx> wrote:
> >>>>>>>>
> >>>>>>>> On Thu, Nov 14, 2024 at 8:10 AM Bibek Kumar Patro
> >>>>>>>> <quic_bibekkum@xxxxxxxxxxx> wrote:
> >>>>>>>>>
> >>>>>>>>> Add an adreno-smmu-priv interface for drm/msm to call
> >>>>>>>>> into arm-smmu-qcom and initiate the PRR bit setup or reset
> >>>>>>>>> sequence as per request.
> >>>>>>>>>
> >>>>>>>>> This will be used by GPU to setup the PRR bit and related
> >>>>>>>>> configuration registers through adreno-smmu private
> >>>>>>>>> interface instead of directly poking the smmu hardware.
> >>>>>>>>>
> >>>>>>>>> Suggested-by: Rob Clark <robdclark@xxxxxxxxx>
> >>>>>>>>> Signed-off-by: Bibek Kumar Patro <quic_bibekkum@xxxxxxxxxxx>
> >>>>>>>>> ---
> >>>>>>>>>  drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 37 ++++++++++++++++++++++
> >>>>>>>>>  drivers/iommu/arm/arm-smmu/arm-smmu.h      |  2 ++
> >>>>>>>>>  include/linux/adreno-smmu-priv.h           | 14 ++++++++
> >>>>>>>>>  3 files changed, 53 insertions(+)
> >>>>>>>>>
> >>>>>>>>> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> >>>>>>>>> index d26f5aea248e..0e4f3fbda961 100644
> >>>>>>>>> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> >>>>>>>>> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> >>>>>>>>> @@ -16,6 +16,8 @@
> >>>>>>>>>
> >>>>>>>>>  #define QCOM_DUMMY_VAL -1
> >>>>>>>>>
> >>>>>>>>> +#define GFX_ACTLR_PRR (1 << 5)
> >>>>>>>>> +
> >>>>>>>>>  static struct qcom_smmu *to_qcom_smmu(struct arm_smmu_device *smmu)
> >>>>>>>>>  {
> >>>>>>>>>  	return container_of(smmu, struct qcom_smmu, smmu);
> >>>>>>>>> @@ -99,6 +101,32 @@ static void qcom_adreno_smmu_resume_translation(const void *cookie, bool termina
> >>>>>>>>>  	arm_smmu_cb_write(smmu, cfg->cbndx, ARM_SMMU_CB_RESUME, reg);
> >>>>>>>>>  }
> >>>>>>>>>
> >>>>>>>>> +static void qcom_adreno_smmu_set_prr_bit(const void *cookie, bool set)
> >>>>>>>>> +{
> >>>>>>>>> +	struct arm_smmu_domain *smmu_domain = (void *)cookie;
> >>>>>>>>> +	struct arm_smmu_device *smmu = smmu_domain->smmu;
> >>>>>>>>> +	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
> >>>>>>>>> +	u32 reg = 0;
> >>>>>>>>> +
> >>>>>>>>> +	reg = arm_smmu_cb_read(smmu, cfg->cbndx, ARM_SMMU_CB_ACTLR);
> >>>>>>>>> +	reg &= ~GFX_ACTLR_PRR;
> >>>>>>>>> +	if (set)
> >>>>>>>>> +		reg |= FIELD_PREP(GFX_ACTLR_PRR, 1);
> >>>>>>>>> +	arm_smmu_cb_write(smmu, cfg->cbndx, ARM_SMMU_CB_ACTLR, reg);
> >>>>>>>>> +}
> >>>>>>>>> +
> >>>>>>>>> +static void qcom_adreno_smmu_set_prr_addr(const void *cookie, phys_addr_t page_addr)
> >>>>>>>>> +{
> >>>>>>>>> +	struct arm_smmu_domain *smmu_domain = (void *)cookie;
> >>>>>>>>> +	struct arm_smmu_device *smmu = smmu_domain->smmu;
> >>>>>>>>> +
> >>>>>>>>> +	writel_relaxed(lower_32_bits(page_addr),
> >>>>>>>>> +		       smmu->base + ARM_SMMU_GFX_PRR_CFG_LADDR);
> >>>>>>>>> +
> >>>>>>>>> +	writel_relaxed(upper_32_bits(page_addr),
> >>>>>>>>> +		       smmu->base + ARM_SMMU_GFX_PRR_CFG_UADDR);
> >>>>>>>>> +}
> >>>>>>>>> +
> >>>>>>>>>  #define QCOM_ADRENO_SMMU_GPU_SID 0
> >>>>>>>>>
> >>>>>>>>>  static bool qcom_adreno_smmu_is_gpu_device(struct device *dev)
> >>>>>>>>> @@ -210,6 +238,7 @@ static bool qcom_adreno_can_do_ttbr1(struct arm_smmu_device *smmu)
> >>>>>>>>>  static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
> >>>>>>>>>  		struct io_pgtable_cfg *pgtbl_cfg, struct device *dev)
> >>>>>>>>>  {
> >>>>>>>>> +	const struct device_node *np = smmu_domain->smmu->dev->of_node;
> >>>>>>>>>  	struct adreno_smmu_priv *priv;
> >>>>>>>>>
> >>>>>>>>>  	smmu_domain->cfg.flush_walk_prefer_tlbiasid = true;
> >>>>>>>>> @@ -239,6 +268,14 @@ static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
> >>>>>>>>>  	priv->get_fault_info = qcom_adreno_smmu_get_fault_info;
> >>>>>>>>>  	priv->set_stall = qcom_adreno_smmu_set_stall;
> >>>>>>>>>  	priv->resume_translation = qcom_adreno_smmu_resume_translation;
> >>>>>>>>> +	priv->set_prr_bit = NULL;
> >>>>>>>>> +	priv->set_prr_addr = NULL;
> >>>>>>>>> +
> >>>>>>>>> +	if (of_device_is_compatible(np, "qcom,smmu-500") &&
> >>>>>>>>> +	    of_device_is_compatible(np, "qcom,adreno-smmu")) {
> >>>>>>>>
> >>>>>>>> fwiw, it seems like PRR actually works on sc7180, which is _not_
> >>>>>>>> mmu-500, so I guess the support actually goes back further.
> >>>>>>>>
> >>>>>>>> I'm curious if we can just rely on this being supported by any hw that
> >>>>>>>> has a6xx or newer?
> >>>>>>>
> >>>>>>>
> >>>>>>> Also, unrelated, but we can't assume the smmu is powered when drm
> >>>>>>> driver calls set_prr_bit/addr, could you add in runpm get/put around
> >>>>>>> the register access?
> >>>>>>>
> >>>>>>
> >>>>>> I see, thanks for this observation.
> >>>>>> It's surely a possible case, when they access these registers
> >>>>>> SMMU state is off.
> >>>>>> I will add the suggested runpm ops around the register access.
> >>>>>>
> >>>>>>> Otherwise Conner and I have vk sparse residency working now
> >>>>>>>
> >>>>>>
> >>>>>> Sorry, could not get this. Did you mean that vk sparse residency
> >>>>>> is working now using this patch?
> >>>>>
> >>>>> Yes, vk sparse residency is working with this patch (plus addition of
> >>>>> runpm get/put, plus drm/msm patches plus turnip patches) ;-)
> >>>>>
> >>>>
> >>>> Thanks for testing the sparse residency feature with our patch Rob,
> >>>> I have an additional query regarding the adreno private interfaces.
> >>>> Specifically, I was referring to other interfaces such as
> >>>> qcom_adreno_smmu_get_fault_info [1]. It appears that we do not have a
> >>>> runpm get/put around the register access in this context.
> >>>
> >>> get_fault_info() is called from the iommu fault handler callback, so
> >>> from the fault irq handler, which is why it didn't need the runpm
> >>> get/put. Maybe it is bad to make this assumption about usage, but
> >>> then again adreno_smmu_priv isn't really a general purpose interface.
> >>>
> >>
> >> Ah okay, got it.
> >> I was just going through all the adreno_smmu_priv interfaces just
> >> to get a better understanding of its interaction with smmu and it seems
> >> like apart from PRR and get_fault_info(), set_ttbr0_cfg() and
> >> resume_translation() also have reg access but no voting.
> >> Should we put runpm_put/get here as well, or can these be excluded?
> >> <Just curious about the thought process behind this, is it because of
> >> some sequence when to put a vote, like reg access in middle of smmu
> >> power cycle and not in beginning or end.>
> >
> > I think we just haven't needed it, or noticed that we needed it,
> > outside of setting up prr.
> >
> > As I mentioned, get_fault_info() is called from the fault irq, so we
> > know the smmu is already powered.
> >
>
> okay got it, that makes sense.
>
> > As far as set_ttbr0_cfg(), it probably works just because
> > arm_smmu_write_context_bank() ends up getting called again in the
> > resume path, so if the smmu is suspended when set_ttbr0_cfg() is
> > called, the writes just get ignored. But the updated cfg is
> > re-applied to the hw when it is resumed. Probably the same situation
> > with resume_translation().. i.e. if the smmu is suspended there are no
> > translations to resume.
> >
> > Maybe it would be more correct in set_ttbr0_cfg() and
> > resume_translation() to do a pm_runtime_get_if_in_use() and skip the
> > hw writes if the smmu is suspended.
> >
> >>
> >>>> Could you please clarify whether we need an SMMU vote around register
> >>>> access in the case of PRR? IMO, should the users of this callback ensure
> >>>> they put a vote before accessing the cb?
> >>>
> >>> How can drm vote for the smmu device? I guess it could power up
> >>> itself and rely on device-link.. but that is pretty overkill to power
> >>> up the entire gpu in this case. I think it is best for the vote to
> >>> happen in the PRR callbacks.
> >>>
> >>
> >> Okay I see, GPU can only power itself up through <gpu_state_get I
> >> assume> but won't have exclusive access to power on the smmu only.
> >>
> >> In case we go forward to put vote in PRR callback for SMMU, I was
> >> planning that we can refactor existing arm_smmu_rpm_put/get() from
> >> arm_smmu.c to its header file so that the same can be used in
> >> arm_smmu_qcom.c? What are your thoughts on this?
> >
> > I had briefly thought of doing the same. But the main reason for
> > those helpers is common arm-smmu code that is used on non-qcom
> > platforms where runpm is not enabled. In arm-smmu-qcom.c, we know
> > that runpm is enabled, so we could just use return
> > pm_runtime_resume_and_get()/pm_runtime_put_autosuspend() directly.
> >
>
> Okay I see, we then do not need pm_runtime_enabled() check for qcom
> platforms before putting the vote.
> I am currently modifying this patch only to directly add
> pm_runtime_resume_and_get()/pm_runtime_put_autosuspend()
> around the register access of PRR related adreno private interfaces.
> I will send this updated patch as part of v18 shortly.
>
> Additionally, we can evaluate the use of pm_runtime operations for
> set_ttbr0_cfg() and resume_translation() in a separate series?

Yup, sounds good

BR,
-R
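
For reference, the two voting patterns discussed in this thread can be
sketched in plain user-space C. This is only an illustration, not the
v18 patch: struct mock_smmu and every mock_* helper are hypothetical
stand-ins that loosely model pm_runtime_resume_and_get(),
pm_runtime_put_autosuspend() and pm_runtime_get_if_in_use(), and
returning an int from set_prr_addr() is a choice made for the sketch
(the callbacks in the quoted patch return void).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the SMMU: power state, usage count, PRR regs. */
struct mock_smmu {
	bool powered;
	int usage_count;
	uint32_t prr_cfg_laddr;
	uint32_t prr_cfg_uaddr;
	unsigned int dropped_writes;	/* writes attempted while powered off */
};

/* Mock register write: in this model a write to unpowered hw is ignored. */
static void mock_writel(struct mock_smmu *smmu, uint32_t *reg, uint32_t val)
{
	if (!smmu->powered) {
		smmu->dropped_writes++;
		return;
	}
	*reg = val;
}

/* Modelled on pm_runtime_resume_and_get(): power up and take a reference. */
static int mock_rpm_resume_and_get(struct mock_smmu *smmu)
{
	smmu->powered = true;
	smmu->usage_count++;
	return 0;
}

/*
 * Modelled on pm_runtime_put_autosuspend(): drop the reference. The mock
 * powers off immediately on the last put; the real helper defers that.
 */
static void mock_rpm_put(struct mock_smmu *smmu)
{
	if (--smmu->usage_count == 0)
		smmu->powered = false;
}

/* Modelled on pm_runtime_get_if_in_use(): take a ref only if already active. */
static int mock_rpm_get_if_in_use(struct mock_smmu *smmu)
{
	if (!smmu->powered || smmu->usage_count == 0)
		return 0;
	smmu->usage_count++;
	return 1;
}

/* Pattern 1 (PRR callbacks): vote inside the callback around the writes. */
static int set_prr_addr(struct mock_smmu *smmu, uint64_t page_addr)
{
	int ret = mock_rpm_resume_and_get(smmu);

	if (ret < 0)
		return ret;

	mock_writel(smmu, &smmu->prr_cfg_laddr, (uint32_t)page_addr);
	mock_writel(smmu, &smmu->prr_cfg_uaddr, (uint32_t)(page_addr >> 32));

	mock_rpm_put(smmu);
	return 0;
}

/*
 * Pattern 2 (suggested for set_ttbr0_cfg()/resume_translation()): skip the
 * hw write when the smmu is suspended instead of powering it up.
 */
static bool write_if_powered(struct mock_smmu *smmu, uint32_t *reg, uint32_t val)
{
	if (mock_rpm_get_if_in_use(smmu) <= 0)
		return false;

	mock_writel(smmu, reg, val);
	mock_rpm_put(smmu);
	return true;
}
```

In the first pattern the caller (drm/msm) never has to know the smmu
power state. In the second, a suspended smmu is left alone, on the
assumption stated above that the resume path re-applies the cached
configuration.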