Re: [PATCH v6 5/6] iommu/arm-smmu-v3: Add in-kernel support for NVIDIA Tegra241 (Grace) CMDQV

Nicolin Chen <nicolinc@xxxxxxxxxx> · Tue, 30 Apr 2024 11:08:55 -0700

(Removing chunks that I simply ack)

On Tue, Apr 30, 2024 at 01:35:45PM -0300, Jason Gunthorpe wrote:
> On Mon, Apr 29, 2024 at 09:43:48PM -0700, Nicolin Chen wrote:

> > +/* MMIO helpers */
> > +#define cmdqv_readl(reg) \
> > +	readl(cmdqv->base + TEGRA241_CMDQV_##reg)
> > +#define cmdqv_readl_relaxed(reg) \
> > +	readl_relaxed(cmdqv->base + TEGRA241_CMDQV_##reg)
> > +#define cmdqv_writel(val, reg) \
> > +	writel((val), cmdqv->base + TEGRA241_CMDQV_##reg)
> > +#define cmdqv_writel_relaxed(val, reg) \
> > +	writel_relaxed((val), cmdqv->base + TEGRA241_CMDQV_##reg)
> 
> Please don't hide access to a stack variable in a macro, and I'm not
> keen on the ##reg scheme either - it makes it much harder to search
> for things.

I can pass in cmdqv/vintf/vcmdq pointers, if it would be better.

> Really this all seems like alot of overkill to make a little bit of
> shorthand. It is not so wordy just to type it out:
> 
>   readl(vintf->base + TEGRA241_VINTF_CONFIG) 

vintf_readl(vintf, CONFIG) is much shorter. Doing so removed line
breaks in quite a lot of places, so overall the driver looks a lot
cleaner to me. It also helps a bit when I want to debug the HW
configuration flow by adding prints to these helpers. It might be
a personal preference, yet I would still like to keep these.
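
For illustration, a minimal userspace sketch of the pointer-passing
variant mentioned above, so the helper no longer reaches for a hidden
stack variable. The mock_readl()/mock_writel() stand-ins, the
vintf_sketch structure and the 0x0 register offset are assumptions
made up for this example; they are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder offset for illustration only; not the real register map. */
#define TEGRA241_VINTF_CONFIG 0x0

/* Userspace stand-ins for the kernel's readl()/writel(). */
static inline uint32_t mock_readl(const volatile void *addr)
{
	return *(const volatile uint32_t *)addr;
}

static inline void mock_writel(uint32_t val, volatile void *addr)
{
	*(volatile uint32_t *)addr = val;
}

/* Hypothetical, cut-down stand-in for struct tegra241_vintf. */
struct vintf_sketch {
	volatile uint8_t *base;
};

/* The object pointer is an explicit argument, not a hidden local. */
#define vintf_readl(vintf, reg) \
	mock_readl((vintf)->base + TEGRA241_VINTF_##reg)
#define vintf_writel(vintf, val, reg) \
	mock_writel((val), (vintf)->base + TEGRA241_VINTF_##reg)
```

A call site then reads vintf_writel(vintf, regval, CONFIG). The ##reg
pasting is kept here, so this only addresses the stack-variable half
of the remark, not the searchability one.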

> > +/* Logging helpers */
> > +#define cmdqv_warn(fmt, ...) \
> > +	dev_warn(cmdqv->dev, "CMDQV: " fmt, ##__VA_ARGS__)
> > +#define cmdqv_err(fmt, ...) \
> > +	dev_err(cmdqv->dev, "CMDQV: " fmt, ##__VA_ARGS__)
> > +#define cmdqv_info(fmt, ...) \
> > +	dev_info(cmdqv->dev, "CMDQV: " fmt, ##__VA_ARGS__)
> > +#define cmdqv_dbg(fmt, ...) \
> > +	dev_dbg(cmdqv->dev, "CMDQV: " fmt, ##__VA_ARGS__)
> 
> Really not sure these are necessary, same remark about the stack
> variable.

Same justification. And it simply keeps the same style of prints.
Also, it eases the following vcmdq prints.

I can probably change these logging helpers to inline functions.

> Also cmdqv->dev is the wrong thing to print, this is part of the smmu driver,
> it should print cmdqv->smmu->dev for consistency

Yea. I can drop the dev from the cmdqv structure.

> > +#define vintf_warn(fmt, ...) \
> > +	dev_warn(vintf->cmdqv->dev, "VINTF%u: " fmt, vintf->idx, ##__VA_ARGS__)
> > +#define vintf_err(fmt, ...) \
> > +	dev_err(vintf->cmdqv->dev, "VINTF%u: " fmt, vintf->idx, ##__VA_ARGS__)
> > +#define vintf_info(fmt, ...) \
> > +	dev_info(vintf->cmdqv->dev, "VINTF%u: " fmt, vintf->idx, ##__VA_ARGS__)
> > +#define vintf_dbg(fmt, ...) \
> > +	dev_dbg(vintf->cmdqv->dev, "VINTF%u: " fmt, vintf->idx, ##__VA_ARGS__)
> > +
> > +#define vcmdq_warn(fmt, ...)                                                   \
> > +	({                                                                     \
> > +		struct tegra241_vintf *vintf = vcmdq->vintf;                   \
> > +		if (vintf)                                                     \
> > +			vintf_warn("VCMDQ%u/LVCMDQ%u: " fmt,                   \
> > +				   vcmdq->idx, vcmdq->lidx,                    \
> > +				   ##__VA_ARGS__);                             \
> > +		else                                                           \
> > +			dev_warn(vcmdq->cmdqv->dev, "VCMDQ%u: " fmt,           \
> > +				 vcmdq->idx, ##__VA_ARGS__);                   \
> > +	})

> Some of these are barely used, is it worth all these macros??

Only vcmdq_warn isn't called. But I think it would be useful.
I could also find a place to call it, if that's a must.

> > +
> > +/* Configuring and polling helpers */
> > +#define tegra241_cmdqv_write_config(_owner, _OWNER, _regval)                   \
> > +	({                                                                     \
> > +		bool _en = (_regval) & _OWNER##_EN;                            \
> > +		u32 _status;                                                   \
> > +		int _ret;                                                      \
> > +		writel((_regval), _owner->base + TEGRA241_##_OWNER##_CONFIG);  \
> > +		_ret = readl_poll_timeout(                                     \
> > +			_owner->base + TEGRA241_##_OWNER##_STATUS, _status,    \
> > +			_en ? (_regval) & _OWNER##_ENABLED :                   \
> > +			      !((_regval) & _OWNER##_ENABLED),                 \
> > +			1, ARM_SMMU_POLL_TIMEOUT_US);                          \
> > +		if (_ret)                                                      \
> > +			_owner##_err("failed to %sable, STATUS = 0x%08X\n",    \
> > +				     _en ? "en" : "dis", _status);             \
> > +		atomic_set(&_owner->status, _status);                          \
> > +		_ret;                                                          \
> > +	})
> 
> I feel like this could be an actual inline function without the macro
> wrapper with a little fiddling.

It would be unrolled to three mostly identical inline functions:
	tegra241_cmdqv_write_config(cmdqv, regval)
	tegra241_vintf_write_config(vintf, regval)
	tegra241_vcmdq_write_config(vcmdq, regval)
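
One possible middle ground, sketched in userspace: a single common
function that takes the register locations and bit masks, plus thin
per-object wrappers. Everything named *_sketch, the SKETCH_* bit
values and the bounded poll loop (standing in for
readl_poll_timeout()) are assumptions for illustration. The sketch
also tests the polled status value against the enabled bit, which is
an assumption about the intent of the macro.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout and poll budget, for illustration only. */
#define SKETCH_VINTF_EN		(1u << 0)
#define SKETCH_VINTF_ENABLED	(1u << 0)
#define SKETCH_MAX_POLLS	3

struct cfg_regs_sketch {
	volatile uint32_t *config;
	volatile uint32_t *status;
};

/* Common body: write CONFIG, then poll STATUS for the matching state. */
static inline int write_config_sketch(const struct cfg_regs_sketch *r,
				      uint32_t regval, uint32_t en_bit,
				      uint32_t enabled_bit,
				      uint32_t *status_out)
{
	bool en = regval & en_bit;
	uint32_t status = 0;
	int i;

	*r->config = regval;
	for (i = 0; i < SKETCH_MAX_POLLS; i++) {
		status = *r->status;
		if (en == !!(status & enabled_bit)) {
			*status_out = status;
			return 0;
		}
	}
	*status_out = status;
	return -ETIMEDOUT;
}

/* Hypothetical, cut-down stand-in for struct tegra241_vintf. */
struct vintf_cfg_sketch {
	struct cfg_regs_sketch regs;
	uint32_t status;
};

/* Thin wrapper: only the masks and the object type differ. */
static inline int vintf_write_config_sketch(struct vintf_cfg_sketch *vintf,
					    uint32_t regval)
{
	return write_config_sketch(&vintf->regs, regval, SKETCH_VINTF_EN,
				   SKETCH_VINTF_ENABLED, &vintf->status);
}
```

The cmdqv and vcmdq wrappers would look the same apart from their
masks, so the duplication is limited to a few lines each.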

> > +
> > +#define cmdqv_write_config(_regval) \
> > +	tegra241_cmdqv_write_config(cmdqv, CMDQV, _regval)
> > +#define vintf_write_config(_regval) \
> > +	tegra241_cmdqv_write_config(vintf, VINTF, _regval)
> > +#define vcmdq_write_config(_regval) \
> > +	tegra241_cmdqv_write_config(vcmdq, VCMDQ, _regval)
> 
> More hidden access to stack values

Btw, any reason for forbidding this practice? It will break the
build if something goes wrong, which seems to be pretty easy to
catch.

> > +/**
> > + * struct tegra241_cmdqv - CMDQ-V for SMMUv3
> > + * @smmu: SMMUv3 pointer
> > + * @dev: Device pointer
> 
> This should probably be clarified as the device pointer to the ACPI
> companion device

I could drop it and use cmdqv->smmu->dev as your previous remark
suggested.

> > +struct arm_smmu_cmdq *tegra241_cmdqv_get_cmdq(struct arm_smmu_device *smmu)
> > +{
...
> > +	/* Use SMMU CMDQ if vintfs[0] is uninitialized */
> > +	if (!FIELD_GET(VINTF_ENABLED, atomic_read(&vintf->status)))
> > +		return &smmu->cmdq;
> > +
> > +	/* Use SMMU CMDQ if vintfs[0] has error status */
> > +	if (FIELD_GET(VINTF_STATUS, atomic_read(&vintf->status)))
> > +		return &smmu->cmdq;
> 
> Why atomic_read? The unlocked interaction with
> tegra241_cmdqv_handle_vintf0_error() doesn't seem especially sane IMHO

It is for the race between this get_cmdq() and the ISR. Is there an
alternative practice?
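
One narrow point that can be sketched in userspace: the quoted code
reads the status twice, so the two checks may see different values if
the ISR updates it in between. Taking a single snapshot makes both
checks use the same value. This does not settle whether the unlocked
interaction is sound. The SKETCH_* masks and the function name are
assumptions for illustration, and C11 atomics stand in for the
kernel's atomic_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical field layout, for illustration only. */
#define SKETCH_VINTF_ENABLED	(1u << 0)
#define SKETCH_VINTF_STATUS	(0xfu << 1)

/* True if the VINTF is enabled and reports no error status. */
static inline bool vintf_usable_sketch(atomic_uint *status)
{
	/* One relaxed load, mirroring a single atomic_read(). */
	unsigned int s = atomic_load_explicit(status, memory_order_relaxed);

	return (s & SKETCH_VINTF_ENABLED) && !(s & SKETCH_VINTF_STATUS);
}
```

A caller would fall back to the SMMU CMDQ whenever this returns false.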

> > +static void tegra241_vcmdq_hw_deinit(struct tegra241_vcmdq *vcmdq)
> > +{
> > +	u32 gerrorn, gerror;
> > +
> > +	if (vcmdq_write_config(0)) {
> > +		vcmdq_err("GERRORN=0x%X\n", vcmdq_page0_readl_relaxed(GERRORN));
> > +		vcmdq_err("GERROR=0x%X\n", vcmdq_page0_readl_relaxed(GERROR));
> > +		vcmdq_err("CONS=0x%X\n", vcmdq_page0_readl_relaxed(CONS));
> 
> Less prints, include a unique message about why this is being
> printed..

Something must be wrong if disabling VCMDQ fails, so the prints of
error register values would be helpful. And "failed to disable" is
already printed by the vcmdq_write_config() call. I can merge them
into one vcmdq_err call though.
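
A rough sketch of what the merged message could look like, written
as a userspace formatter so it can be checked in isolation. The
function name and the exact message layout are assumptions; in the
driver this would be a single vcmdq_err() call with the same format
string.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format all three register values into one line. */
static inline int vcmdq_format_err_sketch(char *buf, size_t len,
					  uint32_t gerrorn, uint32_t gerror,
					  uint32_t cons)
{
	return snprintf(buf, len, "GERRORN=0x%X, GERROR=0x%X, CONS=0x%X\n",
			(unsigned int)gerrorn, (unsigned int)gerror,
			(unsigned int)cons);
}
```

One line per failure also keeps the three values together in the log
if other messages are printed at the same time.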

> > +	}
> > +	vcmdq_page0_writel_relaxed(0, PROD);
> > +	vcmdq_page0_writel_relaxed(0, CONS);
> > +	vcmdq_page1_writeq_relaxed(0, BASE);
> > +	vcmdq_page1_writeq_relaxed(0, CONS_INDX_BASE);
> > +
> > +	gerrorn = vcmdq_page0_readl_relaxed(GERRORN);
> > +	gerror = vcmdq_page0_readl_relaxed(GERROR);
> > +	if (gerror != gerrorn) {
> > +		vcmdq_info("Uncleared error detected, resetting\n");
> > +		vcmdq_page0_writel(gerror, GERRORN);
> > +	}
> > +
> > +	vcmdq_dbg("deinited\n");
> > +}
> > +
> > +static int tegra241_vcmdq_hw_init(struct tegra241_vcmdq *vcmdq)
> > +{
> > +	int ret;
> > +
> > +	/* Configure and enable the vcmdq */
> > +	tegra241_vcmdq_hw_deinit(vcmdq);
> > +
> > +	vcmdq_page1_writeq_relaxed(vcmdq->cmdq.q.q_base, BASE);
> > +
> > +	ret = vcmdq_write_config(VCMDQ_EN);
> > +	if (ret) {
> > +		vcmdq_err("GERRORN=0x%X\n", vcmdq_page0_readl_relaxed(GERRORN));
> > +		vcmdq_err("GERROR=0x%X\n", vcmdq_page0_readl_relaxed(GERROR));
> > +		vcmdq_err("CONS=0x%X\n", vcmdq_page0_readl_relaxed(CONS));
> > +		return ret;
> 
> Same print?

Yea. Here the prints are for a failure when enabling a VCMDQ. Again,
"failed to enable" is already printed by vcmdq_write_config(). I'll
merge the three into one call here too.

> > +static void tegra241_vcmdq_free_smmu_cmdq(struct tegra241_vcmdq *vcmdq)
> > +{
> > +	struct tegra241_cmdqv *cmdqv = vcmdq->cmdqv;
> > +	struct arm_smmu_queue *q = &vcmdq->cmdq.q;
> > +	size_t nents = 1 << q->llq.max_n_shift;
> > +
> > +	dmam_free_coherent(cmdqv->smmu->dev, (nents * CMDQ_ENT_DWORDS) << 3,
> > +			   q->base, q->base_dma);
> 
> If we are calling dmam_free, do we really need devm at all?

Hmm. This is a part of SMMU's probe/device_reset(). So, all the
devm calls in the cmdqv driver follow the style there, since the
arm-smmu-v3 driver could be rmmod-ed? Though the arm-smmu-v3 driver
seems to miss a dmam_free_coherent for its own queues.

> > +tegra241_cmdqv_acpi_probe(struct arm_smmu_device *smmu, int id)
> 
> id is a u32. 

Ack.

> It might be clearer to just pass in the struct
> acpi_iort_node *?

Well, it felt quite similar to me, yet acpi_iort_node probably
fits the name better.

> > +{
> > +	struct tegra241_cmdqv *cmdqv;
> > +
> > +	cmdqv = tegra241_cmdqv_find_resource(smmu, id);
> > +	if (!cmdqv)
> > +		return NULL;
> > +
> > +	if (tegra241_cmdqv_probe(cmdqv)) {
> > +		if (cmdqv->irq > 0)
> > +			devm_free_irq(smmu->dev, cmdqv->irq, cmdqv);
> > +		devm_iounmap(smmu->dev, cmdqv->base);
> > +		devm_kfree(smmu->dev, cmdqv);
> > +		return NULL;
> 
> Oh. Please don't use devm at all in this code then, it is not attached
> to a probed driver with the proper scope, devm isn't going to work in
> sensible way.

Mind elaborating on "it is not"? This function is called by
arm_smmu_device_acpi_probe() and arm_smmu_device_probe().

Thanks for the help!
Nicolin