Hi Jason, thanks for the quick reply!
On 2/3/25 5:06 PM, Jason Gunthorpe wrote:
On Mon, Feb 03, 2025 at 02:55:12PM +0000, Diogo Ivo wrote:
Hello,
Commit c8cc2655cc6c introduced a regression when trying to use the media
accelerators present on the Tegra X1 SoC.
I came across this regression when testing the branch [1] that leverages
the NVJPG engine in the Tegra X1 for decoding a JPEG file. After commit
c8cc2655cc6c we see the following error message after submitting a job
through the TEGRA_CHANNEL_SUBMIT IOCTL:
[ 46.879757] tegra-nvjpg 54380000.nvjpg: invalid gather for push buffer 0x0000000108f08000
What driver is this? The message comes from
drivers/gpu/host1x/hw/channel_hw.c
But what driver is 'tegra-nvjpg' that is bound to 54380000.nvjpg?
Is it the stuff in
drivers/gpu/drm/nouveau/nvkm/engine/nvjpg/
I don't see "tegra-nvjpg" in the kernel?
The driver for NVJPG is not upstream yet; I am using a driver I wrote
that is essentially a copy of the NVDEC driver. I have attached it to
this e-mail.
Can you share where the failing command was sent to the device?
The command submission happens in tegra_task_submit() found in [1].
Please let me know if you need more information from my side and I'll
be happy to provide it.
It is still ARM64 & CONFIG_ARM_DMA_USE_IOMMU=n?
Yes it is.
I'm guessing it is the same basic issue as fae6e669cdc5 ("drm/tegra:
Do not assume that a NULL domain means no DMA IOMMU"), except in the
host1x not DRM code. It looks to me like the same pattern was copied
there.
How about this:
diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
index be2ad7203d7b96..090b1fc97a7309 100644
--- a/drivers/gpu/host1x/dev.c
+++ b/drivers/gpu/host1x/dev.c
@@ -361,6 +361,10 @@ static bool host1x_wants_iommu(struct host1x *host1x)
return true;
}
+/*
+ * Returns ERR_PTR on failure, NULL if the translation is IDENTITY, otherwise a
+ * valid paging domain.
+ */
static struct iommu_domain *host1x_iommu_attach(struct host1x *host)
{
struct iommu_domain *domain = iommu_get_domain_for_dev(host->dev);
@@ -385,6 +389,8 @@ static struct iommu_domain *host1x_iommu_attach(struct host1x *host)
* Similarly, if host1x is already attached to an IOMMU (via the DMA
* API), don't try to attach again.
*/
+ if (domain && domain->type == IOMMU_DOMAIN_IDENTITY)
+ domain = NULL;
if (!host1x_wants_iommu(host) || domain)
return domain;
(if not, can you investigate this function's flow compared to a good
kernel?)
Yes, this worked! Does this mean that with this change we go through
the path that uses the shared Tegra domain (for example, in the driver
I attached, client->group == true)? If that is the case, would it be
beneficial for us to change tegra_smmu_def_domain_type() to return
IOMMU_DOMAIN_DMA instead of IOMMU_DOMAIN_IDENTITY, so that the
dma_alloc_* functions are called directly?
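To make that concrete, the change I have in mind is roughly the
following (an untested sketch of the callback in
drivers/iommu/tegra-smmu.c, only to illustrate the question; I have not
checked what else relies on the IDENTITY default):

static int tegra_smmu_def_domain_type(struct device *dev)
{
	/*
	 * Untested idea: default to a DMA domain so that the DMA API
	 * manages per-device translation and dma_alloc_coherent() /
	 * dma_map_*() hand back IOVAs directly, instead of going
	 * through the shared group.
	 */
	return IOMMU_DOMAIN_DMA;
}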
Thank you for your time!
Best regards,
Diogo
[1]:
https://gitlab.freedesktop.org/d.ivo/mesa/-/blob/diogo/vaapi_remove_gpu/src/gallium/drivers/tegra/tegra_task.c

From 6aea2ca071bb39c4bd5fd7b730e9743aeef3ead5 Mon Sep 17 00:00:00 2001
From: Diogo Ivo <diogo.ivo@xxxxxxxxxxxxxxxxxx>
Date: Thu, 16 Nov 2023 17:29:23 +0000
Subject: [PATCH 1/3] drm/tegra: Add NVJPG driver
Add support for booting and using NVJPG on Tegra210 to the Host1x
and TegraDRM drivers. This driver only supports the new TegraDRM uAPI.
Signed-off-by: Diogo Ivo <diogo.ivo@xxxxxxxxxxxxxxxxxx>
---
drivers/gpu/drm/tegra/Makefile | 1 +
drivers/gpu/drm/tegra/drm.c | 2 +
drivers/gpu/drm/tegra/drm.h | 1 +
drivers/gpu/drm/tegra/nvjpg.c | 331 +++++++++++++++++++++++++++++++++
include/linux/host1x.h | 1 +
5 files changed, 336 insertions(+)
create mode 100644 drivers/gpu/drm/tegra/nvjpg.c
diff --git a/drivers/gpu/drm/tegra/Makefile b/drivers/gpu/drm/tegra/Makefile
index 6fc4b504e786..e399b40d64a1 100644
--- a/drivers/gpu/drm/tegra/Makefile
+++ b/drivers/gpu/drm/tegra/Makefile
@@ -25,6 +25,7 @@ tegra-drm-y := \
falcon.o \
vic.o \
nvdec.o \
+ nvjpg.o \
riscv.o
tegra-drm-y += trace.o
diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index 35ff303c6674..1252ad834b9b 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -1359,6 +1359,7 @@ static const struct of_device_id host1x_drm_subdevs[] = {
{ .compatible = "nvidia,tegra210-sor1", },
{ .compatible = "nvidia,tegra210-vic", },
{ .compatible = "nvidia,tegra210-nvdec", },
+ { .compatible = "nvidia,tegra210-nvjpg", },
{ .compatible = "nvidia,tegra186-display", },
{ .compatible = "nvidia,tegra186-dc", },
{ .compatible = "nvidia,tegra186-sor", },
@@ -1396,6 +1397,7 @@ static struct platform_driver * const drivers[] = {
&tegra_gr3d_driver,
&tegra_vic_driver,
&tegra_nvdec_driver,
+ &tegra_nvjpg_driver,
};
static int __init host1x_drm_init(void)
diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/drm/tegra/drm.h
index f9d18e8cf6ab..c210b0423f4c 100644
--- a/drivers/gpu/drm/tegra/drm.h
+++ b/drivers/gpu/drm/tegra/drm.h
@@ -209,5 +209,6 @@ extern struct platform_driver tegra_gr2d_driver;
extern struct platform_driver tegra_gr3d_driver;
extern struct platform_driver tegra_vic_driver;
extern struct platform_driver tegra_nvdec_driver;
+extern struct platform_driver tegra_nvjpg_driver;
#endif /* HOST1X_DRM_H */
diff --git a/drivers/gpu/drm/tegra/nvjpg.c b/drivers/gpu/drm/tegra/nvjpg.c
new file mode 100644
index 000000000000..8eae654bac78
--- /dev/null
+++ b/drivers/gpu/drm/tegra/nvjpg.c
@@ -0,0 +1,331 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <linux/clk.h>
+#include <linux/delay.h>
+#include <linux/dma-mapping.h>
+#include <linux/host1x.h>
+#include <linux/iommu.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/platform_device.h>
+#include <linux/pm_runtime.h>
+
+#include "drm.h"
+#include "falcon.h"
+
+struct nvjpg_config {
+ const char *firmware;
+ unsigned int version;
+};
+
+struct nvjpg {
+ struct falcon falcon;
+
+ void __iomem *regs;
+ struct tegra_drm_client client;
+ struct device *dev;
+ struct clk *clk;
+
+ /* Platform configuration */
+ const struct nvjpg_config *config;
+};
+
+static inline struct nvjpg *to_nvjpg(struct tegra_drm_client *client)
+{
+ return container_of(client, struct nvjpg, client);
+}
+
+static inline void nvjpg_writel(struct nvjpg *nvjpg, u32 value,
+ unsigned int offset)
+{
+ writel(value, nvjpg->regs + offset);
+}
+
+static int nvjpg_init(struct host1x_client *client)
+{
+ struct tegra_drm_client *drm = host1x_to_drm_client(client);
+ struct drm_device *dev = dev_get_drvdata(client->host);
+ struct tegra_drm *tegra = dev->dev_private;
+ struct nvjpg *nvjpg = to_nvjpg(drm);
+ int err;
+
+ err = host1x_client_iommu_attach(client);
+ if (err < 0 && err != -ENODEV) {
+ dev_err(nvjpg->dev, "failed to attach to domain: %d\n", err);
+ return err;
+ }
+
+ err = tegra_drm_register_client(tegra, drm);
+ if (err < 0)
+ goto detach;
+
+ /*
+ * Inherit the DMA parameters (such as maximum segment size) from the
+ * parent host1x device.
+ */
+ client->dev->dma_parms = client->host->dma_parms;
+
+ return 0;
+
+detach:
+ host1x_client_iommu_detach(client);
+
+ return err;
+}
+
+static int nvjpg_exit(struct host1x_client *client)
+{
+ struct tegra_drm_client *drm = host1x_to_drm_client(client);
+ struct drm_device *dev = dev_get_drvdata(client->host);
+ struct tegra_drm *tegra = dev->dev_private;
+ struct nvjpg *nvjpg = to_nvjpg(drm);
+ int err;
+
+ /* avoid a dangling pointer just in case this disappears */
+ client->dev->dma_parms = NULL;
+
+ err = tegra_drm_unregister_client(tegra, drm);
+ if (err < 0)
+ return err;
+
+ pm_runtime_dont_use_autosuspend(client->dev);
+ pm_runtime_force_suspend(client->dev);
+
+ host1x_client_iommu_detach(client);
+
+ if (client->group) {
+ dma_unmap_single(nvjpg->dev, nvjpg->falcon.firmware.phys,
+ nvjpg->falcon.firmware.size, DMA_TO_DEVICE);
+ tegra_drm_free(tegra, nvjpg->falcon.firmware.size,
+ nvjpg->falcon.firmware.virt,
+ nvjpg->falcon.firmware.iova);
+ } else {
+ dma_free_coherent(nvjpg->dev, nvjpg->falcon.firmware.size,
+ nvjpg->falcon.firmware.virt,
+ nvjpg->falcon.firmware.iova);
+ }
+
+ return 0;
+}
+
+static const struct host1x_client_ops nvjpg_client_ops = {
+ .init = nvjpg_init,
+ .exit = nvjpg_exit,
+};
+
+static int nvjpg_load_falcon_firmware(struct nvjpg *nvjpg)
+{
+ struct host1x_client *client = &nvjpg->client.base;
+ struct tegra_drm *tegra = nvjpg->client.drm;
+ dma_addr_t iova;
+ size_t size;
+ void *virt;
+ int err;
+
+ if (nvjpg->falcon.firmware.virt)
+ return 0;
+
+ err = falcon_read_firmware(&nvjpg->falcon, nvjpg->config->firmware);
+ if (err < 0)
+ return err;
+
+ size = nvjpg->falcon.firmware.size;
+
+ if (!client->group) {
+ virt = dma_alloc_coherent(nvjpg->dev, size, &iova, GFP_KERNEL);
+
+ err = dma_mapping_error(nvjpg->dev, iova);
+ if (err < 0)
+ return err;
+ } else {
+ virt = tegra_drm_alloc(tegra, size, &iova);
+ if (IS_ERR(virt))
+ return PTR_ERR(virt);
+ }
+
+ nvjpg->falcon.firmware.virt = virt;
+ nvjpg->falcon.firmware.iova = iova;
+
+ err = falcon_load_firmware(&nvjpg->falcon);
+ if (err < 0)
+ goto cleanup;
+
+ /*
+ * In this case we have received an IOVA from the shared domain, so we
+ * need to make sure to get the physical address so that the DMA API
+ * knows what memory pages to flush the cache for.
+ */
+ if (client->group) {
+ dma_addr_t phys;
+
+ phys = dma_map_single(nvjpg->dev, virt, size, DMA_TO_DEVICE);
+
+ err = dma_mapping_error(nvjpg->dev, phys);
+ if (err < 0)
+ goto cleanup;
+
+ nvjpg->falcon.firmware.phys = phys;
+ }
+
+ return 0;
+
+cleanup:
+ if (!client->group)
+ dma_free_coherent(nvjpg->dev, size, virt, iova);
+ else
+ tegra_drm_free(tegra, size, virt, iova);
+
+ return err;
+}
+
+static __maybe_unused int nvjpg_runtime_resume(struct device *dev)
+{
+ struct nvjpg *nvjpg = dev_get_drvdata(dev);
+ int err;
+
+ err = clk_prepare_enable(nvjpg->clk);
+ if (err < 0)
+ return err;
+
+ err = nvjpg_load_falcon_firmware(nvjpg);
+ if (err < 0)
+ goto disable;
+
+ err = falcon_boot(&nvjpg->falcon);
+ if (err < 0)
+ goto disable;
+
+ return 0;
+
+disable:
+ clk_disable_unprepare(nvjpg->clk);
+ return err;
+}
+
+static __maybe_unused int nvjpg_runtime_suspend(struct device *dev)
+{
+ struct nvjpg *nvjpg = dev_get_drvdata(dev);
+
+ clk_disable_unprepare(nvjpg->clk);
+
+ return 0;
+}
+
+static int nvjpg_can_use_memory_ctx(struct tegra_drm_client *client, bool *supported)
+{
+ *supported = false;
+
+ return 0;
+}
+
+static const struct tegra_drm_client_ops nvjpg_ops = {
+ .get_streamid_offset = NULL,
+ .can_use_memory_ctx = nvjpg_can_use_memory_ctx,
+};
+#define NVIDIA_TEGRA_210_NVJPG_FIRMWARE "nvidia/tegra210/nvjpg.bin"
+
+static const struct nvjpg_config nvjpg_t210_config = {
+ .firmware = NVIDIA_TEGRA_210_NVJPG_FIRMWARE,
+ .version = 0x21,
+};
+
+static const struct of_device_id tegra_nvjpg_of_match[] = {
+ { .compatible = "nvidia,tegra210-nvjpg", .data = &nvjpg_t210_config },
+ { },
+};
+MODULE_DEVICE_TABLE(of, tegra_nvjpg_of_match);
+
+static int nvjpg_probe(struct platform_device *pdev)
+{
+ struct device *dev = &pdev->dev;
+ struct nvjpg *nvjpg;
+ int err;
+
+ /* inherit DMA mask from host1x parent */
+ err = dma_coerce_mask_and_coherent(dev, *dev->parent->dma_mask);
+ if (err < 0) {
+ dev_err(&pdev->dev, "failed to set DMA mask: %d\n", err);
+ return err;
+ }
+
+ nvjpg = devm_kzalloc(dev, sizeof(*nvjpg), GFP_KERNEL);
+ if (!nvjpg)
+ return -ENOMEM;
+
+ nvjpg->config = of_device_get_match_data(dev);
+
+ nvjpg->regs = devm_platform_get_and_ioremap_resource(pdev, 0, NULL);
+ if (IS_ERR(nvjpg->regs))
+ return PTR_ERR(nvjpg->regs);
+
+ nvjpg->clk = devm_clk_get(dev, "nvjpg");
+ if (IS_ERR(nvjpg->clk)) {
+ dev_err(&pdev->dev, "failed to get clock\n");
+ return PTR_ERR(nvjpg->clk);
+ }
+
+ nvjpg->falcon.dev = dev;
+ nvjpg->falcon.regs = nvjpg->regs;
+
+ err = falcon_init(&nvjpg->falcon);
+ if (err < 0)
+ return err;
+
+ platform_set_drvdata(pdev, nvjpg);
+
+ INIT_LIST_HEAD(&nvjpg->client.base.list);
+ nvjpg->client.base.ops = &nvjpg_client_ops;
+ nvjpg->client.base.dev = dev;
+ nvjpg->client.base.class = HOST1X_CLASS_NVJPG;
+ nvjpg->dev = dev;
+
+ INIT_LIST_HEAD(&nvjpg->client.list);
+ nvjpg->client.version = nvjpg->config->version;
+ nvjpg->client.ops = &nvjpg_ops;
+
+ err = host1x_client_register(&nvjpg->client.base);
+ if (err < 0) {
+ dev_err(dev, "failed to register host1x client: %d\n", err);
+ goto exit_falcon;
+ }
+
+ pm_runtime_enable(dev);
+ pm_runtime_use_autosuspend(dev);
+ pm_runtime_set_autosuspend_delay(dev, 500);
+
+ return 0;
+
+exit_falcon:
+ falcon_exit(&nvjpg->falcon);
+
+ return err;
+}
+
+static void nvjpg_remove(struct platform_device *pdev)
+{
+ struct nvjpg *nvjpg = platform_get_drvdata(pdev);
+
+ pm_runtime_disable(&pdev->dev);
+ host1x_client_unregister(&nvjpg->client.base);
+ falcon_exit(&nvjpg->falcon);
+}
+
+static const struct dev_pm_ops nvjpg_pm_ops = {
+ SET_RUNTIME_PM_OPS(nvjpg_runtime_suspend, nvjpg_runtime_resume, NULL)
+ SET_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend,
+ pm_runtime_force_resume)
+};
+
+struct platform_driver tegra_nvjpg_driver = {
+ .driver = {
+ .name = "tegra-nvjpg",
+ .of_match_table = tegra_nvjpg_of_match,
+ .pm = &nvjpg_pm_ops
+ },
+ .probe = nvjpg_probe,
+ .remove_new = nvjpg_remove,
+};
+
+#if IS_ENABLED(CONFIG_ARCH_TEGRA_210_SOC)
+MODULE_FIRMWARE(NVIDIA_TEGRA_210_NVJPG_FIRMWARE);
+#endif
diff --git a/include/linux/host1x.h b/include/linux/host1x.h
index 9c8119ed13a4..922867359b0e 100644
--- a/include/linux/host1x.h
+++ b/include/linux/host1x.h
@@ -18,6 +18,7 @@ enum host1x_class {
HOST1X_CLASS_GR2D_SB = 0x52,
HOST1X_CLASS_VIC = 0x5D,
HOST1X_CLASS_GR3D = 0x60,
+ HOST1X_CLASS_NVJPG = 0xC0,
HOST1X_CLASS_NVDEC = 0xF0,
HOST1X_CLASS_NVDEC1 = 0xF5,
};
--
2.48.1
From b5c50cdf63f24d4979c5ea88481610575b503b61 Mon Sep 17 00:00:00 2001
From: Diogo Ivo <diogo.ivo@xxxxxxxxxxxxxxxxxx>
Date: Thu, 16 Nov 2023 17:29:24 +0000
Subject: [PATCH 2/3] arm64: tegra: Add NVJPG power-domain node
Add the NVJPG power-domain node in order to support the NVJPG
accelerator.
Signed-off-by: Diogo Ivo <diogo.ivo@xxxxxxxxxxxxxxxxxx>
---
arch/arm64/boot/dts/nvidia/tegra210.dtsi | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/arch/arm64/boot/dts/nvidia/tegra210.dtsi b/arch/arm64/boot/dts/nvidia/tegra210.dtsi
index 842669dac094..d96651d09d90 100644
--- a/arch/arm64/boot/dts/nvidia/tegra210.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra210.dtsi
@@ -939,6 +939,12 @@ pd_xusbhost: xusbc {
resets = <&tegra_car TEGRA210_CLK_XUSB_HOST>;
#power-domain-cells = <0>;
};
+
+ pd_nvjpg: nvjpg {
+ clocks = <&tegra_car TEGRA210_CLK_NVJPG>;
+ resets = <&tegra_car 195>;
+ #power-domain-cells = <0>;
+ };
};
};
--
2.48.1
From d63e3a36f0c0e570e1bfa4467efe54da164a062d Mon Sep 17 00:00:00 2001
From: Diogo Ivo <diogo.ivo@xxxxxxxxxxxxxxxxxx>
Date: Thu, 16 Nov 2023 17:29:25 +0000
Subject: [PATCH 3/3] arm64: tegra: Add NVJPG node
The Tegra X1 chip contains an NVJPG accelerator capable of
encoding/decoding JPEG files in hardware, so add its DT node.
Signed-off-by: Diogo Ivo <diogo.ivo@xxxxxxxxxxxxxxxxxx>
---
arch/arm64/boot/dts/nvidia/tegra210.dtsi | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/boot/dts/nvidia/tegra210.dtsi b/arch/arm64/boot/dts/nvidia/tegra210.dtsi
index d96651d09d90..f957eefeae9f 100644
--- a/arch/arm64/boot/dts/nvidia/tegra210.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra210.dtsi
@@ -253,7 +253,13 @@ vic@54340000 {
nvjpg@54380000 {
compatible = "nvidia,tegra210-nvjpg";
reg = <0x0 0x54380000 0x0 0x00040000>;
- status = "disabled";
+ clocks = <&tegra_car TEGRA210_CLK_NVJPG>;
+ clock-names = "nvjpg";
+ resets = <&tegra_car 195>;
+ reset-names = "nvjpg";
+
+ iommus = <&mc TEGRA_SWGROUP_NVJPG>;
+ power-domains = <&pd_nvjpg>;
};
dsib: dsi@54400000 {
--
2.48.1