Re: [REGRESSION] Failed buffer allocation in Tegra fbdev

Jon Hunter <jonathanh@xxxxxxxxxx> · Thu, 29 Feb 2024 14:50:49 +0000

On 24/01/2024 12:56, Diogo Ivo wrote:

...

I did the tracing and found that the ENOENT is coming from
sysfs_do_create_link_sd() in the following function call chain:

of_iommu_configure() -> iommu_probe_device() -> __iommu_probe_device() ->

What's the call path leading up to that? If it's the one from
host1x_device_add() then it's expected and benign - for fiddly reasons,
iommu_probe_device() ends up being called too early, but will soon be run
again in the correct circumstances once we proceed into
host1x_subdev_register()->device_add(). That will have been happening for
years, we just never reported errors in that spot before (and frankly I'm
not convinced it's valuable to have added it now).

Thanks,
Robin.

Yes, it is the one called from host1x_device_add(), so this
is solved and only the patch sent above needs to be merged.

Sorry for the delay in getting back to this. I have been doing more
testing and the backtrace I see from this warning is ...

[    7.001380]  drm: iommu configuration for device failed with -ENOENT
[    7.001550] CPU: 4 PID: 263 Comm: systemd-udevd Not tainted 6.8.0-rc6-gbbe953beb8b9-dirty #2
[    7.001559] Hardware name: NVIDIA Jetson AGX Xavier Developer Kit (DT)
[    7.001564] Call trace:
[    7.001568]  dump_backtrace.part.6+0x84/0xdc
[    7.001583]  show_stack+0x14/0x1c
[    7.001590]  dump_stack_lvl+0x48/0x5c
[    7.001600]  dump_stack+0x14/0x1c
[    7.001606]  of_dma_configure_id+0x218/0x400
[    7.001636]  host1x_attach_driver+0x150/0x2d0 [host1x]
[    7.001664]  host1x_driver_register_full+0x7c/0xdc [host1x]
[    7.001711]  host1x_drm_init+0x3c/0x1000 [tegra_drm]
[    7.001746]  do_one_initcall+0x58/0x1c0
[    7.001752]  do_init_module+0x54/0x1d8
[    7.001761]  load_module+0x18b8/0x18ec
[    7.001770]  init_module_from_file+0x8c/0xc8
[    7.001777]  __arm64_sys_finit_module+0x1c4/0x29c
[    7.001784]  invoke_syscall+0x40/0xf4
[    7.001792]  el0_svc_common.constprop.1+0xc4/0xec
[    7.001814]  do_el0_svc+0x18/0x20
[    7.001820]  el0_svc+0x28/0x90
[    7.001826]  el0t_64_sync_handler+0x9c/0xc0
[    7.001845]  el0t_64_sync+0x160/0x164

I could have sworn that this was coming from
host1x_memory_context_list_init() but that is not the case.

Anyway, we have a test that checks for warnings/errors and this
is causing that test to fail. Even if this particular instance
of the error is benign, we would still like to trap any instances
that are not. So is there something we can fix here to avoid
this?
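
For illustration only, a check of the kind described above could be sketched
roughly as follows. This is not the actual test; the names BENIGN_PATTERNS,
SUSPECT and unexpected_messages() are hypothetical, and the keyword heuristic
for spotting warning/error lines is an assumption.

```python
import re

# Hypothetical allowlist of messages treated as known and benign.
# The single entry is the message text from the log above.
BENIGN_PATTERNS = [
    re.compile(r"drm: iommu configuration for device failed with -ENOENT"),
]

# Assumed heuristic for what counts as a warning/error line.
SUSPECT = re.compile(r"\b(failed|error|warning)\b", re.IGNORECASE)


def unexpected_messages(log_lines):
    """Return suspect lines that match none of the benign patterns."""
    return [
        line
        for line in log_lines
        if SUSPECT.search(line)
        and not any(p.search(line) for p in BENIGN_PATTERNS)
    ]
```

A text allowlist like this cannot tell a benign instance of a message apart
from a genuine failure that prints the same text, which is why a fix at the
source would be preferable to filtering in the test.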

Thanks
Jon
