Hi Jason, On 09.08.2023 16:43, Jason Gunthorpe wrote: > I missed two paths where __iommu_probe_device() can be called while > already holding the device_lock() for the device that is to be probed. > > This causes a deadlock because __iommu_probe_device() will attempt to > re-acquire the lock. > > Organize things so that these two paths can re-use the existing already > held device_lock by marking the call chains based on if the lock is held > or not. > > Also the order of iommu_init_device() was not correct for > generic_single_device_group() I've just noticed that there is still one more issue left to fix after applying this patchset. On Qualcomm's RB5 board I get the following warning (captured on vanilla next-20230817): ------------[ cut here ]------------ WARNING: CPU: 6 PID: 274 at include/linux/device.h:1012 iommu_probe_device_locked+0xd4/0xe4 Modules linked in: apr pdr_interface venus_dec venus_enc videobuf2_dma_contig rpmsg_ctrl fastrpc qrtr_smd rpmsg_char lontium_lt9611uxc msm mcp251xfd qcom_camss can_dev videobuf2_dma_sg ocmem imx412 gpu_sched venus_core snd_soc_sm8250 phy_qcom_qmp_combo videobuf2_memops v4l2_fwnode leds_qcom_lpg v4l2_mem2mem drm_dp_aux_bus v4l2_async snd_soc_qcom_sdw videobuf2_v4l2 ax88179_178a onboard_usb_hub videodev rtc_pm8xxx qcom_spmi_adc5 qcom_pon qcom_spmi_adc_tm5 led_class_multicolor qcom_spmi_temp_alarm qcom_vadc_common crct10dif_ce i2c_qcom_geni snd_soc_qcom_common qrtr drm_display_helper videobuf2_common camcc_sm8250 mc typec i2c_qcom_cci spi_geni_qcom qcom_stats llcc_qcom phy_qcom_qmp_usb qcom_rng icc_bwmon qcom_q6v5_pas qcrypto phy_qcom_snps_femto_v2 qcom_pil_info sha256_generic soundwire_qcom coresight_stm coresight_tmc stm_core coresight_funnel coresight_replicator libsha256 qcom_q6v5 pinctrl_sm8250_lpass_lpi soundwire_bus snd_soc_lpass_va_macro snd_soc_lpass_macro_common pinctrl_lpass_lpi coresight authenc lpass_gfm_sm8250 qcom_sysmon slimbus snd_soc_lpass_wsa_macro libdes qcom_common qcom_glink_smem socinfo mdt_loader qmi_helpers phy_qcom_qmp_pcie icc_osm_l3 qcom_wdt display_connector ip_tables x_tables ipv6 CPU: 6 PID: 274 Comm: kworker/u16:8 Not tainted 6.5.0-rc6-next-20230817 #13867 Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT) Workqueue: events_unbound deferred_probe_work_func pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : iommu_probe_device_locked+0xd4/0xe4 lr : iommu_probe_device_locked+0x78/0xe4 ... Call trace: iommu_probe_device_locked+0xd4/0xe4 of_iommu_configure+0x10c/0x200 of_dma_configure_id+0x104/0x3b8 a6xx_gmu_init+0x4c/0xccc [msm] a6xx_gpu_init+0x3ac/0x770 [msm] adreno_bind+0x174/0x2ac [msm] component_bind_all+0x118/0x24c msm_drm_bind+0x1e8/0x6c4 [msm] try_to_bring_up_aggregate_device+0x168/0x1d4 __component_add+0xa8/0x170 component_add+0x14/0x20 dsi_dev_attach+0x20/0x2c [msm] dsi_host_attach+0x9c/0x144 [msm] devm_mipi_dsi_attach+0x34/0xb4 lt9611uxc_attach_dsi.isra.0+0x84/0xfc [lontium_lt9611uxc] lt9611uxc_probe+0x5c8/0x68c [lontium_lt9611uxc] i2c_device_probe+0x14c/0x290 really_probe+0x148/0x2b4 __driver_probe_device+0x78/0x12c driver_probe_device+0x3c/0x160 __device_attach_driver+0xb8/0x138 bus_for_each_drv+0x84/0xe0 __device_attach+0xa8/0x1b0 device_initial_probe+0x14/0x20 bus_probe_device+0xb0/0xb4 deferred_probe_work_func+0x8c/0xc8 process_one_work+0x1ec/0x53c worker_thread+0x298/0x408 kthread+0x124/0x128 ret_from_fork+0x10/0x20 irq event stamp: 207712 hardirqs last enabled at (207711): [<ffffcfe16b46b160>] _raw_spin_unlock_irqrestore+0x74/0x78 hardirqs last disabled at (207712): [<ffffcfe16b458630>] el1_dbg+0x24/0x8c softirqs last enabled at (206480): [<ffffcfe16a290a10>] __do_softirq+0x438/0x4ec softirqs last disabled at (206473): [<ffffcfe16a296980>] ____do_softirq+0x10/0x1c ---[ end trace 0000000000000000 ]--- ------------[ cut here ]------------ Let me know if you need more information or some tests on the hardware. > > v2: > - New patch to correct the order of setting iommu_dev during > iommu_init_device() > - Spelling fixes > - Simply block probing the iommu device itself instead of trying to do it > v1: https://lore.kernel.org/r/0-v1-8612b9ef48da+333-iommu_group_locking2_jgg@xxxxxxxxxx > > Jason Gunthorpe (4): > iommu: Provide iommu_probe_device_locked() > iommu: Pass in the iommu_device to probe for in bus_iommu_probe() > iommu: Do not attempt to re-lock the iommu device when probing > iommu: dev->iommu->iommu_dev must be set before ops->device_group() > > drivers/acpi/scan.c | 5 +-- > drivers/iommu/iommu.c | 65 +++++++++++++++++++++++++++----------- > drivers/iommu/of_iommu.c | 2 +- > drivers/iommu/omap-iommu.c | 12 +++++-- > include/linux/iommu.h | 6 +++- > 5 files changed, 65 insertions(+), 25 deletions(-) > > > base-commit: 8d3740021d5d461e1ec4c17fc5625b9b4237c2f8 Best regards -- Marek Szyprowski, PhD Samsung R&D Institute Poland