This series modifies the current implementation to use 128-bit cmpxchg to
update the DTE when needed, as specified in the AMD I/O Virtualization
Technology (IOMMU) Specification.

Please note that I have verified with the hardware designer, and they have
confirmed that the IOMMU hardware has always been implemented with 256-bit
read. The next revision of the IOMMU spec will be updated to correctly
describe this part. Therefore, I have updated the implementation to avoid
unnecessary flushing.

Also, this has been a long series. I would like to thank several folks who
have helped review and provide suggestions along the way :)

Changes in v11:
* Remove the patch to introduce __READ_ONCE() for 128-bit data type, since
  all 128-bit DTE access is currently done under the per-DTE spin_lock. This
  is to help avoid complicating __unqual_scalar_typeof() further (Per Arnd).

* Patch 4, 6:
  - Replace spin_lock/unlock() with spin_lock_irqsave/spin_unlock_irqrestore()
    due to the following dmesg warning:

    =====================================================
    WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
    6.12.0-rc5+ #29 Not tainted
    -----------------------------------------------------
    cc1/145047 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
    ffff93737e255cb8 (&dev_data->dte_lock){+.+.}-{2:2}, at: update_dte256+0x5f/0x1b0

    and this task is already holding:
    ffff937371782150 (&domain->lock){-.-.}-{2:2}, at: alloc_pte.constprop.0+0x175/0x5e0
    which would create a new lock dependency:
     (&domain->lock){-.-.}-{2:2} -> (&dev_data->dte_lock){+.+.}-{2:2}

    but this new dependency connects a HARDIRQ-irq-safe lock:
     (&domain->lock){-.-.}-{2:2}

    ... which became HARDIRQ-irq-safe at:
      __lock_acquire+0x399/0xbb0
      lock_acquire.part.0+0xb0/0x250
      __raw_spin_lock_irqsave+0x49/0x90
      amd_iommu_flush_iotlb_all+0x1b/0x50
      fq_flush_iotlb+0x22/0x40
      queue_iova+0x12d/0x150
      __iommu_dma_unmap+0xc2/0x140
      iommu_dma_unmap_page+0x44/0x90
      dma_unmap_page_attrs+0x202/0x240
      nvme_pci_complete_batch+0xb3/0xd0 [nvme]
      nvme_irq+0x7f/0x90 [nvme]
      __handle_irq_event_percpu+0x81/0x270
      handle_irq_event+0x34/0x70
      handle_edge_irq+0x9f/0x240
      __common_interrupt+0x70/0x140
      common_interrupt+0xb2/0xd0
      asm_common_interrupt+0x22/0x40
      cpuidle_enter_state+0x11d/0x540
      cpuidle_enter+0x29/0x40
      cpuidle_idle_call+0x100/0x170
      do_idle+0x96/0xf0
      cpu_startup_entry+0x25/0x30
      start_secondary+0x11d/0x140
      common_startup_64+0x13e/0x141

    to a HARDIRQ-irq-unsafe lock:
     (&dev_data->dte_lock){+.+.}-{2:2}

    ... which became HARDIRQ-irq-unsafe at:
    ...
      __lock_acquire+0x399/0xbb0
      lock_acquire.part.0+0xb0/0x250
      _raw_spin_lock+0x34/0x80
      update_dte256+0x5f/0x1b0
      set_dte_entry+0x1d1/0x290
      dev_update_dte+0x53/0x120
      attach_device.isra.0+0x120/0x4f0
      amd_iommu_attach_device+0x83/0xd0
      __iommu_attach_device+0x1d/0xd0
      __iommu_device_set_domain+0x5b/0xb0
      __iommu_group_set_domain_internal+0x68/0x120
      iommu_setup_default_domain+0x204/0x350
      iommu_device_register+0x156/0x250
      iommu_init_pci+0x18f/0x570
      amd_iommu_init_pci+0xcb/0x2b0
      state_next+0x7e5/0x8e0
      amd_iommu_init+0x1f/0x80
      pci_iommu_init+0xe/0x40
      do_one_initcall+0x5f/0x2c0
      do_initcalls+0xb9/0x180
      kernel_init_freeable+0x149/0x230
      kernel_init+0x16/0x1c0
      ret_from_fork+0x2d/0x50
      ret_from_fork_asm+0x1a/0x30

* Patch 4:
  - Introduce helper function iommu_atomic128_set() (Per Uros).
  - In write_dte_update128(), remove the unnecessary do-while() loop for
    try-cmpxchg and remove __READ_ONCE(), since the function is called
    under the DTE spin_lock.

* Patch 6:
  In get_dte256(), remove __READ_ONCE(), since the 128-bit data read is
  inside the DTE spin_lock.

* Patch 8:
  Remove READ/WRITE_ONCE() from amd_iommu_set_dirty_tracking(), since it is
  called inside the DTE spin_lock.
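
To illustrate why the try-cmpxchg retry loop could be dropped once every
writer holds the per-DTE lock, here is a minimal userspace sketch. It is not
the kernel code: all names (struct half128, struct dte256, model_*) are
hypothetical, the compare-exchange is a plain C stand-in that is NOT atomic
(the kernel would use a real 128-bit cmpxchg), and the lock is assumed to be
held by the caller rather than modeled. The order in which the two halves
are written matters on real hardware and is not modeled here either.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: a 256-bit DTE viewed as two 128-bit halves. */
struct half128 { uint64_t lo, hi; };
struct dte256 { struct half128 half[2]; };

/*
 * Stand-in for a 128-bit compare-and-exchange. NOT atomic; it only
 * mirrors the usual try-cmpxchg contract: on mismatch, *expected is
 * refreshed with the current value and false is returned.
 */
static bool model_try_cmpxchg128(struct half128 *p, struct half128 *expected,
				 struct half128 desired)
{
	if (p->lo != expected->lo || p->hi != expected->hi) {
		*expected = *p;
		return false;
	}
	*p = desired;
	return true;
}

/*
 * Assumption: the caller holds the per-DTE lock, so no other CPU can
 * modify *p between the read of 'old' and the exchange. A single
 * attempt is therefore expected to succeed, and no retry loop is used.
 */
static void model_set128(struct half128 *p, struct half128 new)
{
	struct half128 old = *p;
	bool ok = model_try_cmpxchg128(p, &old, new);

	assert(ok);
	(void)ok;
}

/* Update the whole 256-bit entry as two 128-bit writes (lock assumed held). */
static void model_update_dte256(struct dte256 *dte, const struct dte256 *new)
{
	model_set128(&dte->half[0], new->half[0]);
	model_set128(&dte->half[1], new->half[1]);
}
```

In this model the loop only becomes necessary again if some writer updates
the entry without taking the lock, which is the situation the series avoids.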

v10: https://lore.kernel.org/lkml/20241113120327.5239-1-suravee.suthikulpanit@xxxxxxx/
v9: https://lore.kernel.org/lkml/20241101162304.4688-1-suravee.suthikulpanit@xxxxxxx/
v8: https://lore.kernel.org/lkml/20241031184243.4184-1-suravee.suthikulpanit@xxxxxxx/
v7: https://lore.kernel.org/lkml/20241031091624.4895-1-suravee.suthikulpanit@xxxxxxx/
v6: https://lore.kernel.org/lkml/20241016051756.4317-1-suravee.suthikulpanit@xxxxxxx/
v5: https://lore.kernel.org/lkml/20241007041353.4756-1-suravee.suthikulpanit@xxxxxxx/
v4: https://lore.kernel.org/lkml/20240916171805.324292-1-suravee.suthikulpanit@xxxxxxx/
v3: https://lore.kernel.org/lkml/20240906121308.5013-1-suravee.suthikulpanit@xxxxxxx/
v2: https://lore.kernel.org/lkml/20240829180726.5022-1-suravee.suthikulpanit@xxxxxxx/
v1: https://lore.kernel.org/lkml/20240819161839.4657-1-suravee.suthikulpanit@xxxxxxx/

Thanks,
Suravee

Suravee Suthikulpanit (9):
  iommu/amd: Misc ACPI IVRS debug info clean up
  iommu/amd: Disable AMD IOMMU if CMPXCHG16B feature is not supported
  iommu/amd: Introduce struct ivhd_dte_flags to store persistent DTE flags
  iommu/amd: Introduce helper function to update 256-bit DTE
  iommu/amd: Modify set_dte_entry() to use 256-bit DTE helpers
  iommu/amd: Introduce helper function get_dte256()
  iommu/amd: Modify clear_dte_entry() to avoid in-place update
  iommu/amd: Lock DTE before updating the entry with WRITE_ONCE()
  iommu/amd: Remove amd_iommu_apply_erratum_63()

 drivers/iommu/amd/amd_iommu.h       |   4 +-
 drivers/iommu/amd/amd_iommu_types.h |  41 ++-
 drivers/iommu/amd/init.c            | 229 +++++++++--------
 drivers/iommu/amd/iommu.c           | 378 +++++++++++++++++++++-------
 4 files changed, 440 insertions(+), 212 deletions(-)

-- 
2.34.1