On Wed Oct 16 19, Qian Cai wrote:
After the commit 754265bcab78 ("iommu/amd: Fix race in increase_address_space()"), it could still possible trigger a race condition under some heavy memory pressure below. The race to trigger a warning is, CPU0: CPU1: in alloc_pte(): in increase_address_space(): while (address > PM_LEVEL_SIZE(domain->mode)) [1] spin_lock_irqsave(&domain->lock domain->mode += 1; spin_unlock_irqrestore(&domain->lock in increase_address_space(): spin_lock_irqsave(&domain->lock if (WARN_ON_ONCE(domain->mode == PAGE_MODE_6_LEVEL)) [1] domain->mode = 5 It is unclear the triggering of the warning is the root cause of the smartpqi offline yet, but let's fix it first by lifting the locking. WARNING: CPU: 57 PID: 124314 at drivers/iommu/amd_iommu.c:1474 iommu_map_page+0x718/0x7e0 smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0xffffffffffec0000 flags=0x0010] smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0xffffffffffec1000 flags=0x0010] CPU: 57 PID: 124314 Comm: oom01 Tainted: G O Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019 RIP: 0010:iommu_map_page+0x718/0x7e0 Code: 88 a5 70 ff ff ff e9 5d fa ff ff 48 8b b5 70 ff ff ff 4c 89 ef e8 08 32 2f 00 41 80 fc 01 0f 87 b7 3d 00 00 41 83 e4 01 eb be <0f> 0b 48 8b b5 70 ff ff ff 4c 89 ef e8 e7 31 2f 00 eb dd 0f 0b 48 RSP: 0018:ffff888da4816cb8 EFLAGS: 00010046 RAX: 0000000000000000 RBX: ffff8885fe689000 RCX: ffffffff96f4a6c4 RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff8885fe689124 RBP: ffff888da4816da8 R08: ffffed10bfcd120e R09: ffffed10bfcd120e R10: ffffed10bfcd120d R11: ffff8885fe68906b R12: 0000000000000000 smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0xffffffffffec1a00 flags=0x0010] R13: ffff8885fe689068 R14: ffff8885fe689124 R15: 0000000000000000 smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0xffffffffffec1e00 flags=0x0010] FS: 00007f29722ba700(0000) GS:ffff88902f880000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f27f82d8000 CR3: 000000102ed9c000 CR4: 00000000003406e0 Call Trace: smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0xffffffffffec2000 flags=0x0010] map_sg+0x1ce/0x2f0 smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0xffffffffffec2400 flags=0x0010] scsi_dma_map+0xd7/0x160 pqi_raid_submit_scsi_cmd_with_io_request+0x1b8/0x420 [smartpqi] smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0xffffffffffec2800 flags=0x0010] pqi_scsi_queue_command+0x8ab/0xe00 [smartpqi] smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0xffffffffffec2c00 flags=0x0010] scsi_queue_rq+0xd19/0x1360 smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0xffffffffffec3000 flags=0x0010] __blk_mq_try_issue_directly+0x295/0x3f0 smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0xffffffffffec3400 flags=0x0010] AMD-Vi: Event logged [IO_PAGE_FAULT device=23:00.0 domain=0x0000 address=0xffffffffffec3800 flags=0x0010] blk_mq_request_issue_directly+0xb5/0x100 AMD-Vi: Event logged [IO_PAGE_FAULT device=23:00.0 domain=0x0000 address=0xffffffffffec3c00 flags=0x0010] blk_mq_try_issue_list_directly+0xa9/0x160 blk_mq_sched_insert_requests+0x228/0x380 blk_mq_flush_plug_list+0x448/0x7e0 blk_flush_plug_list+0x1eb/0x230 blk_finish_plug+0x43/0x5d shrink_node_memcg+0x9c5/0x1550 smartpqi 0000:23:00.0: controller is offline: status code 0x14803 smartpqi 0000:23:00.0: controller offline Fixes: 754265bcab78 ("iommu/amd: Fix race in increase_address_space()") Signed-off-by: Qian Cai <cai@xxxxxx> --- BTW, Joerg, this line from the commit "iommu/amd: Remove domain->updated" looks suspicious. Not sure what the purpose of it. *updated = increase_address_space(domain, gfp) || *updated;
Looking at it again I think that isn't an issue really, it would just not lose updated being set in a previous loop iteration, but now I'm wondering about the loop itself. In the cases where it would return false, how does the evaluation of the condition for the while loop change?
drivers/iommu/amd_iommu.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index 2369b8af81f3..a5754068aa29 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -1465,12 +1465,9 @@ static void free_pagetable(struct protection_domain *domain) static bool increase_address_space(struct protection_domain *domain, gfp_t gfp) { - unsigned long flags; bool ret = false; u64 *pte; - spin_lock_irqsave(&domain->lock, flags); - if (WARN_ON_ONCE(domain->mode == PAGE_MODE_6_LEVEL)) /* address space already 64 bit large */ goto out; @@ -1487,8 +1484,6 @@ static bool increase_address_space(struct protection_domain *domain, ret = true; out: - spin_unlock_irqrestore(&domain->lock, flags); - return ret; } @@ -1499,14 +1494,19 @@ static u64 *alloc_pte(struct protection_domain *domain, gfp_t gfp, bool *updated) { + unsigned long flags; int level, end_lvl; u64 *pte, *page; BUG_ON(!is_power_of_2(page_size)); + spin_lock_irqsave(&domain->lock, flags); + while (address > PM_LEVEL_SIZE(domain->mode)) *updated = increase_address_space(domain, gfp) || *updated; + spin_unlock_irqrestore(&domain->lock, flags); + level = domain->mode - 1; pte = &domain->pt_root[PM_LEVEL_INDEX(level, address)]; address = PAGE_SIZE_ALIGN(address, page_size); -- 1.8.3.1 _______________________________________________ iommu mailing list iommu@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linuxfoundation.org/mailman/listinfo/iommu