Re: [bug report] IOMMU reports data translation fault for fio testing

John Garry <john.garry@xxxxxxxxxx> · Mon, 16 May 2022 11:51:23 +0100

On 14/05/2022 10:49, John Garry wrote:
It could be an issue with the SCSI hba driver.

That seems likely to me.

Actually it is an LLDD problem. Sometimes it takes 45 minutes to trigger, 
though, so it is not nice to bisect.

This looks to be the problematic patch:

author John Garry <john.garry@xxxxxxxxxx> 2022-02-10 18:43:24 +0800
committer Martin K. Petersen <martin.petersen@xxxxxxxxxx> 2022-02-11 17:02:50 -0500
commit 26fc0ea74fcb9b76b41f5e9b89728cd1c01559cd (patch)
scsi: libsas: Drop SAS_TASK_AT_INITIATOR

If interested, this looks like the issue:

void hisi_sas_task_deliver(struct hisi_hba *hisi_hba,
 		break;
 	}
 
-	spin_lock_irqsave(&task->task_state_lock, flags);
-	task->task_state_flags |= SAS_TASK_AT_INITIATOR;
-	spin_unlock_irqrestore(&task->task_state_lock, flags);
-
 	WRITE_ONCE(slot->ready, 1);

Losing the spinlock loses its barrier semantics as well, so this is a 
memory ordering issue.

Sure, that would be common wisdom. However, the commit just before any 
of the driver-related changes were added for 5.18 is also bad. The 
problem could be pre-existing, but that starts to seem unlikely. Or it 
could still be an IOMMU issue - we already have a performance issue there.

This issue can take more than 15 minutes to occur, so is pretty painful 
to bisect...
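
For anyone following along, here is a rough userspace sketch of the ordering concern raised above (losing the spinlock also loses the implied ordering before the WRITE_ONCE() of slot->ready). It uses C11 atomics rather than kernel primitives, and struct demo_slot, demo_slot_publish() and demo_slot_consume() are hypothetical names made up for illustration, not hisi_sas code. It only shows the general publish-with-release pattern and is not a proposed fix.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a driver slot; NOT the hisi_sas structure. */
struct demo_slot {
	uint32_t cmd;     /* payload that must be visible before "ready" */
	atomic_int ready; /* flag polled by the consumer */
};

/*
 * Producer: write the payload, then publish the flag with release
 * semantics so the payload store cannot be reordered after the flag store.
 */
static void demo_slot_publish(struct demo_slot *slot, uint32_t cmd)
{
	slot->cmd = cmd;
	atomic_store_explicit(&slot->ready, 1, memory_order_release);
}

/*
 * Consumer: read the flag with acquire semantics. Returns 1 and copies
 * the payload only if the flag was observed set, otherwise returns 0.
 */
static int demo_slot_consume(struct demo_slot *slot, uint32_t *cmd)
{
	if (!atomic_load_explicit(&slot->ready, memory_order_acquire))
		return 0;
	*cmd = slot->cmd;
	return 1;
}
```

In kernel terms the rough analogue would be an smp_wmb() before the WRITE_ONCE(), or an smp_store_release() on the flag, but that mapping is my assumption and was not tested against this issue.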