The debug_dma_assert_idle() infrastructure was put in place to catch a
data corruption scenario first identified by the now defunct NET_DMA
receive offload feature. It caught cases where dma was in flight to a
stale page because the dma raced a cpu write to the same page, and that
cpu write triggered cow_user_page().

However, the dma-debug tracking is overeager and also triggers in cases
where the dma device is reading from a page that is undergoing
cow_user_page(). The fix proposed here was originally posted in 2016;
Russell reported at the time "Yes, that seems to avoid the warning for
me from an initial test", and now Don reports that it also addresses a
similar false positive he is seeing.

Link: https://lore.kernel.org/r/CAPcyv4j8fWqwAaX5oCdg5atc+vmp57HoAGT6AfBFwaCiv0RbAQ@xxxxxxxxxxxxxx
Reported-by: Russell King <linux@xxxxxxxxxxxxxxx>
Reported-by: Don Dutile <ddutile@xxxxxxxxxx>
Fixes: 0abdd7a81b7e ("dma-debug: introduce debug_dma_assert_idle()")
Cc: <stable@xxxxxxxxxxxxxxx>
Cc: Christoph Hellwig <hch@xxxxxx>
Cc: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>
Cc: Robin Murphy <robin.murphy@xxxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
---
 kernel/dma/debug.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index 099002d84f46..11a6db53d193 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -587,7 +587,7 @@ void debug_dma_assert_idle(struct page *page)
 	}
 	spin_unlock_irqrestore(&radix_lock, flags);
 
-	if (!entry)
+	if (!entry || entry->direction != DMA_FROM_DEVICE)
 		return;
 
 	cln = to_cacheline_number(entry);
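
For illustration only, not part of the patch: a minimal, self-contained
userspace sketch of the filtering the one-liner above introduces. The
enum values mirror the kernel's dma_data_direction, but the struct and
the assert_idle_should_warn() helper are hypothetical stand-ins, not
kernel code. With the patched condition, only an active entry whose
direction is DMA_FROM_DEVICE (the device writing into the page) still
triggers the warning; device-read mappings racing cow_user_page() no
longer do.

#include <stdbool.h>
#include <stdio.h>

/* Values mirror the kernel's enum dma_data_direction. */
enum dma_data_direction {
	DMA_BIDIRECTIONAL = 0,
	DMA_TO_DEVICE = 1,
	DMA_FROM_DEVICE = 2,
	DMA_NONE = 3,
};

/* Hypothetical stand-in for struct dma_debug_entry; only the field the
 * patched check consults is modeled. */
struct dma_debug_entry {
	enum dma_data_direction direction;
};

/* Mirrors the patched condition: warn only when an active entry exists
 * and its direction is DMA_FROM_DEVICE. */
static bool assert_idle_should_warn(const struct dma_debug_entry *entry)
{
	if (!entry || entry->direction != DMA_FROM_DEVICE)
		return false;
	return true;
}

int main(void)
{
	struct dma_debug_entry to_dev = { .direction = DMA_TO_DEVICE };
	struct dma_debug_entry from_dev = { .direction = DMA_FROM_DEVICE };

	/* Device only reading the page: suppressed by the new check. */
	printf("DMA_TO_DEVICE warns:   %d\n", assert_idle_should_warn(&to_dev));
	/* Device writing into the page: still warns. */
	printf("DMA_FROM_DEVICE warns: %d\n", assert_idle_should_warn(&from_dev));
	/* No active mapping at all: never warned, before or after. */
	printf("no entry warns:        %d\n", assert_idle_should_warn(NULL));
	return 0;
}

Building and running this (e.g. "cc sketch.c && ./a.out") prints 0, 1, 0,
matching the behavior described in the changelog.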