Wei,
I'm attempting to port 378efe798ecf "RDMA/hns: Get rid of page
operation after dma_alloc_coherent" to the mlx4 driver.
One gotcha is that mlx4 has two alternative allocation/free paths; one
using dma_alloc/free_coherent, and the other using
alloc/__free_pages. I attempted to edit the pages path for mlx4 in a
similar way to your edit to the dma coherent path:
@@ -61,7 +61,7 @@ static void mlx4_free_icm_pages(struct mlx4_dev *dev, struct mlx4_icm_chunk *chu
for (i = 0; i < chunk->npages; ++i)
__free_pages(sg_page(&chunk->mem[i]),
- get_order(chunk->mem[i].length));
+ get_order(sg_dma_len(&chunk->mem[i])));
}
-static int mlx4_alloc_icm_pages(struct scatterlist *mem, int order,
+static int mlx4_alloc_icm_pages(struct mlx4_icm_chunk *chunk, int order,
gfp_t gfp_mask, int node)
{
+ struct scatterlist *mem = &chunk->mem[chunk->npages];
+ void **buf = &chunk->buf[chunk->npages];
struct page *page;
page = alloc_pages_node(node, gfp_mask, order);
@@ -107,24 +109,29 @@ static int mlx4_alloc_icm_pages(struct scatterlist *mem, int order,
}
sg_set_page(mem, page, PAGE_SIZE << order, 0);
+ sg_dma_len(mem) = PAGE_SIZE << order;
+ *buf = page_address(page);
return 0;
}
However, this causes a crash in mlx4_free_icm_pages(). I find that
while mem->length and sg_dma_len(mem) are equal after
mlx4_alloc_icm_pages(), the value of sg_dma_len(mem) has changed (in
some but not all cases) by the time mlx4_free_icm_pages() executes.
This is true if I do nothing but boot the system, which loads the
driver, then shutdown the system without using the driver at all.
Now, I could just remove the diff to mlx4_free_icm_pages() to
solve/hide this, but I suspect there's a bug here that needs to be
fixed. In particular, your change modified hns_roce_table_find() to
use sg_dma_len() rather than using the sg length field directly, and I
did similar to the mlx4 driver, so I wonder if that code isn't also
going to be impacted by the varying dma_length value.
Do you have any ideas what might be going on?
I'm running on an ARM64 platform with IOMMU enabled, and I do see some
code in drivers/iommu/dma-iommu.c that edits the sg dma_length field,
but I think that's only to coalesce contiguous entries or in DMA
mapping error cases, which I hope I'm not hitting; there's no hint I'm
hitting any error case.
In case it's useful, my complete mlx4 patch is available at:
https://pastebin.com/JLCpJG6s
(it's based on downstream 4.14 kernel, but applies upstream with a
tiny conflict resolution. Yes I'll upstream the fix once I have it
working in my old kernel.)
Thanks for any help.