Wei,
I'm attempting to port 378efe798ecf "RDMA/hns: Get rid of page operation
after dma_alloc_coherent" to the mlx4 driver.
One gotcha is that mlx4 has two alternative allocation/free paths; one
using dma_alloc/free_coherent, and the other using alloc_/__free_pages.
I attempted to edit the pages path for mlx4 in a similar way to your
edit to the dma coherent path:
@@ -61,7 +61,7 @@ static void mlx4_free_icm_pages(struct mlx4_dev *dev, struct mlx4_icm_chunk *chu
for (i = 0; i < chunk->npages; ++i)
__free_pages(sg_page(&chunk->mem[i]),
- get_order(chunk->mem[i].length));
+ get_order(sg_dma_len(&chunk->mem[i])));
}
-static int mlx4_alloc_icm_pages(struct scatterlist *mem, int order,
+static int mlx4_alloc_icm_pages(struct mlx4_icm_chunk *chunk, int order,
gfp_t gfp_mask, int node)
{
+ struct scatterlist *mem = &chunk->mem[chunk->npages];
+ void **buf = &chunk->buf[chunk->npages];
struct page *page;
page = alloc_pages_node(node, gfp_mask, order);
@@ -107,24 +109,29 @@ static int mlx4_alloc_icm_pages(struct scatterlist *mem, int order,
}
sg_set_page(mem, page, PAGE_SIZE << order, 0);
+ sg_dma_len(mem) = PAGE_SIZE << order;
+ *buf = page_address(page);
return 0;
}
However, this causes a crash in mlx4_free_icm_pages(). I find that while
mem->length, sg_dma_len(mem) are equal after mlx4_alloc_icm_pages(), the
value of sg_dma_len(mem) has changed (in some but not all cases) by the
time mlx4_free_icm_pages() executes. This is true if I do nothing but
boot the system, which loads the driver, then shutdown the system
without using the driver at all.
Now, I could just remove the diff to mlx4_free_icm_pages() to solve/hide
this, but I suspect there's a bug here that needs to be fixed. In
particular, your change modified hns_roce_table_find() to use
sg_dma_len() rather than using the sg length field directly, and I did
the same to the mlx4 driver, so I wonder if that code isn't also going
to be impacted by the varying dma_length value.
Do you have any ideas what might be going on?
I'm running on an ARM64 platform with IOMMU enabled, and I do see some
code in drivers/iommu/dma-iommu.c that edits the sg dma_length field,
but I think that's only to coalesce contiguous entries or in DMA mapping
error cases, which I hope I'm not hitting; there's no hint I'm hitting
any error case.
In case it's useful, my complete mlx4 patch is available at:
https://pastebin.com/JLCpJG6s
(it's based on downstream 4.14 kernel, but applies upstream with a tiny
conflict resolution. Yes I'll upstream the fix once I have it working in
my old kernel.)
Thanks for any help.