> On Feb 12, 2020, at 2:30 PM, Jason Gunthorpe <jgg@xxxxxxxx> wrote: > > On Wed, Feb 12, 2020 at 02:09:03PM -0500, Chuck Lever wrote: >> >> >>> On Feb 12, 2020, at 2:05 PM, Jason Gunthorpe <jgg@xxxxxxxx> wrote: >>> >>> On Wed, Feb 12, 2020 at 01:38:59PM -0500, Chuck Lever wrote: >>>> >>>>> On Feb 12, 2020, at 1:26 PM, Jason Gunthorpe <jgg@xxxxxxxx> wrote: >>>>> >>>>> On Wed, Feb 12, 2020 at 11:12:30AM -0500, Chuck Lever wrote: >>>>>> The @nents value that was passed to ib_dma_map_sg() has to be passed >>>>>> to the matching ib_dma_unmap_sg() call. If ib_dma_map_sg() choses to >>>>>> concatenate sg entries, it will return a different nents value than >>>>>> it was passed. >>>>>> >>>>>> The bug was exposed by recent changes to the AMD IOMMU driver, which >>>>>> enabled sg entry concatenation. >>>>>> >>>>>> Looking all the way back to commit 4143f34e01e9 ("xprtrdma: Port to >>>>>> new memory registration API") and reviewing other kernel ULPs, it's >>>>>> not clear that the frwr_map() logic was ever correct for this case. >>>>>> >>>>>> Reported-by: Andre Tomt <andre@xxxxxxxx> >>>>>> Suggested-by: Robin Murphy <robin.murphy@xxxxxxx> >>>>>> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx> >>>>>> Cc: stable@xxxxxxxxxxxxxxx # v5.5 >>>>>> net/sunrpc/xprtrdma/frwr_ops.c | 13 +++++++------ >>>>>> 1 file changed, 7 insertions(+), 6 deletions(-) >>>>> >>>>> Yep >>>>> >>>>> Reviewed-by: Jason Gunthorpe <jgg@xxxxxxxxxxxx> >>>> >>>> Thanks. >>>> >>>> Wondering if it makes sense to add a Fixes tag for the AMD IOMMU commit >>>> where NFS/RDMA stopped working, rather than the "Cc: stable # v5.5". >>>> >>>> Fixes: be62dbf554c5 ("iommu/amd: Convert AMD iommu driver to the dma-iommu api") >>> >>> Not really, this was broken for other configurations besides AMD >> >> Agreed, but the bug seems to have been inconsequential until now? > > I imagine it would get you on ARM or other archs, IIRC. That's certainly plausible, but I haven't received explicit bug reports in this area. (I'm not at all saying that such bugs categorically do not exist). In any event, practical matters: the posted patch applies back to v5.4, but fails to apply starting with v5.3. I think we can leave the "Cc: stable # v5.5"; and I'm open to requests to backport this simple fix onto earlier stable kernels (back to v4.4), which can be handled case-by-case. 'Salright? -- Chuck Lever