Re: sriov dmar errors

Interesting: after making this change on top of commit 4be90bc, the card
comes up:

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 1437ed5..820b481 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -423,8 +423,8 @@ static int mlx4_ib_query_device(struct ib_device *ibdev,
            (dev->dev->rev_id != MLX4_IB_CARD_REV_A0) &&
            (dev->dev->caps.flags & MLX4_DEV_CAP_FLAG_BLH))
                props->device_cap_flags |= IB_DEVICE_UD_TSO;
-       if (dev->dev->caps.bmme_flags & MLX4_BMME_FLAG_RESERVED_LKEY)
-               props->device_cap_flags |= IB_DEVICE_LOCAL_DMA_LKEY;
+       //if (dev->dev->caps.bmme_flags & MLX4_BMME_FLAG_RESERVED_LKEY)
+       //      props->device_cap_flags |= IB_DEVICE_LOCAL_DMA_LKEY;
        if ((dev->dev->caps.bmme_flags & MLX4_BMME_FLAG_LOCAL_INV) &&
            (dev->dev->caps.bmme_flags & MLX4_BMME_FLAG_REMOTE_INV) &&
            (dev->dev->caps.bmme_flags & MLX4_BMME_FLAG_FAST_REG_WR))

The particular system I'm working on has ConnectX2 cards. The ConnectX3
cards work just fine even without this change, so perhaps this is
ultimately a firmware issue with the CX2 cards (which, I know, are old,
but when you have 1200 of them...) interacting with whatever the
"local dma lkey" feature is.
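
A guess at why the change matters, not verified against the tree: if the
core only uses the device-wide local DMA lkey when the driver advertises
IB_DEVICE_LOCAL_DMA_LKEY, then hiding the flag would push it back to a
per-PD DMA MR, much like the old ib_get_dma_mr path. Below is a small
user-space model of that selection. Every name, the bit value, and the
stand-in lkey are hypothetical; none of this is kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit only; the real flag value lives in the kernel headers. */
#define MODEL_CAP_LOCAL_DMA_LKEY (1u << 0)

struct model_device {
    uint32_t cap_flags;      /* stands in for props->device_cap_flags */
    uint32_t reserved_lkey;  /* stands in for the device-wide local DMA lkey */
};

struct model_pd {
    uint32_t local_dma_lkey;
    bool uses_internal_mr;   /* true when a per-PD DMA MR had to be created */
};

/* Hypothetical stand-in for registering a DMA MR and returning its lkey. */
static uint32_t model_get_dma_mr_lkey(void)
{
    return 0x1234u;
}

/* Assumed selection logic: prefer the reserved lkey only if it is advertised. */
static struct model_pd model_alloc_pd(const struct model_device *dev)
{
    struct model_pd pd = { 0, false };

    if (dev->cap_flags & MODEL_CAP_LOCAL_DMA_LKEY) {
        pd.local_dma_lkey = dev->reserved_lkey;
    } else {
        pd.local_dma_lkey = model_get_dma_mr_lkey();
        pd.uses_internal_mr = true;
    }
    return pd;
}
```

With the flag cleared, as in the diff above, the model takes the second
branch. If the real code behaves the same way, that would point at the
reserved lkey itself as the thing the CX2 cards mishandle.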

Still, any thoughts would be great.

-Aaron

On 1/10/18 9:52 PM, Aaron Knister wrote:
> It looks like commit 4be90bc (IB/mad: Remove ib_get_dma_mr calls)
> broke SR-IOV usage with some of our older Mellanox IB cards. After
> loading the mlx4_core module with some number of vfs specified (e.g.
> modprobe mlx4_core num_vfs=16) the following errors appear and the
> port's state never changes from INIT to ACTIVE:
> 
> [ 1946.325804] DMAR: DRHD: handling fault status reg 302
> [ 1946.331442] DMAR: DMAR:[DMA Write] Request device [16:06.1] fault addr fcb6e000
> DMAR:[fault reason 02] Present bit in context entry is clear
> [ 1946.525788] DMAR: DRHD: handling fault status reg 402
> [ 1946.531416] DMAR: DMAR:[DMA Write] Request device [16:06.1] fault addr fcb6d000
> DMAR:[fault reason 02] Present bit in context entry is clear
> 
> If I revert to the parent commit (96249d7) the HCA and virtual functions
> all initialize as expected.
> 
> I'm not sure where to turn next to debug the issue further. Any tips?
> 
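
One small debugging aid for the faults quoted above: the "Request device"
field is a PCI bus:device.function triple in hex, and packing it into the
16-bit requester ID (bus in bits 15:8, device in 7:3, function in 2:0)
makes it easier to match against other tools. A minimal sketch follows;
the helper name is made up for illustration.

```c
#include <stdio.h>

/*
 * Parse "bb:dd.f" (hex fields, as printed in the DMAR fault line) into
 * the 16-bit PCI requester ID. Returns -1 on malformed or out-of-range
 * input.
 */
static int bdf_to_requester_id(const char *bdf)
{
    unsigned int bus, dev, fn;
    char trailing;

    /* Exactly three fields must convert; a fourth means trailing junk. */
    if (sscanf(bdf, "%x:%x.%x%c", &bus, &dev, &fn, &trailing) != 3)
        return -1;
    if (bus > 0xff || dev > 0x1f || fn > 0x7)
        return -1;

    return (int)((bus << 8) | (dev << 3) | fn);
}
```

For the device in the log, bdf_to_requester_id("16:06.1") gives 0x1631.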

-- 
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776