Re: kernel NULL pointer during reset_controller operation with IO on 4.11.0-rc7


I couldn't reproduce it, but for some reason you got an overflow in the QP send queue. It seems like something might be wrong with the calculation (probably the signaling calculation).

Please supply more details:
1. Link layer?
2. HCA type + FW versions on target/host sides?
3. B2B connection?

Try this one as a first step:

Hi Max
I retested this issue on 4.13.0-rc6/4.13.0-rc7 without your patch and found it can no longer be reproduced.
Here is my environment:
Link layer: mlx5_roce
HCA:
04:00.0 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]
04:00.1 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]
05:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
05:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
Firmware:
[   13.489854] mlx5_core 0000:04:00.0: firmware version: 12.18.1000
[   14.360121] mlx5_core 0000:04:00.1: firmware version: 12.18.1000
[   15.091088] mlx5_core 0000:05:00.0: firmware version: 14.18.1000
[   15.936417] mlx5_core 0000:05:00.1: firmware version: 14.18.1000
The two servers are connected through a switch.

I will let you know and retest your patch if I reproduce it in the future.

Thanks
Yi

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 82fcb07..1437306 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -88,6 +88,7 @@ struct nvme_rdma_queue {
        struct nvme_rdma_qe     *rsp_ring;
        atomic_t                sig_count;
        int                     queue_size;
+       int                     limit_mask;
        size_t                  cmnd_capsule_len;
        struct nvme_rdma_ctrl   *ctrl;
        struct nvme_rdma_device *device;
@@ -521,6 +522,7 @@ static int nvme_rdma_init_queue(struct nvme_rdma_ctrl *ctrl,

        queue->queue_size = queue_size;
        atomic_set(&queue->sig_count, 0);
+       queue->limit_mask = (min(32, 1 << ilog2((queue->queue_size + 1) / 2))) - 1;

        queue->cm_id = rdma_create_id(&init_net, nvme_rdma_cm_handler, queue,
                        RDMA_PS_TCP, IB_QPT_RC);
@@ -1009,9 +1011,7 @@ static void nvme_rdma_send_done(struct ib_cq *cq, struct ib_wc *wc)
  */
static inline bool nvme_rdma_queue_sig_limit(struct nvme_rdma_queue *queue)
 {
-       int limit = 1 << ilog2((queue->queue_size + 1) / 2);
-
-       return (atomic_inc_return(&queue->sig_count) & (limit - 1)) == 0;
+       return (atomic_inc_return(&queue->sig_count) & (queue->limit_mask)) == 0;
 }

 static int nvme_rdma_post_send(struct nvme_rdma_queue *queue,




_______________________________________________
Linux-nvme mailing list
Linux-nvme@xxxxxxxxxxxxxxxxxxx
http://lists.infradead.org/mailman/listinfo/linux-nvme

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
