Re: mlx4_core 0000:07:00.0: swiotlb buffer is full and OOM observed during stress test on reset_controller


 





On 03/13/2017 02:16 AM, Max Gurtovoy wrote:


On 3/10/2017 6:52 PM, Leon Romanovsky wrote:
On Thu, Mar 09, 2017 at 12:20:14PM +0800, Yi Zhang wrote:

I'm using a CX5-LX device and have not seen any issues with it.

Would it be possible to retest with kmemleak?

Here is the device I used.

Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]

The issue can always be reproduced after about 1000 iterations.

Another thing: I noticed something strange in the log.

Before the OOM occurred, most of the log messages are about "adding queue",
and after the OOM occurred, most of them are about "nvmet_rdma: freeing
queue".

It seems the release work, "schedule_work(&queue->release_work);", is not
executed in a timely manner. I'm not sure whether this is what causes the OOM.
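
One way to quantify the imbalance described above is to count the two kinds of messages in a saved kernel log. This is only a sketch: the function name `count_queue_events` is hypothetical, and the match strings are taken from the messages quoted above, so the exact log format on a given kernel may differ.

```shell
#!/bin/sh
# Count queue setup vs. teardown messages in a saved kernel log.
# Assumption: the log contains the strings "adding queue" and
# "freeing queue" as quoted in this thread; adjust the patterns if not.
count_queue_events() {
    log_file=$1
    # grep -c prints 0 and exits non-zero when nothing matches,
    # so "|| true" keeps the function usable under "set -e".
    added=$(grep -c 'adding queue' "$log_file" || true)
    freed=$(grep -c 'freeing queue' "$log_file" || true)
    echo "added=$added freed=$freed outstanding=$((added - freed))"
}
```

For example, after saving the log with `dmesg > /tmp/nvmet.log`, running `count_queue_events /tmp/nvmet.log` prints the totals. An "outstanding" value that keeps growing over time would be consistent with teardown lagging behind setup, though it does not prove it.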

Sagi,
The release function is queued on the global workqueue. I'm not familiar
with the NVMe design and I don't know all the details, but maybe the proper
way would be to create a dedicated workqueue with the WQ_MEM_RECLAIM flag to
ensure forward progress?


Hi,

I was able to reproduce it in my lab with ConnectX-3. I added a dedicated workqueue with high priority, but the bug still happens. If I add a "sleep 1" after "echo 1 >/sys/block/nvme0n1/device/reset_controller", the test passes. So there is no leak IMO; rather, allocation is much faster than the destruction of the resources. In the initiator we don't wait for the RDMA_CM_EVENT_DISCONNECTED event after we call rdma_disconnect, and we try to connect again immediately. Maybe we need to slow down the storm of connect requests from the initiator somehow, to give the target time to settle.

Max.
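
The throttled variant of the stress test described above could be sketched roughly as follows. The function name `reset_loop`, its arguments, and the default iteration count are hypothetical choices for this sketch; only the sysfs path and the one-second pause come from the thread.

```shell
#!/bin/sh
# Repeatedly trigger a controller reset, pausing between iterations.
# Assumptions: reset_loop and its parameters are illustrative names;
# the default path and the 1-second delay are the ones quoted above.
reset_loop() {
    reset_file=${1:-/sys/block/nvme0n1/device/reset_controller}
    iterations=${2:-1000}
    delay=${3:-1}
    i=0
    while [ "$i" -lt "$iterations" ]; do
        # Writing 1 requests a reset; stop if the write fails.
        echo 1 > "$reset_file" || return 1
        # Give the target time to tear down the old queues.
        sleep "$delay"
        i=$((i + 1))
    done
    echo "completed $i resets"
}
```

Calling `reset_loop` with no arguments runs 1000 resets with a one-second pause; passing `0` as the third argument removes the pause and should approximate the original failing test.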


Hi Sagi,
Let's use this mail thread to track the OOM issue. :)

Thanks
Yi

Here is the log from before and after the OOM:
http://pastebin.com/Zb6w4nEv

_______________________________________________
Linux-nvme mailing list
Linux-nvme@xxxxxxxxxxxxxxxxxxx
http://lists.infradead.org/mailman/listinfo/linux-nvme



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


