From: Zhu Yanjun <yanjun.zhu@xxxxxxxxxx>
Date: Sun, 15 Apr 2018 21:02:07 -0400

> While a faulty cable is used or HCA firmware error, HCA device will
> be offline. When the driver is accessing this offline device, the
> following call trace will pop out.
...
> In the above call trace, the function mlx4_cmd_poll calls the function
> mlx4_cmd_post to access the HCA while HCA is offline. Then mlx4_cmd_post
> returns an error -EIO. Per -EIO, the function mlx4_cmd_poll calls
> mlx4_cmd_reset_flow to reset HCA. And the above call trace pops out.
>
> This is not reasonable. Since HCA device is offline when it is being
> accessed, it should not be reset again.
>
> In this patch, since HCA is offline, the function mlx4_cmd_post returns
> an error -EINVAL. Per -EINVAL, the function mlx4_cmd_poll directly returns
> instead of resetting HCA.
>
> CC: Srinivas Eeda <srinivas.eeda@xxxxxxxxxx>
> CC: Junxiao Bi <junxiao.bi@xxxxxxxxxx>
> Suggested-by: Håkon Bugge <haakon.bugge@xxxxxxxxxx>
> Signed-off-by: Zhu Yanjun <yanjun.zhu@xxxxxxxxxx>

Tariq, I'm assuming you'll take this in and send it to me later.

Thanks.
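
For anyone following along, the control flow described in the quoted changelog can be modeled roughly as below. This is only a user-space sketch, not the driver code: struct sketch_dev, its fields, and the sketch_* functions are hypothetical stand-ins for the HCA, mlx4_cmd_post, mlx4_cmd_poll and mlx4_cmd_reset_flow, and the real functions take different arguments and do much more.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the HCA; not a real mlx4 structure. */
struct sketch_dev {
	bool offline;     /* device has dropped off the bus */
	bool post_fails;  /* simulate a command failure while still online */
	int reset_count;  /* number of times the reset flow ran */
};

/* Models the reset path that the poll function takes on -EIO. */
static void sketch_reset_flow(struct sketch_dev *dev)
{
	dev->reset_count++;
}

/*
 * Models posting a command. With the behavior described in the
 * changelog, an offline device yields -EINVAL rather than -EIO.
 */
static int sketch_cmd_post(struct sketch_dev *dev)
{
	if (dev->offline)
		return -EINVAL;
	if (dev->post_fails)
		return -EIO;
	return 0;
}

/* Models the poll path: only -EIO triggers a reset. */
static int sketch_cmd_poll(struct sketch_dev *dev)
{
	int err = sketch_cmd_post(dev);

	if (err == -EIO)
		sketch_reset_flow(dev);
	return err;
}
```

In this model, polling an offline device returns -EINVAL and leaves reset_count at 0, while a failure on an online device still returns -EIO and runs the reset once.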