On Wed, 2018-01-10 at 12:15 -0700, Jason Gunthorpe wrote:
> On Wed, Jan 10, 2018 at 01:59:10PM -0500, Laurence Oberman wrote:
>
> > Yep, this seems specific to the mlx5 and IB.
> > The problem though is Linus's tree 4.15-rc-7 already has enough of
> > the part of the RDMA updates to see issues.
>
> Every time you post a backtrace it is different.. The only commonality
> seems to be that the CQ completion core appears to be processing
> garbage, accompanied by these sorts of sketch kernel messages from
> mlx5:
>
> > [ 1360.511682] mlx5_core 0000:08:00.1: Shutdown was called
> > [ 1360.550531] mlx5_core 0000:08:00.1: mlx5_enter_error_state:121:(pid
> > [ 938.938946] mlx5_core 0000:08:00.1: Shutdown was called
> > [ 938.968423] mlx5_core 0000:08:00.1: mlx5_cmd_force_teardown_hca:245:(pid 14752): teardown with force mode failed
> > [ 938.978359] mlx5_core 0000:08:00.1: mlx5_cmd_comp_handler:1445:(pid 13186): Command completion arrived after timeout (entry idx = 0).
> > [ 942.209464] mlx5_1:wait_for_async_commands:735:(pid 14752): done with all pending requests
>
> My other guess is a mlx5 issue where it is returning CQ wrids it
> should not return?
>
> Leon?
>
> I don't see anything changing in this area in rdma.git for-rc, so I
> can't give you a guess on a patch, sorry.
>
> Do you think this test ever worked for you? You said bisect, so I
> assume so?
>
> Jason

Hi Jason

Just to be clear, I have posted two types of stack traces: one where I panic, and the other (above) where I am not panicking.

This is not any special type of test. I booted the kernel, mapped the SRP devices from the target server and proceeded to shut down the client with shutdown -r now.

This is part of the holistic testing I always do against new patches in Bart's tree. I start with reboots, then rmmod's etc., before I go on to perform I/O against the LUNs from the target.

The panic was the first issue I came across after building a kernel with Bart's tree.
I have not even started testing anything else yet.

The trace above was provided because Bart asked me to test two kernels:
1. Linus's tree 4.15-rc7
2. The RDMA tree

Bart's tree panics the same way as the RDMA tree I cloned.

I will look at prior release candidates in Linus's tree and see where this may have crept in.

I am, of course, puzzled why I am the only one seeing it; other folks must have MLX5 (CX4) like I do.

It would be good to know what testing Leon and his team last performed on the current RDMA tree.

Regards
Laurence