On 5/17/22 17:21, Bob Pearson wrote:
Thanks, Bart. I was able to follow your steps above, but unfortunately not much has changed. I still see hangs in siw (with no code changes by me) and also in rxe (where I have fixed some lockdep warnings so that it will run in a debug kernel).

Two test cases cause most of the problems: srp/002 and srp/011. srp/011 always fails consistently. srp/002 sometimes hangs and sometimes completes, but with a failed status. The rest of the tests all pass.

Both tests hang at a line that looks like:

scsi host6: ib_srp: Already connected to target port with id_ext=...

When srp/002 completes but fails, there are 14-second gaps at some of the same lines in the trace. This feels like the earlier problem with the 3-minute timeout that was fixed by the patch you sent (revert ... scsi_debug.c), which is applied here.

I really don't know how to make progress here. If anyone knows what is happening at the "Already connected" lines, please let me know. They look normal except for the long gaps and the hangs when they occur.
How about sharing the kernel config file that you are using in your tests so that I can try to reproduce the behavior you are observing?
Thanks, Bart.