On Tue, May 21, 2019 at 03:46:03PM +0000, Trond Myklebust wrote:
> Have you tried upgrading to 4.19.44? There is a fix that went in not
> too long ago that deals with a request leak that can cause stack traces
> like the above that wait forever.
>

Following up on this. I have set aside a rack of machines and put
Linux 4.19.44 on them. They ran jobs overnight and will do the same
over the long weekend (Memorial Day in the US). Given the error rate
(both over time and over submitted jobs) we see across the cluster,
this will be enough time to draw a conclusion as to whether 4.19.44
exhibits this hang.

Other than stack traces, what kind of information could I collect that
would be helpful for debugging or describing more precisely what is
happening to these hosts? I'd like to stop cycling through different
kernels (as you no doubt saw in my initial message, I've done a lot of
that) and start debugging or reproducing the problem.

I'll report back early next week and appreciate your feedback,

-A
-- 
Alan Post | Xen VPS hosting for the technically adept
PO Box 61688 | Sunnyvale, CA 94088-1681 | https://prgmr.com/
email: adp@xxxxxxxxx