> On 1. Oct 2024, at 02:56, Chris Mason <clm@xxxxxxxx> wrote:
>
> Not disagreeing with Linus at all, but given that you've got IO
> throttling too, we might really just be waiting. It's hard to tell
> because the hung task timeouts only give you information about one process.
>
> I've attached a minimal version of a script we use here to show all the
> D state processes, it might help explain things. The only problem is
> you have to actually ssh to the box and run it when you're stuck.
>
> The idea is to print the stack trace of every D state process, and then
> also print out how often each unique stack trace shows up. When we're
> deadlocked on something, there are normally a bunch of the same stack
> (say waiting on writeback) and then one jerk sitting around in a
> different stack who is causing all the trouble.

I think I should be able to trigger this. I’ve seen around 100 of those issues over the last week, and the chance of it happening correlates with a certain workload that should be easy to reproduce. Also, the condition persists for around 5 minutes, so I should be able to trace it in an interactive session when I see the alert.

I’ve verified I can run your script and I’ll get back to you in the next few days.

Christian

--
Christian Theune · ct@xxxxxxxxxxxxxxx · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick
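
For readers following along without the attachment: the approach described in the quoted text (dump the kernel stack of every D state process, then count how often each unique stack appears) can be sketched roughly as below. This is not the attached script, only a minimal illustration under some assumptions: it reads the standard Linux `/proc/<pid>/stat` and `/proc/<pid>/stack` files, it looks at processes only (not individual threads under `/proc/<pid>/task`), and the function names `task_state`, `kernel_stack`, `collect_d_state` and `summarize` are made up for this example. Reading `/proc/<pid>/stack` usually needs root.

```python
import os
from collections import Counter


def task_state(pid, proc="/proc"):
    """Return the one-letter state from <proc>/<pid>/stat, or None if unreadable."""
    try:
        with open(os.path.join(proc, str(pid), "stat")) as f:
            data = f.read()
    except OSError:
        return None  # process exited between listing and reading
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so the state is the first field after the LAST ')'.
    parts = data.rsplit(")", 1)
    if len(parts) != 2:
        return None
    fields = parts[1].split()
    return fields[0] if fields else None


def kernel_stack(pid, proc="/proc"):
    """Return the kernel stack text for a pid, or a placeholder if unreadable."""
    try:
        with open(os.path.join(proc, str(pid), "stack")) as f:
            return f.read().strip()
    except OSError:
        return "<stack unavailable>"


def collect_d_state(proc="/proc"):
    """Map pid -> kernel stack for every process in uninterruptible sleep (D)."""
    stacks = {}
    try:
        entries = os.listdir(proc)
    except OSError:
        return stacks  # no procfs at this path
    for entry in entries:
        if entry.isdigit() and task_state(entry, proc) == "D":
            stacks[int(entry)] = kernel_stack(entry, proc)
    return stacks


def summarize(stacks):
    """Return (stack, count) pairs, most frequent stack first."""
    return Counter(stacks.values()).most_common()


if __name__ == "__main__":
    found = collect_d_state()
    for pid, stack in sorted(found.items()):
        print(f"pid {pid}\n{stack}\n")
    for stack, count in summarize(found):
        print(f"{count} process(es) with stack:\n{stack}\n")
```

In the deadlock scenario described above, the summary would be expected to show one stack with a high count (the waiters) and a stack that appears only once, which is the one worth looking at first.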