On Wed, Feb 12, 2020 at 5:20 PM Salman Qazi <sqazi@xxxxxxxxxx> wrote:
>
> On Wed, Feb 12, 2020 at 3:07 PM Theodore Y. Ts'o <tytso@xxxxxxx> wrote:
> >
> > This is a problem we've been struggling with in other contexts.  For
> > example, if you have the hung task timer set to 2 minutes, and the
> > system to panic if the hung task timer exceeds that, and an NFS server
> > which the client is writing to crashes, and it takes longer for the
> > NFS server to come back, that might be a situation where we might want
> > to exempt the hung task warning from panic'ing the system.  On the
> > other hand, if the process is failing to schedule for other reasons,
> > maybe we would still want the hung task timeout to go off.
> >
> > So I've been meditating over whether the right answer is to just
> > globally configure the hung task timer to something like 5 or 10
> > minutes (which would require no kernel changes, yay?), or have some
> > way of telling the hung task timeout logic that it shouldn't apply, or
> > should have a different timeout, when we're waiting for I/O to
> > complete.
>
> The problem that I anticipate in our space is that a generous timeout
> will make impatient people reboot their chromebooks, losing us
> information about hangs.  But, this can be worked around by having
> multiple different timeouts.  For instance, a thread that is expecting
> to do something slow can set a flag to indicate that it wishes to be
> held against the more generous criteria.  This is something I am
> tempted to do on older kernels where we might not feel comfortable
> backporting io_uring.

I was going to reply along the same lines when I got distracted by a
mtg.  If anything I'd like to see a LOWER hung task timeout, generally
speaking.  And maybe that means having more operations be asynchronous
like Ted suggests (I'm generally a fan of that anyway).

[snipped good suggestion about async interface]

Jesse