On Wed, Dec 22, 2021 at 10:25:27AM +0800, Hillf Danton wrote:
> > I'm not sure what you hope to learn by doing something like that.
> > That will certainly perturb the system, but every 150 seconds, the
> > task is going to let other tasks/threads run --- but it will be
> > whatever is the next highest priority thread.
>
> Without reproducer, I am trying to reproduce the issue using a FIFO CPU hog
> which is supposed to beat the watchdog to show me the victims like various
> kthreads, workqueue workers and user apps, despite I know zero about how the
> watchdog is configured except the report was down to watchdog bite.

It's really trivial to reproduce an issue that has the same symptom
as what has been reported to you.  Mount the file system using a
non-real-time (SCHED_OTHER) thread, such that the jbd2 and ext4 worker
threads are running SCHED_OTHER.  Then run some file system workload
(fsstress or fsmark) as SCHED_FIFO.  Then on an N CPU system, run N
processes as SCHED_FIFO at any priority (it doesn't matter whether it's
MAX_PRI-1 or MIN_PRI).  SCHED_FIFO will have priority over SCHED_OTHER
processes, so this will effectively starve the ext4 and jbd2 worker
threads from ever getting to run.  Once the ext4 journal fills up, any
SCHED_FIFO process which tries to write to the file system will hang.

The problem is that's *one* potential stupid configuration of the
real-time system.  It's not necessarily the *only* potentially stupid
way that you can get yourself into a system hang.  It appears the
syzkaller "repro" is another such "stupid way".  And the number of
ways you can screw up with a real-time system is practically
unbounded...

So getting back to syzkaller, Willy had the right approach, which is
that if a syzkaller "repro" happens to use SCHED_FIFO or SCHED_RR, and
the symptom is a system hang, it's probably worth ignoring the report,
since it's going to be a waste of time to debug a userspace bug.

If you have anything that uses kernel threads, and SCHED_FIFO or
SCHED_RR is in play, it's probably a userspace bug.

Cheers,

					- Ted
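
P.S.  For what it's worth, the "run N processes as SCHED_FIFO" half of
the recipe above can be sketched in a few lines of C.  This is only an
illustration under some assumptions: it takes the N processes to be
plain CPU-bound busy loops, the helper names (lowest_fifo_priority,
become_fifo, spawn_fifo_hogs) are made up for this sketch, and it needs
root or CAP_SYS_NICE.  Only try it on a scratch VM, since hanging the
machine is the whole point.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Lowest valid SCHED_FIFO priority; any FIFO priority outranks SCHED_OTHER. */
int lowest_fifo_priority(void)
{
	return sched_get_priority_min(SCHED_FIFO);
}

/*
 * Switch the calling process to SCHED_FIFO at the lowest priority.
 * Returns 0 on success, -1 with errno set on failure (typically
 * EPERM when the caller lacks CAP_SYS_NICE).
 */
int become_fifo(void)
{
	struct sched_param sp;
	int prio = lowest_fifo_priority();

	if (prio < 0)
		return -1;
	memset(&sp, 0, sizeof(sp));
	sp.sched_priority = prio;
	return sched_setscheduler(0, SCHED_FIFO, &sp);
}

/*
 * Fork one SCHED_FIFO busy loop per online CPU.  Returns the number
 * of children forked, or -1 on error.  The children never sleep or
 * yield, so SCHED_OTHER threads get crowded off every CPU.
 */
long spawn_fifo_hogs(void)
{
	long ncpus = sysconf(_SC_NPROCESSORS_ONLN);
	long i;

	if (ncpus < 1)
		return -1;
	for (i = 0; i < ncpus; i++) {
		pid_t pid = fork();

		if (pid < 0)
			return -1;
		if (pid == 0) {
			volatile unsigned long spin = 0;

			if (become_fifo() != 0)
				_exit(1);	/* could not go real-time */
			for (;;)
				spin++;		/* hog the CPU forever */
		}
	}
	return ncpus;
}
```

To use it, mount the file system and start fsstress or fsmark as
SCHED_FIFO first, then call spawn_fifo_hogs() from a small main() run
as root.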