On Wed, 22 Dec 2021 at 05:35, Theodore Ts'o <tytso@xxxxxxx> wrote:
>
> On Wed, Dec 22, 2021 at 10:25:27AM +0800, Hillf Danton wrote:
> > > I'm not sure what you hope to learn by doing something like that.
> > > That will certainly perturb the system, but every 150 seconds, the
> > > task is going to let other tasks/threads run --- but it will be
> > > whatever is the next highest priority thread.
> >
> > Without reproducer, I am trying to reproduce the issue using a FIFO CPU hog
> > which is supposed to beat the watchdog to show me the victims like various
> > kthreads, workqueue workers and user apps, despite I know zero about how the
> > watchdog is configured except the report was down to watchdog bite.
>
> It's really trivial to reproduce an issue that has the same symptom as
> what has been reported to you. Mount the file system using a
> non-real-time (SCHED_OTHER) thread, such that the jbd2 and ext4 worker
> threads are running SCHED_OTHER. Then run some file system workload
> (fsstress or fsmark) as SCHED_FIFO. Then on an N CPU system, run N
> processes as SCHED_FIFO at any priority (doesn't matter whether it's
> MAX_PRI-1 or MIN_PRI; SCHED_FIFO will have priority over SCHED_OTHER
> processes, so this will effectively starve the ext4 and jbd2 worker
> threads from ever getting to run. Once the ext4 journal fills up, any
> SCHED_FIFO process which tries to write to the file system will hang.
>
> The problem is that's *one* potential stupid configuration of the
> real-time system. It's not necessarily the *only* potentially stupid
> way that you can get yourself into a system hang. It appears the
> syzkaller "repro" is another such "stupid way". And the number of
> ways you can screw up with a real-time system is practically
> unbounded...
>
> So getting back to syzkaller, Willy had the right approach, which is a
> Syzcaller "repro" happens to use SCHED_FIFO or SCHED_RR, and the
> symptom is a system hang, it's probably worth ignoring the report,
> since it's going to be a waste of time to debug userspace bug. If you
> have anything that uses kernel threads, and SCHED_FIFO or SCHED_RR is
> in play, it's probably a userspace bug.
>
> Cheers,

Hi Ted,

Reviving this old thread re syzkaller using SCHED_FIFO.

It's a bit hard to restrict what the fuzzer can do if we give it the
sched_setattr() family of syscalls. We could remove them from the
fuzzer entirely, but that is probably suboptimal as well.

I see that setting up SCHED_FIFO is guarded by CAP_SYS_NICE:
https://elixir.bootlin.com/linux/v5.18-rc7/source/kernel/sched/core.c#L7264

And I see we have been dropping CAP_SYS_NICE from the fuzzer process
since 2019 (after a similar discussion):
https://github.com/google/syzkaller/commit/f3ad68446455a

The latest C reproducer contains:

static void drop_caps(void)
{
  struct __user_cap_header_struct cap_hdr = {};
  struct __user_cap_data_struct cap_data[2] = {};
  cap_hdr.version = _LINUX_CAPABILITY_VERSION_3;
  cap_hdr.pid = getpid();
  if (syscall(SYS_capget, &cap_hdr, &cap_data))
    exit(1);
  const int drop = (1 << CAP_SYS_PTRACE) | (1 << CAP_SYS_NICE);
  cap_data[0].effective &= ~drop;
  cap_data[0].permitted &= ~drop;
  cap_data[0].inheritable &= ~drop;
  if (syscall(SYS_capset, &cap_hdr, &cap_data))
    exit(1);
}

Are we holding it wrong? How can the process manage to set any bad
scheduling policies if it dropped CAP_SYS_NICE?

The process still has CAP_SYS_ADMIN, but I assume that should not
allow it to use something that requires the dropped CAP_SYS_NICE.
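
For what it's worth, the question above can be probed outside the fuzzer
with something like the sketch below. It is not syzkaller code: the
helper names (has_effective_sys_nice, drop_sys_nice, try_sched_fifo) and
the PROBE_STANDALONE build flag are made up for illustration. It drops
CAP_SYS_NICE the same way drop_caps() does and then asks for SCHED_FIFO.
sched(7) also describes a nonzero RLIMIT_RTPRIO soft limit as a way for
a process without CAP_SYS_NICE to select a real-time policy, so the
sketch prints that limit too; whether any of this explains the
reproducer is not something the sketch can tell.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <linux/capability.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: 1 if CAP_SYS_NICE is in the effective set,
 * 0 if it is not, -1 if capget() failed. */
static int has_effective_sys_nice(void)
{
	struct __user_cap_header_struct hdr = {0};
	struct __user_cap_data_struct data[2] = {{0}};

	hdr.version = _LINUX_CAPABILITY_VERSION_3;
	hdr.pid = 0; /* 0 means the calling thread */
	if (syscall(SYS_capget, &hdr, data))
		return -1;
	return (data[0].effective >> CAP_SYS_NICE) & 1;
}

/* Hypothetical helper: clear CAP_SYS_NICE from the effective, permitted
 * and inheritable sets, as drop_caps() in the reproducer does.
 * Returns 0 on success, -1 on failure. */
static int drop_sys_nice(void)
{
	struct __user_cap_header_struct hdr = {0};
	struct __user_cap_data_struct data[2] = {{0}};
	const unsigned int drop = 1u << CAP_SYS_NICE;

	hdr.version = _LINUX_CAPABILITY_VERSION_3;
	hdr.pid = 0;
	if (syscall(SYS_capget, &hdr, data))
		return -1;
	data[0].effective &= ~drop;
	data[0].permitted &= ~drop;
	data[0].inheritable &= ~drop;
	return syscall(SYS_capset, &hdr, data) ? -1 : 0;
}

/* Hypothetical helper: request SCHED_FIFO at priority 1 for the caller.
 * Returns 0 if the kernel allowed it (the policy is put back to
 * SCHED_OTHER right away), otherwise the errno from the failed call. */
static int try_sched_fifo(void)
{
	struct sched_param fifo = { .sched_priority = 1 };
	struct sched_param other = { .sched_priority = 0 };

	if (sched_setscheduler(0, SCHED_FIFO, &fifo) == 0) {
		sched_setscheduler(0, SCHED_OTHER, &other);
		return 0;
	}
	return errno;
}

#ifdef PROBE_STANDALONE
int main(void)
{
	struct rlimit rl;
	int err;

	printf("CAP_SYS_NICE effective before drop: %d\n",
	       has_effective_sys_nice());
	if (drop_sys_nice() != 0) {
		perror("capget/capset");
		return 1;
	}
	assert(has_effective_sys_nice() == 0);

	if (getrlimit(RLIMIT_RTPRIO, &rl) == 0)
		printf("RLIMIT_RTPRIO soft limit: %llu\n",
		       (unsigned long long)rl.rlim_cur);

	err = try_sched_fifo();
	printf("SCHED_FIFO after drop: %s\n",
	       err ? strerror(err) : "allowed");
	return 0;
}
#endif
```

Building with -DPROBE_STANDALONE gives a small program that can be run
in the same environment as the reproducer. If it reports "allowed" with
CAP_SYS_NICE gone, the rlimit line is the first thing I would look at.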