On Tue, Oct 01, 2024 at 02:52:37PM +0200, Valentin Schneider wrote: > On 01/10/24 03:10, Paul E. McKenney wrote: > > On Mon, Sep 30, 2024 at 10:44:24PM +0200, Valentin Schneider wrote: > >> On 30/09/24 12:09, Paul E. McKenney wrote: > >> > > >> > And Peter asked that I send along a reproducer, which I am finally getting > >> > around to doing: > >> > > >> > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 12h --configs "100*TREE03" --trust-make > >> > > >> > >> FYI Tomas (on Cc) has been working on getting pretty much this to run on > >> our infra, no hit so far. > >> > >> How much of a pain would it be to record an ftrace trace while this runs? > >> I'm thinking sched_switch, sched_wakeup and function-tracing > >> dl_server_start() and dl_server_stop() would be a start. > >> > >> AIUI this is running under QEMU so we'd need to record the trace within > >> that, I'm guessing we can (ab)use --bootargs to feed it tracing arguments, > >> but how do we get the trace out? To answer this question directly, I am trying this: --bootargs "trace_event=sched:sched_switch,sched:sched_wakeup ftrace_filter=dl_server_start,dl_server_stop torture.ftrace_dump_at_shutdown=1" Huh, 50MB and growing. I need to limit the buffer size as well. How about "trace_buf_size=2k"? The default is 1,441,792, just over 1m. Except that I am not getting either dl_server_start() or dl_server_stop(), perhaps because they are not being invoked in this short test run. So try some function that is definitely getting invoked, such as rcu_sched_clock_irq(). No joy there, either, so maybe add "ftrace=function"? No: "[ 1.542360] ftrace bootup tracer 'function' not registered." The "torture.ftrace_dump_at_shutdown" is just for experiment. The actual runs will do something like this: if (on_dl_rq(dl_se)) { // Was: WARN_ON_ONCE(on_dl_rq(dl_se)); tracing_off(); ftrace_dump(DUMP_ALL); WARN_ON_ONCE(1); } > > Me, I would change those warnings to dump the trace buffer to the > > console when triggered. Let me see if I can come up with something > > better over breakfast. And yes, there is the concern that adding tracing > > will suppress this issue. > > > > So is there some state that I could manually dump upon triggering either > > of these two warnings? That approach would minimize the probability of > > suppressing the problem. > > Usually enabling panic_on_warn and getting a kdump is ideal, but here this > is with QEMU - I know we can get a vmcore out via dump-guest-memory in the > QEMU monitor, but I don't have an immediate solution to do that on a > warn/panic. Especially given that I don't have a QEMU monitor for these 100 runs. But if there is a way to do this programatically from within the kernel, I would be happy to give it a try. > Also I'd say here we're mostly interested in the sequence of events leading > us to the warn (dl_server_start() when the DL entity is somehow still > enqueued) rather than the state of things when the warn is hit, and for > that dumping the ftrace buffer to the console sounds good enough to me. That I can do!!! Give or take function tracing appearing not to work for me from the kernel command line. :-( Thanx, Paul