On Mon, May 1, 2023 at 11:21 AM Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
>
> On Mon, 1 May 2023 11:15:45 -0400
> Brian Hutchinson <b.hutchman@xxxxxxxxx> wrote:
>
> > Hey Steve, good to hear from you.
> >
> > On Mon, May 1, 2023 at 10:17 AM Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> > >
> > > On Mon, 1 May 2023 00:42:03 -0400
> > > Brian Hutchinson <b.hutchman@xxxxxxxxx> wrote:
> > >
> > > > Hi,
> > > >
> > > > Using 5.10.69 kernel on a i.MX8 platform with isolated cpu's (isolcpus
> > > > on cmd line) with RT_RUNTIME_SHARE and wanting to know if (and who) is
> > > > hitting the RT Throttle limit. Since my kernel does not have a
> > > > tracepoint for this, I used gdb and disassembled kernel/sched/rt.c to
> > > > find the address where throttling gets set and then added a kprobe for
> > > > it.
> > >
> > > Where exactly did you add this kprobe?
> >
> > It was:
> > /sys/kernel/debug/tracing
> >
> > echo 'p 0xffff8000100a8e38 %x0' > kprobe_events
>
> I meant, where was the exact location that address pointed to in the code,
> not where in the user interface did you add it ;-)

D'oh! OK, sorry, my bad.

On my first attempt I put it here:

 915             if (rt_rq->rt_time || rt_rq->rt_nr_running)
 916                     idle = 0;
 917             raw_spin_unlock(&rt_rq->rt_runtime_lock);
 918         } else if (rt_rq->rt_nr_running) {
 919             idle = 0;
 920             if (!rt_rq_throttled(rt_rq))
 921                     enqueue = 1;
 922         }
 923         if (rt_rq->rt_throttled)
 924             throttled = 1;
 925
 926         if (enqueue)
 927             sched_rt_rq_enqueue(rt_rq);
 928         raw_spin_unlock(&rq->lock);
 929     }

at line 924. I'm not the most versed in A53 assembly, so I guessed which
instruction to use (it's always tricky with pipelining etc.), then cheated
and put a breakpoint on line 924 to confirm, and used that address.

Later I did the same thing here:

 964     if (rt_rq->rt_time > runtime) {
 965             struct rt_bandwidth *rt_b = sched_rt_bandwidth(rt_rq);
 966
 967             /*
 968              * Don't actually throttle groups that have no runtime assigned
 969              * but accrue some time due to boosting.
 970              */
 971             if (likely(rt_b->rt_runtime)) {
 972                     rt_rq->rt_throttled = 1;
 973                     printk_deferred_once("sched: RT throttling activated\n");
 974             } else {
 975                     /*
 976                      * In case we did anyway, make it go away,
 977                      * replenishment is a joke, since it will replenish us
 978                      * with exactly 0 ns.
 979                      */
 980                     rt_rq->rt_time = 0;
 981             }
 982
 983             if (rt_rq_throttled(rt_rq)) {
 984                     sched_rt_rq_dequeue(rt_rq);
 985                     return 1;
 986             }
 987     }

... and put a breakpoint on line 972 and used that address for the kprobe.
When I did this I got pretty much the same results as before, but the trace
records came in a lot more quickly (most were "idle"), and a kworker thread
showed up in the mix too.

I'm also not a scheduler expert, so I don't quite understand the difference
between the two places rt_throttled gets set (although it should be obvious
from the copious amount of comments ... not!). I just suspect the throttle
is kicking in, which chokes off the SCHED_OTHER OS stuff, and I'm trying to
figure out who's doing it and why.
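For what it's worth, the same probe can be expressed as symbol+offset rather
than a raw address (kprobe_events accepts something like
'p sched_rt_runtime_exceeded+0xNN', as long as the symbol is in
/proc/kallsyms and hasn't been inlined), or registered from a small module
so every hit also comes with a stack trace. Below is a minimal, untested
sketch of the module variant; THROTTLE_OFFSET is just a placeholder for the
offset of the rt_throttled store found from the disassembly:

/*
 * Untested sketch: kprobe on sched_rt_runtime_exceeded() by symbol plus
 * offset, logging the task that was running when the throttling branch
 * was hit. THROTTLE_OFFSET is a placeholder for the offset of the
 * "rt_rq->rt_throttled = 1" store found by disassembly.
 */
#include <linux/module.h>
#include <linux/kprobes.h>
#include <linux/sched.h>
#include <linux/smp.h>

#define THROTTLE_OFFSET 0x0	/* placeholder offset */

static int throttle_pre(struct kprobe *p, struct pt_regs *regs)
{
	pr_info("RT throttle hit: comm=%s pid=%d cpu=%d\n",
		current->comm, current->pid, smp_processor_id());
	dump_stack();	/* shows the call path that reached the throttle */
	return 0;
}

static struct kprobe kp = {
	.symbol_name = "sched_rt_runtime_exceeded",
	.offset      = THROTTLE_OFFSET,
	.pre_handler = throttle_pre,
};

static int __init throttle_probe_init(void)
{
	return register_kprobe(&kp);
}

static void __exit throttle_probe_exit(void)
{
	unregister_kprobe(&kp);
}

module_init(throttle_probe_init);
module_exit(throttle_probe_exit);
MODULE_LICENSE("GPL");

The comm/pid printed there should be the task that was actually being
charged when the throttle tripped, and the stack trace shows how it got
there (scheduler tick, period timer, etc.).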
>
> >
> > Your "Fun with Dynamic Kernel Tracing Events" talk in 2018 was cool
> > with all those examples but you never gave examples on how to do this
> > kind of stuff wink, wink, ha, ha.
> >
> > >
> > > >
> > > > When I look at the trace I'm seeing what looks like "idle" being
> > > > throttled in addition to other things. I "think" my probe is working
> > > > as when I look at the trace, the processes that show up look like they
> > > > have run over the 95% limit of one period (defaulted to 1 second) but
> > > > I'm confused as to why Idle shows up.
> > >
> > > Are you saying this because it's happening while idle is running?
> >
> > I don't know what I'm saying. Part of me posting was me questioning
> > if what I'm seeing is even valid ... but it makes more sense now that
> > you point out that it's a hard interrupt happening while idle. Now I
> > just have to figure out which interrupt.
> >
> > > >
> > > > I've not ran lttng trace capturing context switches yet to see what
> > > > other processes might be involved around the time the throttle kicks
> > > > in, at the moment I'm just trying to validate I'm going after the
> > > > problem and setting it up the right way.
> > > >
> > > > I did identify a patch that enables a tracepoint for the rt throttle
> > > > but it is for RT_GROUP_SCHED and we aren't using that at the moment.
> > > >
> > > > I suspect the application I'm trying to debug has some misbehaving
> > > > realtime processes (not using a rt patched kernel) that are being
> > > > throttled so I'm trying to identify them so they can be studied and
> > > > made to behave better.
> > > >
> > > > If anyone has a better idea or advice on how to go about this please
> > > > point me in the right direction.
> > > >
> > > > Below is a sample of a trace I captured using the kprobe I mentioned
> > > > above when rt_throttled=1. I don't quite understand how Idle can be 3
> > > > levels deep in preemption or why "idle" is even showing up.
> > > >
> > > > CPU's 1, 2 and 3 have been isolated and the application that appears
> > > > to be getting throttled is on core 2:
> > > >
> > > > # tracer: nop
> > > > #
> > > > # entries-in-buffer/entries-written: 592/592   #P:4
> > > > #
> > > > #                              _-----=> irqs-off
> > > > #                             / _----=> need-resched
> > > > #                            | / _---=> hardirq/softirq
> > > > #                            || / _--=> preempt-depth
> > > > #                            ||| /     delay
> > > > #           TASK-PID   CPU#  ||||   TIMESTAMP  FUNCTION
> > > > #              | |       |   ||||      |         |
> > > >           <idle>-0     [002] d.h3   508.964800: p_0x00000000223c0e95:
> > > > (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
> > >
> > > Note the "d.h3", which means as the heading states, 'd' is interrupts are
> > > disabled, 'h' means it is running in hard interrupt context. That means,
> > > even though idle is the current task, an interrupt triggered, and this is
> > > running in interrupt context.
> >
> > As I said above, makes more sense why idle is showing up now that you
> > pointed that out. This is an app that was ported from a different OS
> > that used a single core arch and now that's its on multicore arch
> > (quad core A53's) with SMP Linux, I think there are some old school
> > disable interrupts/preempt critical section areas that are going to
> > need to be reworked ... which is one of the reasons I believe it was
> > necessary to pin the apps to an isolated core on the i.MX8 to get them
> > to even run.
> >
> > >
> > > I see it's running sched_rt_period_timer() which calls
> > > do_sched_rt_period_timer(), and if you look at that function, it does
> > > for_each_cpu(i, span), checking other CPUs to see if it should be throttled
> > > or not.
> >
> > ... about that. I screwed up and meant to say scheduler features is
> > set to NO_RT_RUNTIME_SHARE in the original post as I've read there is
> > some weirdness on "what" sched_rt_runtime_us means ... a limit on the
> > system as a whole (all cpu's) or each individual cpu. I "believe" the
> > way I'm running it is each individual cpu gets sched_rt_runtime_us for
> > realtime stuff.
>
> Correct. I've ran some tests on rt spinners where if their affinity is
> broad they bounce around CPUs and still maintain 100% CPU resource (but on
> different CPUs). But if you pin it to a single CPU, it will get throttled.

I don't "think" anything is moving around here unless the OS is going
against what it's been told to do. The CPUs are isolated on the command
line at boot, and the affinity of these processes is set when they start
up, so they should stay where they are put unless there is something going
on I don't know about. I think I've seen talk before that affinity is just
a "suggestion", which makes me think there are cases where something could
move, but if that's true I don't know what those scenarios are.
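To convince myself nothing is migrating, I'm thinking of having each pinned
task log its own mask right after it sets it, and spot-checking
Cpus_allowed_list in /proc/<pid>/status while the app runs. As far as I
understand, the affinity mask is a hard constraint rather than a hint, so
anything running off its core would point back at the app itself. A rough,
untested sketch, with core 2 used only as an example:

/*
 * Untested sketch: pin the calling thread to one CPU and read the mask
 * back, so each RT task can log where it is actually allowed to run.
 * The target CPU (2 here) is just an example.
 */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

static int pin_and_verify(int target_cpu)
{
	cpu_set_t set;
	int cpu;

	CPU_ZERO(&set);
	CPU_SET(target_cpu, &set);
	if (sched_setaffinity(0, sizeof(set), &set) != 0) {
		perror("sched_setaffinity");
		return -1;
	}

	CPU_ZERO(&set);
	if (sched_getaffinity(0, sizeof(set), &set) != 0) {
		perror("sched_getaffinity");
		return -1;
	}

	for (cpu = 0; cpu < CPU_SETSIZE; cpu++)
		if (CPU_ISSET(cpu, &set))
			printf("pid %d allowed on cpu %d\n", getpid(), cpu);

	return 0;
}

int main(void)
{
	return pin_and_verify(2) ? 1 : 0;
}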
Regards,

Brian
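P.S. On the NO_RT_RUNTIME_SHARE point above: my reading is that each CPU
then gets sched_rt_runtime_us out of every sched_rt_period_us for
SCHED_FIFO/SCHED_RR work (950000 of 1000000 by default, i.e. the 95% limit
mentioned earlier), and -1 disables throttling entirely. A quick, untested
sketch for double-checking the knobs on the target:

/*
 * Rough sketch: print the global RT throttling knobs and the per-CPU
 * budget they imply when runtime is not shared between CPUs. A runtime
 * of -1 means RT throttling is disabled.
 */
#include <stdio.h>

static int read_long(const char *path, long *val)
{
	FILE *f = fopen(path, "r");
	int ret;

	if (!f)
		return -1;
	ret = (fscanf(f, "%ld", val) == 1) ? 0 : -1;
	fclose(f);
	return ret;
}

int main(void)
{
	long runtime, period;

	if (read_long("/proc/sys/kernel/sched_rt_runtime_us", &runtime) ||
	    read_long("/proc/sys/kernel/sched_rt_period_us", &period) ||
	    period <= 0) {
		fprintf(stderr, "could not read RT throttling knobs\n");
		return 1;
	}

	if (runtime < 0)
		printf("RT throttling disabled (sched_rt_runtime_us = -1)\n");
	else
		printf("RT budget: %ld us of every %ld us (%.1f%%) per CPU\n",
		       runtime, period, 100.0 * runtime / period);
	return 0;
}

Cat'ing the two /proc files obviously shows the same numbers; the only
point is having them logged next to the trace.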