Hello Daniel, Steve,

On Thu, 23 Feb 2023 at 20:24, Daniel Bristot de Oliveira <bristot@xxxxxxxxxx> wrote:
> On 2/23/23 11:39, Steven Rostedt wrote:
>>> kworker/[120] <--- this 120 is likely not the same as
>>> ktimer/[97] <---- this 97
>>>
>>> The kworker is likely a SCHED_OTHER 0 nice, and ktimer a FIFO:97.
>>> You are placing your load in between them.

* Oh right, even those threads have different priorities.

>>> That would not be bad if we ran a traditional periodic/sporadic real-time
>>> workload. That is, a task that waits for an event, wakes up, runs, and goes
>>> to sleep waiting for the next event.
>>>
>>> The problem is that oslat/osnoise run non-stop.
>>>
>>> Then a kworker awakened on the CPU will... starve. You will not see it
>>> causing a sched_switch, but if the kworker is pinned to that CPU, it will
>>> not make progress.
>>
>> Note, the kworker and other kernel threads that are pinned to a CPU are
>> ones that service requests that were triggered on that CPU. It is possible
>> to run a task at FIFO 99 on an isolated CPU non stop without causing any
>> issue (you may also need to enable NO_HZ_FULL and make sure RCU has
>> no-callbacks enabled where the RCU for that isolated CPU gets its work done
>> on other CPUs).
>
> Yes, but in the perfect isolation case, where no other task is scheduled there, being
> FIFO and OTHER or even IDLE is... equivalent as no scheduler is needed :-).
>
>> If your FIFO task calls into the kernel and does something that triggers a
>> worker, then you may then have an issue. You will need to make sure that
>> worker gets time to run.
>>
>> The point I'm making is that it is possible to get something working where
>> you have a FIFO task running 100%, but you need to set up the system where
>> it will not cause issues. That requires knowing which system calls
>> done on that CPU may require workers.
>>
>> Oh, and there's another issue that can cause problems. Even if you figured
>> out everything your task does, and make sure that it doesn't trigger any
>> pinned kworkers, and you are using NO_CB_RCU and NO_HZ_FULL, there's still
>> an issue that needs to be taken care of. That is, if there was some task
>> running on that CPU just before your FIFO task runs, it could have
>> triggered a kworker. And even though it may be done, or even migrated to
>> another CPU, that kworker will still need to execute. I've seen this cause
>> days of debugging to why the system crashed.
>
> There are also cases where kworkers are dispatched to all CPUs, from a non-isolated CPU,
> to do some house-keeping work. E.g., I think that ftrace used to do that to allocate buffers.
> Ideally, all these cases should be reworked to avoid dispatching kworkers where they are
> not needed. But as kworkers are added to the code as part of the development, and bad
> 3rd-party drivers can also do it... and... who knows?
>
> In the exceptional case of something happening to that CPU, they are likely short-lived
> kernel work that it is just easier to let run; one monitors those cases and tries
> to fix the code to avoid them.
>
> That is why the safest path is to: assuming that the isolcpus is done to perfection,
> no schedule will happen, and so all the schedulers are equivalent.
>

* I see, got it. Thank you so much for your kind replies and detailed
explanations, I appreciate it.

Thank you.
---
  - P J P