On 05/07/23 18:48, Nadav Amit wrote: >> On Jul 5, 2023, at 11:12 AM, Valentin Schneider <vschneid@xxxxxxxxxx> wrote: >> >> Deferral approach >> ================= >> >> Storing each and every callback, like a secondary call_single_queue turned out >> to be a no-go: the whole point of deferral is to keep NOHZ_FULL CPUs in >> userspace for as long as possible - no signal of any form would be sent when >> deferring an IPI. This means that any form of queuing for deferred callbacks >> would end up as a convoluted memory leak. >> >> Deferred IPIs must thus be coalesced, which this series achieves by assigning >> IPIs a "type" and having a mapping of IPI type to callback, leveraged upon >> kernel entry. > > I have some experience with similar an optimization. Overall, it can make > sense and as you show, it can reduce the number of interrupts. > > The main problem of such an approach might be in cases where a process > frequently enters and exits the kernel between deferred-IPIs, or even worse - > the IPI is sent while the remote CPU is inside the kernel. In such cases, you > pay the extra cost of synchronization and cache traffic, and might not even > get the benefit of reducing the number of IPIs. > > In a sense, it's a more extreme case of the overhead that x86’s lazy-TLB > mechanism introduces while tracking whether a process is running or not. But > lazy-TLB would change is_lazy much less frequently than context tracking, > which means that the deferring the IPIs as done in this patch-set has a > greater potential to hurt performance than lazy-TLB. > > tl;dr - it would be beneficial to show some performance number for both a > “good” case where a process spends most of the time in userspace, and “bad” > one where a process enters and exits the kernel very frequently. Reducing > the number of IPIs is good but I don’t think it is a goal by its own. > There already is a significant overhead incurred on kernel entry for nohz_full CPUs due to all of context_tracking faff; now I *am* making it worse with that extra atomic, but I get the feeling it's not going to stay :D nohz_full CPUs that do context transitions very frequently are unfortunately in the realm of "you shouldn't do that". Due to what's out there I have to care about *occasional* transitions, but some folks consider even that to be broken usage, so I don't believe getting numbers for that to be much relevant. > [ BTW: I did not go over the patches in detail. Obviously, there are > various delicate points that need to be checked, as avoiding the > deferring of IPIs if page-tables are freed. ]