Josh Poimboeuf <jpoimboe@xxxxxxxxxx> writes:

> On Thu, Nov 09, 2023 at 12:31:47PM -0500, Steven Rostedt wrote:
>> On Thu, 9 Nov 2023 09:26:37 -0800
>> Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
>>
>> > On Tue, Nov 07, 2023 at 06:16:09PM -0500, Steven Rostedt wrote:
>> > > On Tue, 7 Nov 2023 13:56:53 -0800
>> > > Ankur Arora <ankur.a.arora@xxxxxxxxxx> wrote:
>> > >
>> > > > This reverts commit e3ff7c609f39671d1aaff4fb4a8594e14f3e03f8.
>> > > >
>> > > > Note that removing this commit reintroduces "live patches failing to
>> > > > complete within a reasonable amount of time due to CPU-bound kthreads."
>> > > >
>> > > > Unfortunately this fix depends quite critically on PREEMPT_DYNAMIC and
>> > > > existence of cond_resched() so this will need an alternate fix.
>> >
>> > We definitely don't want to introduce a regression, something will need
>> > to be figured out before removing cond_resched().
>> >
>> > We could hook into preempt_schedule_irq(), but that wouldn't work for
>> > non-ORC.
>> >
>> > Another option would be to hook into schedule(). Then livepatch could
>> > set TIF_NEED_RESCHED on remaining unpatched tasks. But again if they go
>> > through the preemption path then we have the same problem for non-ORC.
>> >
>> > Worst case we'll need to sprinkle cond_livepatch() everywhere :-/
>> >
>>
>> I guess I'm not fully understanding what the cond rescheds are for. But
>> would an IPI to all CPUs setting NEED_RESCHED, fix it?

Yeah. We could just temporarily toggle to full preemption, when
NEED_RESCHED_LAZY is always upgraded to NEED_RESCHED which will
then send IPIs.

> If all livepatch arches had the ORC unwinder, yes.
>
> The problem is that frame pointer (and similar) unwinders can't reliably
> unwind past an interrupt frame.

Ah, I wonder if we could just disable the preempt_schedule_irq() path
temporarily? Hooking into schedule() alongside something like this:

@@ -379,7 +379,7 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs)

 void irqentry_exit_cond_resched(void)
 {
-       if (!preempt_count()) {
+       if (klp_cond_resched_disable() && !preempt_count()) {

The problem would be tasks that don't go through any preemptible
sections.

--
ankur