> On May 10, 2022, at 11:42 AM, Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
>
> On Tue, May 10, 2022 at 06:07:00PM +0000, Rik van Riel wrote:
>> On Tue, 2022-05-10 at 09:52 -0700, Josh Poimboeuf wrote:
>>> On Tue, May 10, 2022 at 04:07:42PM +0000, Rik van Riel wrote:
>>>>>
>>>> Now I wonder if we could just hook up a preempt notifier
>>>> for kernel live patches. All the distro kernels already
>>>> need the preempt notifier for KVM, anyway...
>>>>
>>>
>>> I wouldn't be opposed to that, but how does it solve this problem?
>>> If, as Peter said, cond_resched() can be a NOP, then preemption would
>>> have to be from an interrupt, in which case frame pointers aren't
>>> reliable.
>>>
>> The systems where we are seeing problems do not, as far
>> as I know, throw softlockup errors, so the kworker
>> threads that fail to transition to the new KLP version
>> are sleeping and getting scheduled out at times.
>
> Are they sleeping due to an explicit call to cond_resched()?

In this case, yes. The thread calls cond_resched().

>
>> A KLP transition preempt notifier would help those
>> kernel threads transition to the new KLP version at
>> any time they reschedule.
>
> ... unless cond_resched() is a no-op due to CONFIG_PREEMPT?

Based on my understanding (and that of a few other folks we chatted
with), a kernel thread can legally run for an extended time, as long as
it calls cond_resched() at a reasonable frequency. Therefore, I think
we should be able to patch such a thread easily, unless it calls
cond_resched() with a to-be-patched function on the stack, of course.

OTOH, Petr's mindset of allowing many minutes for the patch transition
is new to me. I need to think more about it. Josh, what's your opinion
on this? IIUC, kpatch is designed to wait only up to 60 seconds, with
no option to override that timeout.

>
>> How much it will help is hard to predict, but I should
>> be able to get results from a fairly large sample size
>> of systems within a few weeks :)
>
> As Peter said, keep in mind that we will need to fix other cases beyond
> Facebook, i.e., CONFIG_PREEMPT combined with non-x86 arches which don't
> have ORC so they can't reliably unwind from an IRQ.

I think the livepatch transition may fail in a number of different
cases, and we don't need to address all of them in one shot. Fixing
some cases is an improvement, as long as we don't slow down the other
cases.

I understand that adding even a tiny overhead to __cond_resched() may
end up as a visible regression. But maybe adding the check to
preempt_schedule_common() would be light enough?

Did I miss/misunderstand something?

Thanks,
Song
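
P.S. To make that last idea a bit more concrete, here is a rough,
completely untested sketch of what such a check could look like. It
assumes klp_try_switch_task() (today a static function in
kernel/livepatch/transition.c) were made callable from the scheduler;
klp_patch_pending() already exists in include/linux/livepatch.h. The
helper name klp_resched_try_switch() is made up for illustration:

#include <linux/livepatch.h>
#include <linux/sched.h>

/*
 * Untested sketch.  Assumes klp_try_switch_task() is exported from
 * kernel/livepatch/transition.c, where it is currently static.
 */
static __always_inline void klp_resched_try_switch(void)
{
	/*
	 * Fast path: a single thread-flag test, so the common case
	 * (no transition in progress) stays cheap.
	 */
	if (likely(!klp_patch_pending(current)))
		return;

	/*
	 * We are at a voluntary reschedule point, but a to-be-patched
	 * function could still be on this task's stack, so rely on
	 * klp_try_switch_task() to do its reliable stack check before
	 * flipping the task to the new patch state.
	 */
	klp_try_switch_task(current);
}

Calling this from preempt_schedule_common() rather than from every
__cond_resched() would limit the extra work to the case where the task
actually reschedules, which is why I am hoping it is light enough.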