On Mon 2022-05-09 16:22:11, Song Liu wrote:
>
> > On May 9, 2022, at 8:07 AM, Petr Mladek <pmladek@xxxxxxxx> wrote:
> >
> > On Sat 2022-05-07 10:46:28, Song Liu wrote:
> >> Busy kernel threads may block the transition of livepatch. Call
> >> klp_try_switch_task from __cond_resched to make the transition easier.
> >
> > Do you have some numbers showing how much this speeds up the
> > transition and how much it slows down the scheduler, please?
>
> We don't have numbers on how much this would slow down the scheduler.
> For the transition, we see cases where the transition cannot finish
> within 60 seconds (how long "kpatch load" waits by default).

A 60s limit might be too low, see below.

> > cond_resched() is typically called in cycles with many iterations
> > where the task might spend a lot of time. There are two possibilities.
> > cond_resched() is called in:
> >
> >   + livepatched function
> >
> >     In this case, klp_try_switch_task(current) will always fail.
> >     And it will unnecessarily slow down every iteration by
> >     checking the very same stack.
> >
> >   + non-livepatched function
> >
> >     In this case, the transition will succeed on the first attempt.
> >     OK, but it would succeed also without this patch. The task would
> >     most likely sleep in this cond_resched() so that it might
> >     be successfully transitioned on the next occasion.
>
> We are in the non-livepatched case. But the transition didn't happen
> in time, because the kernel thread doesn't go to sleep. While there is
> clearly something weird with this thread, we think livepatch should
> work because the thread does call cond_resched() from time to time.

I guess that it does go to sleep. Otherwise it would trigger a soft
lockup report if you have the watchdog enabled.

IMHO, the problem is that klp_transition_work_fn() tries the transition
"only" once per second, see:

void klp_try_complete_transition(void)
{
[...]
	schedule_delayed_work(&klp_transition_work,
			      round_jiffies_relative(HZ));
[...]
}

It means that there are "only" 60 attempts to migrate the busy process.
An attempt fails when the process is in the running state or sleeping
in a livepatched function. There is a _non-zero_ chance of bad luck.

It would be great to measure how long the transition takes to complete
if you remove the 60s limit.

Anyway, the 60s limit looks like a bad idea to me. It is too low. For
example, we do not use any limit at all in SUSE products. And the only
report we got was that some thread from a 3rd-party module could not be
migrated. It was stuck with a livepatched function on the stack. The
kthread had a really problematic design. I am afraid that even this
patch would not help in that case.

Now, back to the limit. There are basically two problems when the
transition takes too long:

  + It blocks another transition. But the frequency of new livepatches
    is typically counted in days or even weeks.

  + It means that a process keeps running the buggy/vulnerable code for
    a longer time. But a few hours should be acceptable given how long
    it takes to prepare a livepatch.

    Also, it looks better to have 99.9% of processes running the fixed
    code than to revert the fix just because a single process needs a
    longer time to get transitioned.

I could imagine that there really is a process that is almost
impossible to livepatch. It might get worse on a NO_HZ system. The
question is whether this happens in real life.

I would personally start with prolonging or removing the limit.

Best Regards,
Petr