On Fri, Jan 27, 2023 at 11:37:02AM +0100, Peter Zijlstra wrote:
> On Thu, Jan 26, 2023 at 08:43:55PM -0800, Josh Poimboeuf wrote:
> > On Thu, Jan 26, 2023 at 03:12:35PM -0600, Seth Forshee (DigitalOcean) wrote:
> > > On Thu, Jan 26, 2023 at 06:03:16PM +0100, Petr Mladek wrote:
> > > > On Fri 2023-01-20 16:12:20, Seth Forshee (DigitalOcean) wrote:
> > > > > We've fairly regularly seen livepatches which cannot transition
> > > > > within kpatch's timeout period due to busy vhost worker kthreads.
> > > >
> > > > I have missed this detail. Miroslav told me that we have solved
> > > > something similar some time ago, see
> > > > https://lore.kernel.org/all/20220507174628.2086373-1-song@xxxxxxxxxx/
> > >
> > > Interesting thread. I had thought about something along the lines of
> > > the original patch, but there are some ideas in there that I hadn't
> > > considered.
> >
> > Here's another idea, have we considered this? Have livepatch set
> > TIF_NEED_RESCHED on all kthreads to force them into schedule(), and
> > then have the scheduler call klp_try_switch_task() if
> > TIF_PATCH_PENDING is set.
> >
> > Not sure how scheduler folks would feel about that ;-)
>
> So, let me try and page all that back in.... :-)
>
> KLP needs to unwind the stack to see if any of the patched functions are
> active; if not, flip the task to the new set.
>
> Unwinding the stack of a task can be done when:
>
>  - the task is inactive (stable regs and stack) -- provided it stays
>    inactive while unwinding etc..
>
>  - the task is current (guarantees the stack doesn't dip below where we
>    started due to being busy on top etc..)
>
> It can NOT be done from interrupt context, because the interrupt can hit
> in the middle of setting up stack frames etc..
>
> The issue at hand is that some tasks run for a long time without passing
> through an explicit check.
>
> The thread above tried sticking something in cond_resched(), which is a
> problem for PREEMPT=y since cond_resched() is a no-op there.
>
> Preempt notifiers were raised, and those would actually be nice, except
> you can only install a notifier on current and you need some memory
> allocated per task, which makes it less than ideal. Plus ...
>
> ... putting something in finish_task_switch() wouldn't be the end of the
> world I suppose, but then you still need to force schedule the task --
> imagine it being the only runnable task on the CPU, there's nothing
> going to make it actually switch.
>
> Which then leads me to suggest something daft like this.. does that
> help?

The changes below are working well for me. The calling context holds
tasklist_lock for read and therefore can't sleep, so I'm using
stop_one_cpu_nowait() with per-CPU state.

Based on Josh's comments it sounds like the klp_have_reliable_stack()
check probably isn't quite right, and we might want to add something else
for PREEMPT+!ORC. But I wanted to go ahead and see if this approach seems
reasonable to everyone.

Thanks,
Seth

diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
index f1b25ec581e0..9f3898f02828 100644
--- a/kernel/livepatch/transition.c
+++ b/kernel/livepatch/transition.c
@@ -9,6 +9,7 @@
 
 #include <linux/cpu.h>
 #include <linux/stacktrace.h>
+#include <linux/stop_machine.h>
 #include "core.h"
 #include "patch.h"
 #include "transition.h"
@@ -334,9 +335,16 @@ static bool klp_try_switch_task(struct task_struct *task)
 	return !ret;
 }
 
+static int __try_switch_kthread(void *arg)
+{
+	return klp_try_switch_task(arg) ? 0 : -EBUSY;
+}
+
+static DEFINE_PER_CPU(struct cpu_stop_work, klp_stop_work);
+
 /*
  * Sends a fake signal to all non-kthread tasks with TIF_PATCH_PENDING set.
- * Kthreads with TIF_PATCH_PENDING set are woken up.
+ * Kthreads with TIF_PATCH_PENDING set are preempted or woken up.
  */
 static void klp_send_signals(void)
 {
@@ -357,11 +365,22 @@ static void klp_send_signals(void)
 		 * would be meaningless. It is not serious though.
 		 */
 		if (task->flags & PF_KTHREAD) {
-			/*
-			 * Wake up a kthread which sleeps interruptedly and
-			 * still has not been migrated.
-			 */
-			wake_up_state(task, TASK_INTERRUPTIBLE);
+			if (task_curr(task) && klp_have_reliable_stack()) {
+				/*
+				 * kthread is currently running on a CPU; try
+				 * to preempt it.
+				 */
+				stop_one_cpu_nowait(task_cpu(task),
+						    __try_switch_kthread,
+						    task,
+						    this_cpu_ptr(&klp_stop_work));
+			} else {
+				/*
+				 * Wake up a kthread which sleeps interruptedly
+				 * and still has not been migrated.
+				 */
+				wake_up_state(task, TASK_INTERRUPTIBLE);
+			}
 		} else {
 			/*
 			 * Send fake signal to all non-kthread tasks which are
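
For anyone not familiar with the stopper machinery, here is a rough,
illustrative sketch (not part of the patch; all names are invented) of the
stop_one_cpu_nowait() pattern the last hunk relies on: the callback is
queued to the target CPU's stopper thread, which preempts whatever was
running there, and the caller never sleeps, so it is usable while holding
tasklist_lock for read.

	/* Illustrative sketch only -- not from the patch above. */
	#include <linux/percpu-defs.h>
	#include <linux/printk.h>
	#include <linux/sched.h>
	#include <linux/smp.h>
	#include <linux/stop_machine.h>

	static DEFINE_PER_CPU(struct cpu_stop_work, example_stop_work);

	/* Runs in the migration/N stopper thread on the target CPU. */
	static int example_stop_fn(void *arg)
	{
		struct task_struct *task = arg;

		pr_info("preempted CPU %d while looking at %s\n",
			smp_processor_id(), task->comm);
		return 0;
	}

	/*
	 * Queue the callback on @task's CPU. stop_one_cpu_nowait() never
	 * blocks, so this is safe under read_lock(&tasklist_lock).
	 */
	static void example_preempt_task_cpu(struct task_struct *task)
	{
		int cpu = task_cpu(task);

		stop_one_cpu_nowait(cpu, example_stop_fn, task,
				    per_cpu_ptr(&example_stop_work, cpu));
	}

(The sketch keys the work buffer off the target CPU; the patch above uses
this_cpu_ptr() on the calling CPU instead.)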