On Fri, Jan 27, 2023 at 01:09:02PM +0100, Petr Mladek wrote:
> There might actually be two possibilities why the transition fails
> too often:
>
> 1. The task might be in the running state most of the time. Therefore
>    the backtrace is not reliable most of the time.
>
>    In this case, some cooperation with the scheduler would really
>    help. We would need to stop the task and check the stack
>    when it is stopped. Something like the patch you proposed.

This is the situation we are encountering.

> 2. The task might be sleeping but almost always in a livepatched
>    function. Therefore it could not be transitioned.
>
>    It might be the case with vhost_worker(). The main loop is "tiny".
>    The kthread probably spends most of the time with processing
>    a vhost_work. And if the "works" are livepatched...
>
>    In this case, it would help to call klp_try_switch_task(current)
>    in the main loop in vhost_worker(). It would always succeed
>    when vhost_worker() is not livepatched on its own.
>
>    Note that even this would not help with kPatch when a single
>    vhost_work might need more than the 1 minute timeout to get processed.
>
> > diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
> > index f1b25ec581e0..06746095a724 100644
> > --- a/kernel/livepatch/transition.c
> > +++ b/kernel/livepatch/transition.c
> > @@ -9,6 +9,7 @@
> >
> >  #include <linux/cpu.h>
> >  #include <linux/stacktrace.h>
> > +#include <linux/stop_machine.h>
> >  #include "core.h"
> >  #include "patch.h"
> >  #include "transition.h"
> > @@ -334,6 +335,16 @@ static bool klp_try_switch_task(struct task_struct *task)
> >  	return !ret;
> >  }
> >
> > +static int __stop_try_switch(void *arg)
> > +{
> > +	return klp_try_switch_task(arg) ? 0 : -EBUSY;
> > +}
> > +
> > +static bool klp_try_switch_task_harder(struct task_struct *task)
> > +{
> > +	return !stop_one_cpu(task_cpu(task), __stop_try_switch, task);
> > +}
> > +
> >  /*
> >   * Sends a fake signal to all non-kthread tasks with TIF_PATCH_PENDING set.
> >   * Kthreads with TIF_PATCH_PENDING set are woken up.
>
> Nice. I am surprised that it can be implemented so easily.

Yes, that's a neat solution. I will give it a try. AIUI this still
doesn't help for architectures without a reliable stacktrace though,
right? So we probably should only try this for architectures which do
have reliable stacktraces.

Thanks,
Seth