Re: Minimalistic consistency model (one step above null model)

Vojtech Pavlik <vojtech@xxxxxxx> · Fri, 14 Nov 2014 21:58:19 +0100

On Fri, Nov 14, 2014 at 01:58:06PM -0600, Josh Poimboeuf wrote:

> > The same applies to the current livepatch implementation, though.  You
> > cannot tell if all threads have left the functions you've patched or
> > whether they're remaining on the stack and as such cannot remove patches
> > safely nor start transforming data.
> 
> Yes, the current livepatch code can't remove patches.  But it can at
> least disable them.  If you follow the rule that function prototype and
> data changes aren't allowed, then you can do a crude cumulative patch
> upgrade by disabling the old patch module and loading a new one.  You
> can also do incremental patching, if desired.  This is a workable first
> solution, I think.  It's a placeholder until we can get a real
> consistency model agreed upon and implemented.

Yes, it is a workable first solution by all means. 

> As a temporary solution, I don't see it being much of a problem that you
> can't change function prototypes.  In most cases I think you can work
> around that with a little bit of creativity.  In fact, this limitation
> is really a benefit as it keeps the code very simple.
> 
> Once you add the above minimal approach, by definition you have to start
> allowing function prototype changes.  And then, for each patch you need
> to start tracking its patched function set and its patching progress,
> which complicates things considerably when you have to deal with
> multiple patches. 

You don't need tracking patching progress, if you don't mind leaving
the trampolines in forever and having the trampoline code deal with
multiple patched sets (a slightly more complicated interval tree that
can return an integer rather than a boolean).

> So I think this starts to look like a not-so-minimal consistency
> model, with the only benefit being that you can change function
> prototypes.

With the minimal approach, when you don't track patching progress, you
also can never tell when a time comes when no old code will be executed
anymore. Any sleeper can still wake up into any old function on its
return stack.

The added prototype-modification consistency doesn't really change
anything in that regard, it just makes the kernel not crash when it
otherwise would.

> > The above implementation of extending the current livepatch to handle
> > function prototype changes can be further extended to handle multiple
> > patches 'in progress' by only jumping to a 'next level' patch if coming
> > from a function not changed by that level.
> > 
> > 
> > > BTW, here's a very simple way to implement LEAVE_PATCHED_SET +
> > > SWITCH_THREAD:
> > > 
> > > for each thread {
> > >     if (!on_cpu(thread) and !on_stack(thread, funcs) {
> > >         switch thread to new universe
> > >     } else if (unfreezable kthread) {
> > >         retry and/or fail (kthread is either CPU-bound or is using patched func)
> > >     } else {
> > >         freeze thread
> > > 	if (!on_stack(thread, funcs))
> > > 	    switch thread to new universe
> > > 	else
> > > 	    fail (patch affects freezer/schedule path)
> > >     }
> > > }
> > 
> > I think this could work. You need to make sure you can prevent a process
> > from waking up, however, to avoid race conditions in the first if().
> 
> Yeah, this is the tricky part.  I'm not sure how to do it properly
> without hacking the scheduler.  I believe this race also exists with
> Masami's kpatch implementation.

I think so, and I don't have a good plan either.

> > A very similar method could be used to detect if the minimalistic
> > approach above finished patching, trampolines can be removed, data
> > patching started, etc.
> 
> True, but then what's the advantage of doing all that instead of a
> proper LEAVE_PATCHED_SET SWITCH_THREAD approach?

Not much.

An even faster transition to new code. A very simple and stateless
transition to new code, since all the complexity is moved to the code
detecting the transition is over.

But at the cost of a fairly heavy trampoline.

> > > But the code is simple and converges to the patched state quickly.
> > > 
> > > Thoughts?
> > 
> > You could avoid the freezing (and thus userspace impact) by tracking
> > syscall (or freezer) entry/exit like kGraft does.
> 
> Yes, in the rare cases where we'd have to freeze a running thread, there
> is a user space performance impact.  It might be useful to measure it.
> I'd imagine it's much better than stop_machine.

It's just handling a fake signal, and then going back immediately. The
impact is very small.

But it'd be much nicer if we didn't have to interrupt execution at all.

> > Any process cpu-bound in userspace can be migrated immediately
> 
> Good point.  I wonder if there's a race-free way to do that without
> modifying the syscall code.

Like with the sleeping tasks, I don't have one yet. We only need the
syscall entry path, though. And on most arch that's zero (true zero)
performance impact.

> > and any proces executing syscalls can be migrated when on the
> > kernel/userspace boundary. And processes that are sleeping are already
> > handled.
> 
> Yes, but now you're getting into LEAVE_KERNEL territory.  And we'd have
> to modify syscall 

Yes.

> and kthread code again...

No.

Unlike userspace processes, there are no kthreads that are cpu-bound for
extended time periods. If there would be, that's a bug. They all go to
sleep eventually.

But you may want to add some code to the freezer() call to make it
simpler..

> The benefits of the freezer approach are that it kills two birds with
> one stone in a very simple way, and doesn't necessarily require an "in
> progress" patching state like options b and c.

Yes.

> The drawbacks are the rare performance impact and the fact that it will
> fail sometimes.  Most of the failures would be predictable ahead of
> time, since I think(?) the freezer code path is consistent.
> 
> But it would be hard to predict which functions would be used by a
> sleeping kthread.  I suppose we could avoid the unpredictable failures
> by setting an "in progress" state instead of returning an error, but
> that eliminates one of its advantages.  Another approach to this problem
> would be to continue with the work of converting kthreads to workqueues
> and/or making kthreads freezable.

Yes.

> But... maybe let's forget all that.  Here's yet another idea which is
> currently my favorite :-)
> 
> 1. Install an ftrace handler at the beginning of every to-be-patched
>    function which converts the current thread to the new universe if
>    there are no other patched functions on its stack.
> 
> 2. Run the following on each cpu (smp_call_function):
> 	for each thread
> 		if task_cpu(task) == this cpu
> 			check backtrace and convert thread to new universe if possible
> 
>    [ This introduces some scheduling latency, but it's much better than
>    stop_machine. ]

I don't see how you can guarantee that this sleeping task will not
be woken up on another CPU while you're checking, given that
smp_call_function isn't synchronous on all CPUs.

> 3. Send fake signals to any remaining old universe tasks, and wait for
>    the ftrace handler to convert them.

A fake signal (like in freezer) only works on threads in userspace to
bring them into the kernel. It won't wake them up from sleeping loops
waiting for some event.

Tasks that get a fake signal in userspace are not going a function in
the patched set and thus are not going to call the ftrace handler.

I think you need a real signal here that'll exit the syscall entirely.

>    Then repeat step 2 to catch those stragglers for which a signal
>    didn't cause it to run the same code path again.  Now the patching
>    is complete (no "in progress" state needed).

This could work, but real signals can be an issue.

> 3. (Alternative) Have an "in progress" state.  Periodically check if all
>    tasks have been converted and repeat step 2 if not.  User can
>    optionally send signals to tasks to get them to exit out of the old
>    functions.

Sending real signals with possible userspace visibility is the only way
to proceed when patching a long-sleeping function (or in this model also
any function on the stack below).

-- 
Vojtech Pavlik
Director SuSE Labs
--
To unsubscribe from this list: send the line "unsubscribe live-patching" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html