Re: Minimalistic consistency model (one step above null model)

On Fri, Nov 14, 2014 at 01:58:06PM -0600, Josh Poimboeuf wrote:
> On Fri, Nov 14, 2014 at 07:27:22AM +0100, Vojtech Pavlik wrote:
> > On Thu, Nov 13, 2014 at 11:22:10PM -0600, Josh Poimboeuf wrote:
> > 
> > > > The really trivial solution is:
> > > > 
> > > > 
> > > > 	static void lpc_ftrace_handler(unsigned long ip, unsigned long parent_ip,
> > > > 					struct ftrace_ops *ops, struct pt_regs *regs)
> > > > 	{
> > > > 		struct lpc_func *func = ops->private;
> > > > 
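> > > > 		/*
> > > > 		 * The caller is outside the patched set: redirect
> > > > 		 * execution to the new version of this function.
> > > > 		 * Callers inside the set keep using the old code.
> > > > 		 */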
> > > > 		if (!within_patched_set(parent_ip))
> > > > 			regs->ip = func->new_addr;
> > > > 	}
> > > > 
> > > > 
> > > > Obviously, within_patched_set() would need an efficient data structure,
> > > > like an interval tree, to quickly decide whether the parent ip address
> > > > is within the patched set.
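> > > > 
> > > > For illustration, a minimal sketch of within_patched_set() that
> > > > binary-searches a sorted array of patched-function ranges; the
> > > > data structures here are hypothetical, and a real implementation
> > > > would more likely use an interval tree:
> > > > 
> > > > 	struct lpc_range {
> > > > 		unsigned long start;	/* first byte of a patched function */
> > > > 		unsigned long end;	/* one byte past its end */
> > > > 	};
> > > > 
> > > > 	/* sorted by start, non-overlapping */
> > > > 	static struct lpc_range *lpc_ranges;
> > > > 	static int lpc_nr_ranges;
> > > > 
> > > > 	static bool within_patched_set(unsigned long ip)
> > > > 	{
> > > > 		int lo = 0, hi = lpc_nr_ranges - 1;
> > > > 
> > > > 		while (lo <= hi) {
> > > > 			int mid = (lo + hi) / 2;
> > > > 
> > > > 			if (ip < lpc_ranges[mid].start)
> > > > 				hi = mid - 1;
> > > > 			else if (ip >= lpc_ranges[mid].end)
> > > > 				lo = mid + 1;
> > > > 			else
> > > > 				return true;	/* inside a patched function */
> > > > 		}
> > > > 		return false;
> > > > 	}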
> > > > 
> > > > This is enough to make sure that function calls are consistent.
> > > > 
> > > > No stack checking, no stop_machine(), no forcing threads through
> > > > userspace.
> > > > 
> > > > A disadvantage is that it doesn't give an indication when patching is
> > > > over and the trampoline can be removed. For that, I currently don't see
> > > > a better method than tracking kernel exits and/or stack analysis.
> > > > 
> > > > What do you think?
> > > 
> > > Interesting idea.  But the hard part is figuring out when the patching
> > > is complete.  Once you implement that, it might not be so minimal.
> > 
> > True.
> > 
> > The same applies to the current livepatch implementation, though.  You
> > cannot tell whether all threads have left the functions you've patched
> > or whether those functions remain on some thread's stack, so you can
> > neither safely remove patches nor start transforming data.
> 
> Yes, the current livepatch code can't remove patches.  But it can at
> least disable them.  If you follow the rule that function prototype and
> data changes aren't allowed, then you can do a crude cumulative patch
> upgrade by disabling the old patch module and loading a new one.  You
> can also do incremental patching, if desired.  This is a workable first
> solution, I think.  It's a placeholder until we can get a real
> consistency model agreed upon and implemented.
> 
> As a temporary solution, I don't see it being much of a problem that you
> can't change function prototypes.  In most cases I think you can work
> around that with a little bit of creativity.  In fact, this limitation
> is really a benefit as it keeps the code very simple.
> 
> Once you add the above minimal approach, by definition you have to start
> allowing function prototype changes.  And then, for each patch you need
> to start tracking its patched function set and its patching progress,
> which complicates things considerably when you have to deal with
> multiple patches.  So I think this starts to look like a not-so-minimal
> consistency model, with the only benefit being that you can change
> function prototypes.
> 
> > The above implementation of extending the current livepatch to handle
> > function prototype changes can be further extended to handle multiple
> > patches 'in progress' by only jumping to a 'next level' patch if coming
> > from a function not changed by that level.
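> > 
> > A rough sketch of that extension, assuming hypothetical per-level
> > data (each level would keep its own patched set and its own
> > replacement address for the function):
> > 
> > 	static void lpc_ftrace_handler(unsigned long ip, unsigned long parent_ip,
> > 					struct ftrace_ops *ops, struct pt_regs *regs)
> > 	{
> > 		struct lpc_func *func = ops->private;
> > 		int level;
> > 
> > 		/*
> > 		 * Jump to the newest level whose patched set the caller
> > 		 * is not part of; otherwise fall through to the original
> > 		 * function.
> > 		 */
> > 		for (level = func->max_level; level > 0; level--) {
> > 			if (!within_patched_set_level(level, parent_ip)) {
> > 				regs->ip = func->new_addr[level];
> > 				return;
> > 			}
> > 		}
> > 	}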
> > 
> > 
> > > BTW, here's a very simple way to implement LEAVE_PATCHED_SET +
> > > SWITCH_THREAD:
> > > 
> > > for each thread {
> > >     if (!on_cpu(thread) && !on_stack(thread, funcs)) {
> > >         switch thread to new universe
> > >     } else if (unfreezable kthread) {
> > >         retry and/or fail (kthread is either CPU-bound or is using patched func)
> > >     } else {
> > >         freeze thread
> > >         if (!on_stack(thread, funcs))
> > >             switch thread to new universe
> > >         else
> > >             fail (patch affects freezer/schedule path)
> > >     }
> > > }
> > 
> > I think this could work. You need to make sure you can prevent a process
> > from waking up, however, to avoid race conditions in the first if().
> 
> Yeah, this is the tricky part.  I'm not sure how to do it properly
> without hacking the scheduler.  I believe this race also exists with
> Masami's kpatch implementation.
> 
> > A very similar method could be used to detect if the minimalistic
> > approach above finished patching, trampolines can be removed, data
> > patching started, etc.
> 
> True, but then what's the advantage of doing all that instead of a
> proper LEAVE_PATCHED_SET + SWITCH_THREAD approach?
> 
> > > Note this approach will fail in a few cases, like if you have a
> > > CPU-bound kthread (but we can keep retrying in that case), if a patched
> > > function is in use by a sleeping kthread, or if the patch affects the
> > > freeze/schedule path.
> > 
> > None of the methods proposed so far can complete patching if a patched
> > function is in use by a sleeping kthread, or even any thread. So the
> > limitations you show aren't a huge problem.
> >
> > > But the code is simple and converges to the patched state quickly.
> > > 
> > > Thoughts?
> > 
> > You could avoid the freezing (and thus userspace impact) by tracking
> > syscall (or freezer) entry/exit like kGraft does.
> 
> Yes, in the rare cases where we'd have to freeze a running thread, there
> is a user space performance impact.  It might be useful to measure it.
> I'd imagine it's much better than stop_machine.
> 
> > Any process cpu-bound in userspace can be migrated immediately
> 
> Good point.  I wonder if there's a race-free way to do that without
> modifying the syscall code.
> 
> > and any process executing syscalls can be migrated when on the
> > kernel/userspace boundary. And processes that are sleeping are already
> > handled.
> 
> Yes, but now you're getting into LEAVE_KERNEL territory.  And we'd have
> to modify syscall and kthread code again...
> 
> The benefits of the freezer approach are that it kills two birds with
> one stone in a very simple way, and doesn't necessarily require an "in
> progress" patching state like options b and c.
> 
> The drawbacks are the rare performance impact and the fact that it will
> fail sometimes.  Most of the failures would be predictable ahead of
> time, since I think(?) the freezer code path is consistent.
> 
> But it would be hard to predict which functions would be used by a
> sleeping kthread.  I suppose we could avoid the unpredictable failures
> by setting an "in progress" state instead of returning an error, but
> that eliminates one of its advantages.  Another approach to this problem
> would be to continue with the work of converting kthreads to workqueues
> and/or making kthreads freezable.
> 
> 
> But... maybe let's forget all that.  Here's yet another idea which is
> currently my favorite :-)
> 
> 1. Install an ftrace handler at the beginning of every to-be-patched
>    function which converts the current thread to the new universe if
>    there are no other patched functions on its stack (a sketch of such
>    a handler follows after this list).
> 
> 2. Run the following on each cpu (smp_call_function):
> 	for each thread
> 		if task_cpu(task) == this cpu
> 			check backtrace and convert thread to new universe if possible
> 
>    [ This introduces some scheduling latency, but it's much better than
>    stop_machine. ]
> 
> 3. Send fake signals to any remaining old universe tasks, and wait for
>    the ftrace handler to convert them.  Then repeat step 2 to catch
>    those stragglers that the signal didn't send back through a patched
>    code path (so the ftrace handler never saw them).  Now the patching
>    is complete (no "in progress" state needed).
> 
> 3. (Alternative) Have an "in progress" state.  Periodically check if all
>    tasks have been converted and repeat step 2 if not.  User can
>    optionally send signals to tasks to get them to exit out of the old
>    functions.
> 
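> A minimal sketch of the step 1 handler; patched_funcs_on_stack(), the
> per-task universe field and NEW_UNIVERSE are all hypothetical, and the
> stack check from ftrace context is hand-waved here:
> 
> 	static void universe_switch_handler(unsigned long ip, unsigned long parent_ip,
> 					    struct ftrace_ops *ops, struct pt_regs *regs)
> 	{
> 		if (current->universe == NEW_UNIVERSE)
> 			return;
> 
> 		/*
> 		 * Convert the thread only if no other patched function
> 		 * is on its stack.
> 		 */
> 		if (!patched_funcs_on_stack(current))
> 			current->universe = NEW_UNIVERSE;
> 	}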

Also it may be possible to combine these two ideas in different ways.
For example, step 2 could be replaced by:

for_each_thread
    if (!on_cpu(thread) && !on_stack(thread, funcs))
        switch thread to new universe

if that could be done in a race-free manner.

-- 
Josh