Re: [PATCH 2/3] livepatch: Avoid blocking tasklist_lock too long

Josh Poimboeuf <jpoimboe@xxxxxxxxxx> · Fri, 14 Feb 2025 10:12:06 -0800

On Fri, Feb 14, 2025 at 03:44:10PM +0100, Petr Mladek wrote:
> I guess that we really could consider the new task as migrated
> and clear TIF_PATCH_PENDING.
> 
> But we can't set child->patch_state to KLP_TRANSITION_IDLE. It won't
> work when the transition gets reverted. [*]

Hm, why not?  I don't see any difference between patching or unpatching?

klp_init_transition() has a barrier to enforce the order of the
klp_target_state and func->transition writes, as read by the
klp_ftrace_handler().

So in the ftrace handler, if func->transition is set and the task is
KLP_TRANSITION_IDLE, it can use klp_target_state to decide which
function to use.  Its patch state would effectively be the same as any
other already-transitioned task, whether it's patching or unpatching.

Then in klp_complete_transition(), after func->transition gets set to
false, klp_synchronize_transition() flushes out any running ftrace
handlers.  From that point on, func->transition is false, so the ftrace
handler would no longer read klp_target_state.

> [*] I gave this few brain cycles but I did not find any elegant
>     way how to set this a safe way and allow using rcu_read_lock()
>     in klp_try_complete_transition().
> 
>     It might be because it is Friday evening and I am leaving for
>     a trip tomorrow. Also I not motivated enough to think about it
>     because Yafang saw the RCU stall even with that rcu_read_lock().
>     So I send this just for record.

Even if it doesn't fix the RCU stalls, I think we should still try to
avoid holding the tasklist_lock.  It's a global lock which can be
contended, and we want the livepatch transition to be as unobtrusive as
possible.

If the system is doing a lot of forking across many CPUs, holding the
lock could block all the forking tasks and trigger system-wide
scheduling latencies.  And that could be compounded by the unnecessary
transitioning of new tasks every time the delayed work runs.

-- 
Josh