On Sat, Feb 15, 2025 at 2:12 AM Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote: > > On Fri, Feb 14, 2025 at 03:44:10PM +0100, Petr Mladek wrote: > > I guess that we really could consider the new task as migrated > > and clear TIF_PATCH_PENDING. > > > > But we can't set child->patch_state to KLP_TRANSITION_IDLE. It won't > > work when the transition gets reverted. [*] > > Hm, why not? I don't see any difference between patching or unpatching? > > klp_init_transition() has a barrier to enforce the order of the > klp_target_state and func->transition writes, as read by the > klp_ftrace_handler(). > > So in the ftrace handler, if func->transition is set and the task is > KLP_TRANSITION_IDLE, it can use klp_target_state to decide which > function to use. Its patch state would effectively be the same as any > other already-transitioned task, whether it's patching or unpatching. > > Then in klp_complete_transition(), after func->transition gets set to > false, klp_synchronize_transition() flushes out any running ftrace > handlers. From that point on, func->transition is false, so the ftrace > handler would no longer read klp_target_state. > > > [*] I gave this few brain cycles but I did not find any elegant > > way how to set this a safe way and allow using rcu_read_lock() > > in klp_try_complete_transition(). > > > > It might be because it is Friday evening and I am leaving for > > a trip tomorrow. Also I not motivated enough to think about it > > because Yafang saw the RCU stall even with that rcu_read_lock(). > > So I send this just for record. > > Even if it doesn't fix the RCU stalls, I think we should still try to > avoid holding the tasklist_lock. It's a global lock which can be > contended, and we want the livepatch transition to be as unobtrusive as > possible. I agree. Since it's feasible to eliminate this global lock, I think it's worth pursuing this optimization. > > If the system is doing a lot of forking across many CPUs, holding the > lock could block all the forking tasks and trigger system-wide > scheduling latencies. And that could be compounded by the unnecessary > transitioning of new tasks every time the delayed work runs. -- Regards Yafang