Re: [PATCH 2/3] livepatch: Avoid blocking tasklist_lock too long

Josh Poimboeuf <jpoimboe@xxxxxxxxxx> · Tue, 11 Feb 2025 16:40:09 -0800

On Tue, Feb 11, 2025 at 02:24:36PM +0800, Yafang Shao wrote:
>  void klp_try_complete_transition(void)
>  {
> +	unsigned long timeout, proceed_pending_processes;
>  	unsigned int cpu;
>  	struct task_struct *g, *task;
>  	struct klp_patch *patch;
> @@ -467,9 +468,30 @@ void klp_try_complete_transition(void)
>  	 * unless the patch includes changes to a very common function.
>  	 */
>  	read_lock(&tasklist_lock);
> -	for_each_process_thread(g, task)
> +	timeout = jiffies + HZ;
> +	proceed_pending_processes = 0;
> +	for_each_process_thread(g, task) {
> +		/* check if this task has already switched over */
> +		if (task->patch_state == klp_target_state)
> +			continue;
> +
> +		proceed_pending_processes++;
> +
>  		if (!klp_try_switch_task(task))
>  			complete = false;
> +
> +		/*
> +		 * Prevent hardlockup by not blocking tasklist_lock for too long.
> +		 * But guarantee the forward progress by making sure at least
> +		 * some pending processes were checked.
> +		 */
> +		if (rwlock_is_contended(&tasklist_lock) &&
> +		    time_after(jiffies, timeout) &&
> +		    proceed_pending_processes > 100) {
> +			complete = false;
> +			break;
> +		}
> +	}
>  	read_unlock(&tasklist_lock);

Rather than all this, can we not just use rcu_read_lock() in place of
tasklist_lock?
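
A rough, untested sketch of that idea is below. It assumes the
for_each_process_thread() walk is safe under rcu_read_lock() in this
context, and it leaves the do_exit() race unaddressed. The loop body is
the pre-patch one from the hunk above:

```c
	/*
	 * Untested sketch: walk the task list under RCU rather than
	 * holding tasklist_lock, so writers are never blocked by this loop.
	 * Assumption: tasks that fork or exit concurrently are handled
	 * elsewhere; this fragment does not close that race.
	 */
	rcu_read_lock();
	for_each_process_thread(g, task)
		if (!klp_try_switch_task(task))
			complete = false;
	rcu_read_unlock();
```

With no tasklist_lock held, the timeout and contention checks in the
patch would presumably no longer be needed.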

Petr, I know you mentioned that would widen the race window for the
do_exit() path, but don't we need to fix that race anyway?

-- 
Josh