Re: [PATCH 3.18 11/61] workqueue: fix subtle pool management issue which can stall whole worker_pool

Lai Jiangshan <laijs@xxxxxxxxxxxxxx> · Wed, 28 Jan 2015 09:51:54 +0800

On 01/28/2015 09:26 AM, Greg Kroah-Hartman wrote:
> 3.18-stable review patch.  If anyone has any objections, please let me know.

I don't think it is a bug-fix.
It is just a good cleanup.

> 
> ------------------
> 
> From: Tejun Heo <tj@xxxxxxxxxx>
> 
> commit 29187a9eeaf362d8422e62e17a22a6e115277a49 upstream.
> 
> A worker_pool's forward progress is guaranteed by the fact that the
> last idle worker assumes the manager role to create more workers and
> summon the rescuers if creating workers doesn't succeed in timely
> manner before proceeding to execute work items.
> 
> This manager role is implemented in manage_workers(), which indicates
> whether the worker may proceed to work item execution with its return
> value.  This is necessary because multiple workers may contend for the
> manager role, and, if there already is a manager, others should
> proceed to work item execution.
> 
> Unfortunately, the function also indicates that the worker may proceed
> to work item execution if need_to_create_worker() is false at the head
> of the function.  need_to_create_worker() tests the following
> conditions.
> 
> 	pending work items && !nr_running && !nr_idle
> 
> The first and third conditions are protected by pool->lock and thus
> won't change while holding pool->lock; however, nr_running can change
> asynchronously as other workers block and resume and while it's likely
> to be zero, as someone woke this worker up in the first place, some
> other workers could have become runnable inbetween making it non-zero.
> 
> If this happens, manage_worker() could return false even with zero
> nr_idle making the worker, the last idle one, proceed to execute work
> items.  If then all workers of the pool end up blocking on a resource
> which can only be released by a work item which is pending on that
> pool, the whole pool can deadlock as there's no one to create more
> workers or summon the rescuers.
> 
> This patch fixes the problem by removing the early exit condition from
> maybe_create_worker() and making manage_workers() return false iff
> there's already another manager, which ensures that the last worker
> doesn't start executing work items.
> 
> We can leave the early exit condition alone and just ignore the return
> value but the only reason it was put there is because the
> manage_workers() used to perform both creations and destructions of
> workers and thus the function may be invoked while the pool is trying
> to reduce the number of workers.  Now that manage_workers() is called
> only when more workers are needed, the only case this early exit
> condition is triggered is rare race conditions rendering it pointless.
> 
> Tested with simulated workload and modified workqueue code which
> trigger the pool deadlock reliably without this patch.
> 
> Signed-off-by: Tejun Heo <tj@xxxxxxxxxx>
> Reported-by: Eric Sandeen <sandeen@xxxxxxxxxxx>
> Link: http://lkml.kernel.org/g/54B019F4.8030009@xxxxxxxxxxx
> Cc: Dave Chinner <david@xxxxxxxxxxxxx>
> Cc: Lai Jiangshan <laijs@xxxxxxxxxxxxxx>
> Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
> 
> ---
>  kernel/workqueue.c |   25 ++++++++-----------------
>  1 file changed, 8 insertions(+), 17 deletions(-)
> 
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -1841,17 +1841,11 @@ static void pool_mayday_timeout(unsigned
>   * spin_lock_irq(pool->lock) which may be released and regrabbed
>   * multiple times.  Does GFP_KERNEL allocations.  Called only from
>   * manager.
> - *
> - * Return:
> - * %false if no action was taken and pool->lock stayed locked, %true
> - * otherwise.
>   */
> -static bool maybe_create_worker(struct worker_pool *pool)
> +static void maybe_create_worker(struct worker_pool *pool)
>  __releases(&pool->lock)
>  __acquires(&pool->lock)
>  {
> -	if (!need_to_create_worker(pool))
> -		return false;
>  restart:
>  	spin_unlock_irq(&pool->lock);
>  
> @@ -1877,7 +1871,6 @@ restart:
>  	 */
>  	if (need_to_create_worker(pool))
>  		goto restart;
> -	return true;
>  }
>  
>  /**
> @@ -1897,16 +1890,14 @@ restart:
>   * multiple times.  Does GFP_KERNEL allocations.
>   *
>   * Return:
> - * %false if the pool don't need management and the caller can safely start
> - * processing works, %true indicates that the function released pool->lock
> - * and reacquired it to perform some management function and that the
> - * conditions that the caller verified while holding the lock before
> - * calling the function might no longer be true.
> + * %false if the pool doesn't need management and the caller can safely
> + * start processing works, %true if management function was performed and
> + * the conditions that the caller verified before calling the function may
> + * no longer be true.
>   */
>  static bool manage_workers(struct worker *worker)
>  {
>  	struct worker_pool *pool = worker->pool;
> -	bool ret = false;
>  
>  	/*
>  	 * Anyone who successfully grabs manager_arb wins the arbitration
> @@ -1919,12 +1910,12 @@ static bool manage_workers(struct worker
>  	 * actual management, the pool may stall indefinitely.
>  	 */
>  	if (!mutex_trylock(&pool->manager_arb))
> -		return ret;
> +		return false;
>  
> -	ret |= maybe_create_worker(pool);
> +	maybe_create_worker(pool);
>  
>  	mutex_unlock(&pool->manager_arb);
> -	return ret;
> +	return true;
>  }
>  
>  /**
> 
> 
> .
> 

--
To unsubscribe from this list: send the line "unsubscribe stable" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html