This is a note to let you know that I've just added the patch titled

    workqueue: fix ordered workqueues in NUMA setups

to the 3.10-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
    workqueue-fix-ordered-workqueues-in-numa-setups.patch
and it can be found in the queue-3.10 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.


>From 8a2b75384444488fc4f2cbb9f0921b6a0794838f Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj@xxxxxxxxxx>
Date: Thu, 5 Sep 2013 12:30:04 -0400
Subject: workqueue: fix ordered workqueues in NUMA setups

From: Tejun Heo <tj@xxxxxxxxxx>

commit 8a2b75384444488fc4f2cbb9f0921b6a0794838f upstream.

An ordered workqueue implements execution ordering by using a single
pool_workqueue with max_active == 1.  On a given pool_workqueue, work
items are processed in FIFO order, and limiting max_active to 1 forces
the queued work items to be processed one by one.

Unfortunately, 4c16bd327c ("workqueue: implement NUMA affinity for
unbound workqueues") accidentally broke this guarantee by applying NUMA
affinity to ordered workqueues too.  On NUMA setups, an ordered
workqueue would end up with separate pool_workqueues for different
nodes.  Each pool_workqueue still limits max_active to 1, but multiple
work items may be executed concurrently and out of order depending on
which node they are queued to.

Fix it by using dedicated ordered_wq_attrs[] when creating ordered
workqueues.  The new attrs match the unbound ones except that no_numa
is always set, thus forcing all NUMA nodes to share the default
pool_workqueue.

While at it, add a sanity check in the workqueue creation path which
verifies that an ordered workqueue has only the default pool_workqueue.
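For readers less familiar with the workqueue API, below is a minimal sketch
(not part of the patch) of how a driver typically relies on the ordering
guarantee described above.  The module, the workqueue name and the work
functions ("my_ordered_wq", "my_work_fn", etc.) are hypothetical and used
only for illustration:

#include <linux/module.h>
#include <linux/workqueue.h>

static struct workqueue_struct *my_ordered_wq;	/* hypothetical example wq */

static void my_work_fn(struct work_struct *work)
{
	/*
	 * On an ordered workqueue, work items run strictly one at a
	 * time, in queueing (FIFO) order -- the guarantee this patch
	 * restores on NUMA machines.
	 */
}

static DECLARE_WORK(my_work_a, my_work_fn);
static DECLARE_WORK(my_work_b, my_work_fn);

static int __init my_init(void)
{
	/* expands to WQ_UNBOUND | __WQ_ORDERED with max_active == 1 */
	my_ordered_wq = alloc_ordered_workqueue("my_ordered_wq", 0);
	if (!my_ordered_wq)
		return -ENOMEM;

	/* my_work_a is expected to finish before my_work_b starts */
	queue_work(my_ordered_wq, &my_work_a);
	queue_work(my_ordered_wq, &my_work_b);
	return 0;
}

static void __exit my_exit(void)
{
	destroy_workqueue(my_ordered_wq);
}

module_init(my_init);
module_exit(my_exit);
MODULE_LICENSE("GPL");

Without the fix, on a NUMA machine my_work_a and my_work_b could be queued to
pool_workqueues of different nodes and run concurrently; with the fix both go
through the single default pool_workqueue, so my_work_a completes before
my_work_b starts.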
Signed-off-by: Tejun Heo <tj@xxxxxxxxxx>
Reported-by: Libin <huawei.libin@xxxxxxxxxx>
Cc: Lai Jiangshan <laijs@xxxxxxxxxxxxxx>
Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>

---
 kernel/workqueue.c |   24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -295,6 +295,9 @@ static DEFINE_HASHTABLE(unbound_pool_has
 /* I: attributes used when instantiating standard unbound pools on demand */
 static struct workqueue_attrs *unbound_std_wq_attrs[NR_STD_WORKER_POOLS];
 
+/* I: attributes used when instantiating ordered pools on demand */
+static struct workqueue_attrs *ordered_wq_attrs[NR_STD_WORKER_POOLS];
+
 struct workqueue_struct *system_wq __read_mostly;
 EXPORT_SYMBOL(system_wq);
 struct workqueue_struct *system_highpri_wq __read_mostly;
@@ -4059,7 +4062,7 @@ out_unlock:
 static int alloc_and_link_pwqs(struct workqueue_struct *wq)
 {
 	bool highpri = wq->flags & WQ_HIGHPRI;
-	int cpu;
+	int cpu, ret;
 
 	if (!(wq->flags & WQ_UNBOUND)) {
 		wq->cpu_pwqs = alloc_percpu(struct pool_workqueue);
@@ -4079,6 +4082,13 @@ static int alloc_and_link_pwqs(struct wo
 			mutex_unlock(&wq->mutex);
 		}
 		return 0;
+	} else if (wq->flags & __WQ_ORDERED) {
+		ret = apply_workqueue_attrs(wq, ordered_wq_attrs[highpri]);
+		/* there should only be single pwq for ordering guarantee */
+		WARN(!ret && (wq->pwqs.next != &wq->dfl_pwq->pwqs_node ||
+			      wq->pwqs.prev != &wq->dfl_pwq->pwqs_node),
+		     "ordering guarantee broken for workqueue %s\n", wq->name);
+		return ret;
 	} else {
 		return apply_workqueue_attrs(wq, unbound_std_wq_attrs[highpri]);
 	}
@@ -4990,13 +5000,23 @@ static int __init init_workqueues(void)
 		}
 	}
 
-	/* create default unbound wq attrs */
+	/* create default unbound and ordered wq attrs */
 	for (i = 0; i < NR_STD_WORKER_POOLS; i++) {
 		struct workqueue_attrs *attrs;
 
 		BUG_ON(!(attrs = alloc_workqueue_attrs(GFP_KERNEL)));
 		attrs->nice = std_nice[i];
 		unbound_std_wq_attrs[i] = attrs;
+
+		/*
+		 * An ordered wq should have only one pwq as ordering is
+		 * guaranteed by max_active which is enforced by pwqs.
+		 * Turn off NUMA so that dfl_pwq is used for all nodes.
+		 */
+		BUG_ON(!(attrs = alloc_workqueue_attrs(GFP_KERNEL)));
+		attrs->nice = std_nice[i];
+		attrs->no_numa = true;
+		ordered_wq_attrs[i] = attrs;
 	}
 
 	system_wq = alloc_workqueue("events", 0, 0);


Patches currently in stable-queue which might be from tj@xxxxxxxxxx are

queue-3.10/drivers-libata-set-max-sector-to-65535-for-slimtype-dvd-a-ds8a9sh-drive.patch
queue-3.10/cpuset-fix-memory-allocator-deadlock.patch
queue-3.10/workqueue-fix-ordered-workqueues-in-numa-setups.patch
queue-3.10/ahci-add-marvell-9230-to-the-ahci-pci-device-list.patch
queue-3.10/ahci-disabled-fbs-prior-to-issuing-software-reset.patch
queue-3.10/ahci-add-device-ids-for-intel-wildcat-point-lp.patch
queue-3.10/blk-core-fix-memory-corruption-if-blkcg_init_queue-fails.patch
queue-3.10/libata-fix-display-of-sata-speed.patch
queue-3.10/loop-fix-crash-if-blk_alloc_queue-fails.patch
--
To unsubscribe from this list: send the line "unsubscribe stable" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html