Our system was a two-NUMA-node system and we did observe significant
improvements, especially with an increasing number of IO devices
distributed between the NUMA nodes.

Moussa

> On Jan 30, 2015, at 2:05 PM, Sagi Grimberg <sagig@xxxxxxxxxxxxxxxxxx> wrote:
>
>>> On Sun, 2015-01-25 at 19:09 +0200, Sagi Grimberg wrote:
>>> Bound workqueues might be too restrictive since they allow
>>> only a single core per session for processing completions.
>>> WQ_UNBOUND will allow bouncing to another CPU if the running
>>> CPU is currently busy. Luckily, our workqueues are NUMA aware
>>> and will first try to bounce within the same NUMA socket.
>>> My measurements with NULL backend devices show that there is
>>> no (noticeable) additional latency as a result of the change.
>>> I'd even expect to gain performance when working with fast
>>> devices that also allocate MSIX interrupt vectors.
>>>
>>> While we're at it, make it WQ_HIGHPRI since processing
>>> completions is really a high priority for performance.
>>>
>>> This one is an RFC since I'd like to ask the users to try out
>>> this patch and report the results.
>>>
>>> Signed-off-by: Sagi Grimberg <sagig@xxxxxxxx>
>>> ---
>>>  drivers/infiniband/ulp/isert/ib_isert.c | 3 ++-
>>>  1 files changed, 2 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
>>> index dafb3c5..6b5ce34 100644
>>> --- a/drivers/infiniband/ulp/isert/ib_isert.c
>>> +++ b/drivers/infiniband/ulp/isert/ib_isert.c
>>> @@ -3320,7 +3320,8 @@ static int __init isert_init(void)
>>>  {
>>>  	int ret;
>>>
>>> -	isert_comp_wq = alloc_workqueue("isert_comp_wq", 0, 0);
>>> +	isert_comp_wq = alloc_workqueue("isert_comp_wq",
>>> +					WQ_UNBOUND | WQ_HIGHPRI, 0);
>>>  	if (!isert_comp_wq) {
>>>  		isert_err("Unable to allocate isert_comp_wq\n");
>>>  		ret = -ENOMEM;
>>
>> Moussa has been using the WQ_UNBOUND bit here along with a mlx4 driver
>> change to increase the number of EQs available for some time now, with
>> impressive small-block performance results.
>
> I'm specifically interested in performance on NUMA systems (2 or more
> sockets); I'm wondering whether that was the case in Moussa's tests. No
> argument that this patch helps performance on single-socket systems.
>
>> I'm going to merge this into target-pending/for-next for now, and can
>> drop it later if it ends up being problematic.
>> Btw Moussa, whatever happened to the mlx4 driver change?
>
> Moussa's mlx4 patch wasn't the correct way to handle this bug (mlx4
> reserved almost all core EQs for mlx4_en and only 3 for RoCE). The
> correct fix is heading upstream and will hopefully be included in 3.20.
>
> Sagi.
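
For readers following along, here is a minimal, self-contained sketch of the workqueue pattern the patch switches to: allocating an unbound, high-priority workqueue and queueing completion work to it. This is illustrative only; the demo_* module and function names are hypothetical and not part of the isert driver. Only alloc_workqueue(), the WQ_UNBOUND | WQ_HIGHPRI flags, and the standard work-item API match what the patch actually uses.

/*
 * Illustrative sketch only -- not part of the isert driver.  All
 * demo_* names are made up for this example.
 */
#include <linux/module.h>
#include <linux/workqueue.h>

static struct workqueue_struct *demo_comp_wq;
static struct work_struct demo_work;

static void demo_work_fn(struct work_struct *work)
{
	/*
	 * Completion processing would run here.  With WQ_UNBOUND the
	 * scheduler may run this on any CPU, and the NUMA-aware
	 * workqueue code prefers CPUs on the local node first.
	 */
	pr_info("demo completion work executed\n");
}

static int __init demo_init(void)
{
	/*
	 * WQ_UNBOUND lets work bounce away from a busy CPU;
	 * WQ_HIGHPRI queues it to a high-priority worker pool,
	 * mirroring the flags the isert patch passes.
	 */
	demo_comp_wq = alloc_workqueue("demo_comp_wq",
				       WQ_UNBOUND | WQ_HIGHPRI, 0);
	if (!demo_comp_wq)
		return -ENOMEM;

	INIT_WORK(&demo_work, demo_work_fn);
	queue_work(demo_comp_wq, &demo_work);
	return 0;
}

static void __exit demo_exit(void)
{
	/* Wait for pending work, then tear the workqueue down. */
	flush_workqueue(demo_comp_wq);
	destroy_workqueue(demo_comp_wq);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");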