On Wed, 2011-11-09 at 09:01 -0800, Roland Dreier wrote:
> On Tue, Nov 8, 2011 at 10:28 PM, Nicholas A. Bellinger
> <nab@xxxxxxxxxxxxxxx> wrote:
> > For the slower case with this patch, a single kworker thread is running
> > @ 100% CPU utilization with the wq defaults. Using per HW port context
> > qla_tgt->qla_tgt_wq with WQ_UNBOUND, 1, multiple kworkers are running at
> > ~40% utilization, and AFAICT seem to be doing a better job of
> > distributing load across multiple qla_hw_data ports.
>
> I think if you just use plain queue_work() on an
> alloc_workqueue(..., 0, 0) queue,
> then the work item will run on the CPU it's queued on, which means you might
> not get concurrency if everything is on the same CPU.
>

After a bit more investigation, this is what I came up with too. Thanks
for the clarification.

> I didn't see in you patch where you actually use qla_tgt_wq but I guess you
> need to either make sure the different interrupts are distributed to different
> CPUs or (this is ugly) use queue_work_on() to send things to remote CPUs.
>

The queue_work() call managed to go missing from the posted patch,
grrr..

Anyways, I was hoping that qla2xxx would be a bit smarter wrt
distributing interrupt load, but apparently that's not the case. I'll
take a look at manually setting the IRQ affinity of qla2xxx rsp/req
vectors to individual CPUs and see where that gets us.

Thanks Roland!

--nab
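
For reference, a rough user-space sketch of the "manually set the IRQ affinity" step mentioned above. It assumes the IRQ numbers of the qla2xxx rsp/req vectors have already been read out of /proc/interrupts, and that writing a hex CPU mask to /proc/irq/<N>/smp_affinity is the mechanism used. The helper names (format_cpu_mask, cpu_for_vector, set_irq_affinity) are made up for illustration and are not part of qla2xxx or any library. Untested against real hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the hex mask string for a single CPU in the form that
 * /proc/irq/<N>/smp_affinity accepts: comma-separated 32-bit groups,
 * most significant group first (e.g. CPU 33 -> "2,00000000").
 * Returns 0 on success, -1 if buf is too small.
 */
int format_cpu_mask(unsigned int cpu, char *buf, size_t len)
{
    static const char zero_group[] = ",00000000";
    unsigned int groups = cpu / 32;   /* lower 32-bit groups, all zero */
    unsigned int bit = cpu % 32;      /* bit inside the top group */
    size_t off;
    int n;

    n = snprintf(buf, len, "%x", 1u << bit);
    if (n < 0 || (size_t)n >= len)
        return -1;
    off = (size_t)n;

    for (unsigned int i = 0; i < groups; i++) {
        if (off + sizeof(zero_group) > len)
            return -1;
        /* sizeof includes the trailing NUL, so buf stays terminated */
        memcpy(buf + off, zero_group, sizeof(zero_group));
        off += sizeof(zero_group) - 1;
    }
    return 0;
}

/* Round-robin policy (an assumption): vector i goes to CPU i mod ncpus. */
unsigned int cpu_for_vector(unsigned int vec_index, unsigned int ncpus)
{
    assert(ncpus > 0);
    return vec_index % ncpus;
}

/*
 * Pin one IRQ to one CPU by writing the mask to procfs.
 * Needs root; returns 0 on success, -1 on any failure.
 */
int set_irq_affinity(unsigned int irq, unsigned int cpu)
{
    char path[64], mask[128];
    FILE *f;

    if (format_cpu_mask(cpu, mask, sizeof(mask)) != 0)
        return -1;
    snprintf(path, sizeof(path), "/proc/irq/%u/smp_affinity", irq);

    f = fopen(path, "w");
    if (!f)
        return -1;
    if (fprintf(f, "%s\n", mask) < 0) {
        fclose(f);
        return -1;
    }
    /* the write is buffered, so a rejected mask shows up at fclose() */
    return fclose(f) == 0 ? 0 : -1;
}
```

Calling set_irq_affinity(irq[i], cpu_for_vector(i, ncpus)) for each vector would spread the interrupts, and with them the queue_work() submissions, across CPUs. Note that irqbalance, if running, may rewrite these masks.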