On Tue, Nov 08, 2011 at 10:28:07PM -0800, Nicholas A. Bellinger wrote:
> Hi Christoph,
>
> What extra parameters do you recommend using for the alloc_workqueue()
> setup here..?
>
> While doing some initial fio large block performance tests with 2x
> port 8 Gb/sec 25xx parts this evening, I noticed the per HW port
> qla_tgt->qla_tgt_wq workqueues using alloc_workqueue("qla_tgt_wq",
> WQ_UNBOUND, 1); seem to have a slight performance edge over this patch
> to start using a single global qla_tgt_wq setup in qla_target.c with
> alloc_workqueue("qla_tgt_wq", 0, 0).
>
> It's on the order of a ~300 MB/sec difference between the two, with the
> per qla_tgt->qla_tgt_wq running very near what the backend is capable of
> at ~1500 MB/sec, and this patch to convert to a single qla_tgt_wq with
> the same tests is at ~1200 MB/sec.
>
> For the slower case with this patch, a single kworker thread is running
> @ 100% CPU utilization with the wq defaults. Using per HW port context
> qla_tgt->qla_tgt_wq with WQ_UNBOUND, 1, multiple kworkers are running at
> ~40% utilization, and AFAICT seem to be doing a better job of
> distributing load across multiple qla_hw_data ports.
>
> Any thoughts on what might be limiting the single global qla_tgt_wq in
> this patch against per HW port qla_tgt_wq dispatch..?

Read through Documentation/workqueue.txt. It sounds like this workqueue
is CPU bound, although I wonder why. Can you look at perf top and see
where we spend the time?
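
For reference, a minimal sketch of the two alloc_workqueue() setups being
compared in the quoted mail. The calls and their arguments are taken from the
mail; the example_* names are hypothetical and this is not the driver's actual
code. Per Documentation/workqueue.txt, a workqueue allocated without
WQ_UNBOUND is per-CPU, so work items normally run on the CPU that queued
them. That would be consistent with a single busy kworker if all work is
queued from one CPU, but it is only a guess until perf top confirms it.

```c
#include <linux/errno.h>
#include <linux/workqueue.h>

/* Hypothetical holder; the real driver keeps this in its own structures. */
static struct workqueue_struct *example_global_wq;

/*
 * Variant from the patch: one global, per-CPU (bound) workqueue.
 * flags = 0 leaves work on the CPU that queued it, and
 * max_active = 0 selects the default concurrency limit.
 */
static int example_setup_global_wq(void)
{
	example_global_wq = alloc_workqueue("qla_tgt_wq", 0, 0);
	return example_global_wq ? 0 : -ENOMEM;
}

/*
 * Variant before the patch: one workqueue per HW port.
 * WQ_UNBOUND lets the scheduler place the worker on any CPU, and
 * max_active = 1 allows one in-flight work item per queue.
 * The caller (hypothetical) would store the result per port and
 * must check it for NULL.
 */
static struct workqueue_struct *example_setup_port_wq(void)
{
	return alloc_workqueue("qla_tgt_wq", WQ_UNBOUND, 1);
}
```

Either queue would be released with destroy_workqueue() on teardown.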