Re: [PATCH] qla2xxx: Use global qla_tgt_wq workqueue

"Nicholas A. Bellinger" <nab@xxxxxxxxxxxxxxx> · Tue, 08 Nov 2011 22:28:07 -0800

On Wed, 2011-11-09 at 05:54 +0000, Nicholas A. Bellinger wrote:
> From: Nicholas Bellinger <nab@xxxxxxxxxxxxxxx>
> 
> This patch converts qla_target.c code to use a single local scope
> qla_tgt_wq setup during qla_tgt_init(), and drops qla_tgt->qla_tgt_wq
> usage.  It also makes tcm_qla2xxx_handle_data() use tcm_qla2xxx_free_wq
> for handling aborted command response tcm_qla2xxx_handle_data()
> 
> Cc: Christoph Hellwig <hch@xxxxxx>
> Cc: Roland Dreier <roland@xxxxxxxxxxxxxxx>
> Signed-off-by: Nicholas Bellinger <nab@xxxxxxxxxxxxxxx>
> ---
>  drivers/scsi/qla2xxx/qla_target.c               |   17 +++++++++++------
>  drivers/scsi/qla2xxx/qla_target.h               |    1 -
>  drivers/target/tcm_qla2xxx/tcm_qla2xxx_fabric.c |    2 +-
>  3 files changed, 12 insertions(+), 8 deletions(-)
> 

Hi Christoph,

What extra parameters do you recommend using for the alloc_workqueue()
setup here..? 

While doing some initial fio large block performance tests with a 2x
port 8 Gb/sec 25xx parts with this evening, I noticed the per HW port
qla_tgt->qla_tgt_wq workqueues using alloc_workqueue("qla_tgt_wq,
WQ_UNBOUND, 1); seem to have a slight performance edge over this patch
to start using a single global qla_tgt_wq setup in qla_target.c with
alloc_workqueue("qla_tgt_wq, 0, 0).

It's on the order of a ~300 MB/sec difference between the two, with the
per qla_tgt->qla_tgt_wq running very near what the backend is capable of
at ~1500 MB/sec, and this patch to convert to a single qla_tgt_wq with
the same tests are ~1200 MB/sec.

For the slower case with this patch, a single kworker thread is running
@ 100% CPU utilization with the wq defaults.  Using per HW port context
qla_tgt->qla_tgt_wq with WQ_UNBOUND, 1, multiple kworkers are running at
~40% utilization, and AFAICT seem to be doing a better job of
distributing load across multiple qla_hw_data ports.

Any thoughts on what might be limiting the single global qla_tgt_wq in
this patch against per HW port qla_tgt_wq dispatch..?

Thanks,

--nab

> diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c
> index dcf58ab..857a8e2 100644
> --- a/drivers/scsi/qla2xxx/qla_target.c
> +++ b/drivers/scsi/qla2xxx/qla_target.c
> @@ -117,6 +117,7 @@ static int qla_tgt_unreg_sess(struct qla_tgt_sess *sess);
>  static struct kmem_cache *qla_tgt_cmd_cachep;
>  static struct kmem_cache *qla_tgt_mgmt_cmd_cachep;
>  static mempool_t *qla_tgt_mgmt_cmd_mempool;
> +static struct workqueue_struct *qla_tgt_wq;
>  /*
>   * From qla2xxx/qla_iobc.c and used by various qla_target.c logic
>   */
> @@ -4880,11 +4881,6 @@ int qla_tgt_add_target(struct qla_hw_data *ha, struct scsi_qla_host *base_vha)
>  	INIT_WORK(&tgt->srr_work, qla_tgt_handle_srr_work);
>  	atomic_set(&tgt->tgt_global_resets_count, 0);
>  	
> -	tgt->qla_tgt_wq = alloc_workqueue("qla_tgt_wq", WQ_UNBOUND, 0);
> -	if (!tgt->qla_tgt_wq)
> -		return -ENOMEM;
> -
> -	printk("Setup tgt->qla_tgt_wq: %p for qla_hw_data: %p\n", tgt->qla_tgt_wq, ha);
>  	ha->qla_tgt = tgt;
>  
>  	if (IS_FWI2_CAPABLE(ha)) {
> @@ -4927,7 +4923,6 @@ int qla_tgt_remove_target(struct qla_hw_data *ha, struct scsi_qla_host *vha)
>  			"existing target", vha->vp_idx);
>  		return 0;
>  	}
> -	destroy_workqueue(ha->qla_tgt->qla_tgt_wq);
>  
>  	ql_dbg(ql_dbg_tgt, vha, 0xe037, "Unregistering target for host %ld(%p)",
>  			vha->host_no, ha);
> @@ -5451,8 +5446,17 @@ int __init qla_tgt_init(void)
>  		goto out_mgmt_cmd_cachep;
>  	}
>  
> +	qla_tgt_wq = alloc_workqueue("qla_tgt_wq", 0, 0);
> +	if (!qla_tgt_wq) {
> +		pr_warn(KERN_ERR "alloc_workqueue for qla_tgt_wq failed\n");
> +		ret = -ENOMEM;
> +		goto out_cmd_mempool;
> +	}
> +
>  	return 0;
>  
> +out_cmd_mempool:
> +	mempool_destroy(qla_tgt_mgmt_cmd_mempool);
>  out_mgmt_cmd_cachep:
>  	kmem_cache_destroy(qla_tgt_mgmt_cmd_cachep);
>  out:
> @@ -5462,6 +5466,7 @@ out:
>  
>  void __exit qla_tgt_exit(void)
>  {
> +	destroy_workqueue(qla_tgt_wq);
>  	mempool_destroy(qla_tgt_mgmt_cmd_mempool);
>  	kmem_cache_destroy(qla_tgt_mgmt_cmd_cachep);
>  	kmem_cache_destroy(qla_tgt_cmd_cachep);
> diff --git a/drivers/scsi/qla2xxx/qla_target.h b/drivers/scsi/qla2xxx/qla_target.h
> index 22d9bc0..85b8cca 100644
> --- a/drivers/scsi/qla2xxx/qla_target.h
> +++ b/drivers/scsi/qla2xxx/qla_target.h
> @@ -870,7 +870,6 @@ struct qla_tgt {
>  	spinlock_t sess_work_lock;
>  	struct list_head sess_works_list;
>  	struct work_struct sess_work;
> -	struct workqueue_struct *qla_tgt_wq;
>  
>  	imm_ntfy_from_isp_t link_reinit_iocb;
>  	wait_queue_head_t waitQ;
> diff --git a/drivers/target/tcm_qla2xxx/tcm_qla2xxx_fabric.c b/drivers/target/tcm_qla2xxx/tcm_qla2xxx_fabric.c
> index c738a1c..49b3e69 100644
> --- a/drivers/target/tcm_qla2xxx/tcm_qla2xxx_fabric.c
> +++ b/drivers/target/tcm_qla2xxx/tcm_qla2xxx_fabric.c
> @@ -771,7 +771,7 @@ int tcm_qla2xxx_handle_data(struct qla_tgt_cmd *cmd)
>  
>  		se_cmd->scsi_sense_reason = TCM_CHECK_CONDITION_ABORT_CMD;
>  		INIT_WORK(&cmd->work, tcm_qla2xxx_do_rsp);
> -		queue_work(cmd->tgt->qla_tgt_wq, &cmd->work);
> +		queue_work(tcm_qla2xxx_free_wq, &cmd->work);
>  		return 0;
>  	}
>  	/*

--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html