Re: [PATCH] qla2xxx/tcm_qla2xxx: Add qla_hw_data specific I/O workqueue offload

"Nicholas A. Bellinger" <nab@xxxxxxxxxxxxxxx> · Tue, 08 Nov 2011 13:32:59 -0800

On Tue, 2011-11-08 at 07:43 -0500, Christoph Hellwig wrote:
> On Tue, Nov 08, 2011 at 09:16:56AM +0000, Nicholas A. Bellinger wrote:
> > From: Nicholas Bellinger <nab@xxxxxxxxxxxxxxx>
> > 
> > This patch adds an initial conversion to cmwq using qla_tgt->qla_tgt_wq
> > for I/O dispatch into qla_hw_data->qla_tgt specific context.  This includes
> > removal of transport_generic_handle_cdb_map() usage, and moves
> > transport_lookup_cmd_lun() into tcm_qla2xxx_do_work() process context for
> > incoming descriptor dispatch.
> 
> I really do not like the model.  We already have code to offload the
> command submission in qla_tgt_handle_cmd_for_atio when we can't find
> an existing session.  So with this code we might do two workqueue
> offloads.  In addition the existing code is there, and already well
> tested given that hit it for the that case all the time.  I'd recommend
> you apply my patch 3 from this weekends series instead to use that path
> unconditionally, and then we can look into the next steps:
> 

Using qla_tgt_exec_cmd_work() in the main I/O path means currently
requiring hardware_lock (again) in process context for every I/O
submission.  Also, AFAICT in the original code with QLA_TGT_SESS_WORK_CMD
(now qla_tgt_exec_cmd_work) is only ever called when a session has
not been found under the first hardware_lock access.

This patch does the necessary minimal setup w/ hardware_lock held in
interrupt context (eg: calling transport_init_se_cmd and
target_get_sess_cmd), but does everything else from process context in
tcm_qla2xxx_do_work() w/o needing to reacquire hardware_lock again.

Why not instead convert qla_tgt_exec_cmd_work() to be called only when
no qla_tgt_sess can be found to invoke qla_tgt_make_local_sess()..?

> 
> 
>  - make it use a qla2xxx-local workqueue

Do you mean a global qla2xxx local workqueue, instead of a qla_hw_data
specific workqueue..?  Is there a reason why having a per qla_hw_data WQ
is not a good idea..?

>  - stop holding the hardware lock around qla_tgt_send_cmd_to_target

Why should we reacquire hardware_lock at all in process context..?
The code in qla_tgt_exec_cmd_work() does this because it needs to lookup
qla_tgt_sess a second time during the main path, but if we've already
made a reference using target_get_sess_cmd() it's not necessary to
require hardware_lock at all.

> Note that having the workqueue in qla_target is pretty much nessecary
> for doing the remaining moves of target calls into user context,
> including TMRs, transport_generic_handle_data and so on.
> 

To clarify, I'm not sure that adding target_core_mod symbol dependencies
into the qla2xxx LLD makes sense.  This would mean target_core_mod would
need to be loaded even when qla2xxx is configured to run in initiator
mode (the presumable default).

> > It passes WQ_UNBOUND during alloc_workqueue() setup, and also keeps all
> > target_core_mod referenced symbols within tcm_qla2xxx code in order to
> > avoid an qla2xxx LLD target_core_mod dependency.
> 
> Care to explain why you use WQ_UNBOUND?  This loses a big scalability
> advantage of per-cpu workqueues, and none of the reasons in the
> documentation for it seem to apply here.

Just testing with some diffrent alloc_workqueue() settings.  I'll drop
WQ_UNBOUND now if this is really hurts per-cpu workqueue scalability.

Also, I've got the ->locked_rsp removal ready that cleans up (all..?)
the funky response cases.  I'll send these out shortly on top of what
I've already committed for this patch, and if pushing everything into
qla_tgt_exec_cmd_work() for the main I/O path really makes sense, I'll
drop the extra parts from the patch.

--nab

--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html