On Tue, 2011-11-08 at 07:43 -0500, Christoph Hellwig wrote: > On Tue, Nov 08, 2011 at 09:16:56AM +0000, Nicholas A. Bellinger wrote: > > From: Nicholas Bellinger <nab@xxxxxxxxxxxxxxx> > > > > This patch adds an initial conversion to cmwq using qla_tgt->qla_tgt_wq > > for I/O dispatch into qla_hw_data->qla_tgt specific context. This includes > > removal of transport_generic_handle_cdb_map() usage, and moves > > transport_lookup_cmd_lun() into tcm_qla2xxx_do_work() process context for > > incoming descriptor dispatch. > > I really do not like the model. We already have code to offload the > command submission in qla_tgt_handle_cmd_for_atio when we can't find > an existing session. So with this code we might do two workqueue > offloads. In addition the existing code is there, and already well > tested given that hit it for the that case all the time. I'd recommend > you apply my patch 3 from this weekends series instead to use that path > unconditionally, and then we can look into the next steps: > Using qla_tgt_exec_cmd_work() in the main I/O path means currently requiring hardware_lock (again) in process context for every I/O submission. Also, AFAICT in the original code with QLA_TGT_SESS_WORK_CMD (now qla_tgt_exec_cmd_work) is only ever called when a session has not been found under the first hardware_lock access. This patch does the necessary minimal setup w/ hardware_lock held in interrupt context (eg: calling transport_init_se_cmd and target_get_sess_cmd), but does everything else from process context in tcm_qla2xxx_do_work() w/o needing to reacquire hardware_lock again. Why not instead convert qla_tgt_exec_cmd_work() to be called only when no qla_tgt_sess can be found to invoke qla_tgt_make_local_sess()..? > > > - make it use a qla2xxx-local workqueue Do you mean a global qla2xxx local workqueue, instead of a qla_hw_data specific workqueue..? Is there a reason why having a per qla_hw_data WQ is not a good idea..? > - stop holding the hardware lock around qla_tgt_send_cmd_to_target Why should we reacquire hardware_lock at all in process context..? The code in qla_tgt_exec_cmd_work() does this because it needs to lookup qla_tgt_sess a second time during the main path, but if we've already made a reference using target_get_sess_cmd() it's not necessary to require hardware_lock at all. > Note that having the workqueue in qla_target is pretty much nessecary > for doing the remaining moves of target calls into user context, > including TMRs, transport_generic_handle_data and so on. > To clarify, I'm not sure that adding target_core_mod symbol dependencies into the qla2xxx LLD makes sense. This would mean target_core_mod would need to be loaded even when qla2xxx is configured to run in initiator mode (the presumable default). > > It passes WQ_UNBOUND during alloc_workqueue() setup, and also keeps all > > target_core_mod referenced symbols within tcm_qla2xxx code in order to > > avoid an qla2xxx LLD target_core_mod dependency. > > Care to explain why you use WQ_UNBOUND? This loses a big scalability > advantage of per-cpu workqueues, and none of the reasons in the > documentation for it seem to apply here. Just testing with some diffrent alloc_workqueue() settings. I'll drop WQ_UNBOUND now if this is really hurts per-cpu workqueue scalability. Also, I've got the ->locked_rsp removal ready that cleans up (all..?) the funky response cases. I'll send these out shortly on top of what I've already committed for this patch, and if pushing everything into qla_tgt_exec_cmd_work() for the main I/O path really makes sense, I'll drop the extra parts from the patch. --nab -- To unsubscribe from this list: send the line "unsubscribe target-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html