On Thu, 2012-05-17 at 12:26 -0400, Jörn Engel wrote: > As the comment explains, we have a race between dropping the refcount to > zero and actually removing the session from lookup tables. Plug this by > checking the refcount on lookup. > > Signed-off-by: Joern Engel <joern@xxxxxxxxx> > --- > drivers/scsi/qla2xxx/tcm_qla2xxx.c | 28 ++++++++++++++++++++++++++++ > 1 file changed, 28 insertions(+) > Hi Joern! > diff --git a/drivers/scsi/qla2xxx/tcm_qla2xxx.c b/drivers/scsi/qla2xxx/tcm_qla2xxx.c > index 09a70aa..b1bfa08 100644 > --- a/drivers/scsi/qla2xxx/tcm_qla2xxx.c > +++ b/drivers/scsi/qla2xxx/tcm_qla2xxx.c > @@ -1300,6 +1300,20 @@ static struct qla_tgt_sess *tcm_qla2xxx_find_sess_by_s_id( > return NULL; > } > > + /* > + * We drop the hardware_lock between dropping the reference count to > + * zero and removing the session from the btree. Therefore it > + * is possible to successfully lookup a session after it has been > + * scheduled for removal, but before actual removal. Result is pretty > + * good chance of dereferencing freed memory in qlt_issue_task_mgmt. > + * > + * Proper fix would be to atomically drop the refcount and remove the > + * session from all data structures that would allow lookup. Until > + * then this tasteless hack shall suffice. > + */ > + if (atomic_read(&nacl->qla_tgt_sess->se_sess->sess_kref.refcount) == 0) > + return NULL; > + > return nacl->qla_tgt_sess; > } > Mmmmm, I'd still like to try to avoid (another) atomic op in the fast path w/ hardware_lock held if at all possible.. Also thinking more about the case in your comment, I'm starting to wonder if tcm_qla2xxx_free_session() should be obtaining hardware_lock + clearing the s_id and loop_id lookup entries earlier than currently done.. That is, target_wait_for_sess_cmds() is called before clearing the nodeacl pointers from s_id + loop_id in tcm_qla2xxx_free_session(), which means that we still perform se_node_acl lookups in the case where the individual tcm_qla2xxx endpoint was not also being shutdown as well.. So that said, I think your work-around is OK to address the current BUG, but the s_id + loop_id clearing order in tcm_qla2xxx_free_session() would be having an effect here as well.. --nab -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html