Re: [RFC-v4 3/3] qla2xxx: Add tcm_qla2xxx fabric module for mainline target

"Nicholas A. Bellinger" <nab@xxxxxxxxxxxxxxx> · Fri, 23 Dec 2011 13:51:08 -0800

On Thu, 2011-12-22 at 00:10 -0800, Roland Dreier wrote:
> Hi Nic,
> 
> On Sat, Dec 17, 2011 at 6:02 PM, Nicholas A. Bellinger
> <nab@xxxxxxxxxxxxxxx> wrote:
> > +/*
> > + * Called from qla_target_template->free_cmd(), and will call
> > + * tcm_qla2xxx_release_cmd via normal struct target_core_fabric_ops
> > + * release callback.  qla_hw_data->hardware_lock is expected to be held
> > + */
> > +void tcm_qla2xxx_free_cmd(struct qla_tgt_cmd *cmd)
> > +{
> > +       barrier();
> > +       /*
> > +        * Handle tcm_qla2xxx_init_cmd() -> transport_get_lun_for_cmd()
> > +        * failure case where cmd->se_cmd.se_dev was not assigned, and
> > +        * a call to transport_generic_free_cmd_intr() is not possible..
> > +        */
> > +       if (!cmd->se_cmd.se_dev) {
> > +               target_put_sess_cmd(cmd->se_cmd.se_sess, &cmd->se_cmd);
> > +               transport_generic_free_cmd(&cmd->se_cmd, 0);
> > +               return;
> > +       }
> > +
> > +       if (!atomic_read(&cmd->se_cmd.t_transport_complete))
> > +               target_put_sess_cmd(cmd->se_cmd.se_sess, &cmd->se_cmd);
> > +
> > +       INIT_WORK(&cmd->work, tcm_qla2xxx_complete_free);
> > +       queue_work(tcm_qla2xxx_free_wq, &cmd->work);
> > +}
> 
> can you explain why you do the second target_put_sess_cmd()
> without a "return" here?  (the one when t_transport_complete == 0)
> 
> It seems this leads to use-after-free ... suppose cmd->execute_task in
> __transport_execute_tasks() returns an error (eg due to malformed
> emulated command from the initiator -- the easiest way to trigger this
> is to do something like "sg_raw /dev/sda 12 00 00 00 00 00" on a
> tcm_qla2xxx exported LUN).
> 
> Then we'll call transport_generic_request_failure()  which will end up
> calling transport_cmd_check_stop_to_fabric(), which will call into
> tcm_qla2xxx_check_stop_free(), which will do target_put_sess_cmd()
> so we'll be down to 1 reference on the cmd.
> 
> Then when the HW finishes sending the SCSI status back, we'll
> go into qla_tgt_do_ctio_completion(), which will call into ->free_cmd()
> and end up in the function quoted above.
> 
> Since we failed the command we never call transport_complete_task()
> so t_transport_complete will be 0 and we'll call target_put_sess_cmd()
> a second time and therefore free the command immediately, and then
> go ahead and queue up the work to free it a second time.
> 
> You can make this 100% reproducible and fatal by booting with
> "slub_debug=FZUP" (or whatever the corresponding SLAB config
> option is, I forget), and then doing some malformed emulated
> command that ends up returning bad SCSI status (like the sg_raw
> example above).
> 
> I still don't understand the command reference counting and freeing
> scheme well enough to know what the right fix is here.  Are we
> supposed to return after that put in the transport_complete==0 case?
> But I thought we weren't supposed to free commands from interrupt
> context, although I don't know what's wrong with doing what ends
> up being just a kmem_cache_free() in the end.  So is doing the put in
> the free_cmd function (which is called from the CTIO completion
> handling interrupt context) OK?
> 
> Why do we have that put if t_transport_complete isn't set, anyway?
> Doesn't the request failure path drop the reference?  Or is the problem
> that we return SCSI status without setting t_transport_complete?
> 
> Thoughts?
> 

I believe this is actually left over cruft from when
qla_tgt_cmd->cmd_stop_free had to be set explicitly set in the failure
release path in tcm_qla2xxx_free_cmd().  Eg:

        if (!atomic_read(&cmd->se_cmd.t_task->t_transport_complete)) {
                atomic_set(&cmd->cmd_stop_free, 1);
                smp_mb__after_atomic_dec();
        }

So the (t_transport_complete == 0) check causing the issue above should
be safe to remove now..  The same is true for the !cmd->se_cmd.se_dev
check in tcm_qla2xxx_free_cmd() as well.

I'll get this addressed in lio-core/qla_tgt-3.3 shortly.

Thanks,

--nab

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html