On 12/7/15, 6:37 PM, "target-devel-owner@xxxxxxxxxxxxxxx on behalf of Christoph Hellwig" <target-devel-owner@xxxxxxxxxxxxxxx on behalf of hch@xxxxxxxxxxxxx> wrote: >> -void qlt_abort_cmd(struct qla_tgt_cmd *cmd) >> +int qlt_abort_cmd(struct qla_tgt_cmd *cmd) >> { >> struct qla_tgt *tgt = cmd->tgt; >> struct scsi_qla_host *vha = tgt->vha; >> struct se_cmd *se_cmd = &cmd->se_cmd; >> + unsigned long flags,refcount; >> >> ql_dbg(ql_dbg_tgt_mgt, vha, 0xf014, >> "qla_target(%d): terminating exchange for aborted cmd=%p " >> "(se_cmd=%p, tag=%llu)", vha->vp_idx, cmd, &cmd->se_cmd, >> se_cmd->tag); >> >> + spin_lock_irqsave(&cmd->cmd_lock, flags); >> + if (cmd->aborted) { >> + spin_unlock_irqrestore(&cmd->cmd_lock, flags); >> + >> + /* It's normal to see 2 calls in this path: >> + * 1) XFER Rdy completion + CMD_T_ABORT >> + * 2) TCM TMR - drain_state_list >> + */ >> + refcount = atomic_read(&cmd->se_cmd.cmd_kref.refcount); >> + ql_dbg(ql_dbg_tgt_mgt, vha, 0xffff, >> + "multiple abort. %p refcount %lx" >> + "transport_state %x, t_state %x, se_cmd_flags %x \n", >> + cmd, refcount,cmd->se_cmd.transport_state, >> + cmd->se_cmd.t_state,cmd->se_cmd.se_cmd_flags); >> + >> + return EIO; >> + } > >Err, no. Looking into the refcount inside a kref is never the >right thing to do. QT> even for debug purpose?? > >> +typedef enum { >> + /* >> + * BIT_0 - Atio Arrival / schedule to work >> + * BIT_1 - qlt_do_work >> + * BIT_2 - qlt_do work failed >> + * BIT_3 - xfer rdy/tcm_qla2xxx_write_pending >> + * BIT_4 - read respond/tcm_qla2xx_queue_data_in >> + * BIT_5 - status respond / tcm_qla2xx_queue_status >> + * BIT_6 - tcm request to abort/Term exchange. >> + * pre_xmit_response->qlt_send_term_exchange >> + * BIT_7 - SRR received (qlt_handle_srr->qlt_xmit_response) >> + * BIT_8 - SRR received (qlt_handle_srr->qlt_rdy_to_xfer) >> + * BIT_9 - SRR received (qla_handle_srr->qlt_send_term_exchange) >> + * BIT_10 - Data in - hanlde_data->tcm_qla2xxx_handle_data >> + >> + * BIT_12 - good completion - qlt_ctio_do_completion -->free_cmd >> + * BIT_13 - Bad completion - >> + * qlt_ctio_do_completion --> qlt_term_ctio_exchange >> + * BIT_14 - Back end data received/sent. >> + * BIT_15 - SRR prepare ctio >> + * BIT_16 - complete free >> + * BIT_17 - flush - qlt_abort_cmd_on_host_reset >> + * BIT_18 - completion w/abort status >> + * BIT_19 - completion w/unknown status >> + * BIT_20 - tcm_qla2xxx_free_cmd > >Please use descriptive names for these flags in the source code! QT> ACK. We¹ll change the bits to more descriptive name in a ³follow on² patch. > >> + BUG_ON(cmd->cmd_flags & BIT_20); >> + cmd->cmd_flags |= BIT_20; >> + > >And no crazieness like this. While we're at it: what synchronizes >access to ->cmd_flags? QT> These bits provide indication as to where the command has traversed in the QLA code. Each bit is set one time. Due to the async nature of the TMR code, it triggers QLA driver to repeat this specific free path in the double free case. This BUG_ON allows us trap it early on. In one of the corner case (below), I need to overloaded it + lock for the cleanup process. > >> @@ -466,13 +484,25 @@ static int tcm_qla2xxx_handle_cmd(scsi_qla_host_t >>*vha, struct qla_tgt_cmd *cmd, >> static void tcm_qla2xxx_handle_data_work(struct work_struct *work) >> { >> struct qla_tgt_cmd *cmd = container_of(work, struct qla_tgt_cmd, >>work); >> + unsigned long flags; >> >> /* >> * Ensure that the complete FCP WRITE payload has been received. >> * Otherwise return an exception via CHECK_CONDITION status. >> */ >> cmd->cmd_in_wq = 0; >> - cmd->cmd_flags |= BIT_11; >> + >> + spin_lock_irqsave(&cmd->cmd_lock, flags); >> + cmd->cmd_flags |= CMD_FLAG_DATA_WORK; >> + if (cmd->aborted) { >> + cmd->cmd_flags |= CMD_FLAG_DATA_WORK_FREE; >> + spin_unlock_irqrestore(&cmd->cmd_lock, flags); >> + >> + tcm_qla2xxx_free_cmd(cmd); >> + return; >> + } >> + spin_unlock_irqrestore(&cmd->cmd_lock, flags); > >All these abort flag hacks look very suspicios. Can you explain the >exact theory of operation behind them? QT> The cmd->aborted flag is used to track the CMD_T_ABORT flag at TCM level. If the command have been requested to be aborted by TCM or already aborted, we advance it to the ³free" state because our hardware have already started freeing up resources associated to this command/exchange. In this specific case(above), a XFER RDY was aborted by the TMR. Returning the cmd to TCM to generate SCSI Status would generate erroneous HW error due to freed resource. > >-- >To unsubscribe from this list: send the line "unsubscribe target-devel" in >the body of a message to majordomo@xxxxxxxxxxxxxxx >More majordomo info at http://vger.kernel.org/majordomo-info.html
<<attachment: winmail.dat>>