Re: [EXT] Re: [PATCH 3/3] qla2xxx: Fix NVME cmd and LS cmd timeout race condition

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 6/18/19, 8:57 AM, "Bart Van Assche" <bvanassche@xxxxxxx> wrote:

    On 6/17/19 5:01 PM, Quinn Tran wrote:
    > Attached is the clean-up patch that we held back from the series. > We felt it wasn't ready for wider audience because it needed additional
    > soak time with our test group.
    > 
    > We want to ahead and share it with you to let you know that we intent
    > to cleanup the duplicate atomic [ref_count|kref].  Once it has some
    > soak time in our test group, we'll submit it in the next RC window.
    
    Hi Quinn,
    
    Thank you for having shared that patch early. My comments about that 
    patch are as follows:
    - The patch description is not correct. Today freeing of an SRB does not 
    happen when ref_count reaches zero but it happens when the firmware 
    reports a completion. That is why today the abort code can trigger a 
    use-after-free. ref_count is only useful today for the abort code to 
    detect atomically whether or not the firmware already reported that a 
    request completed.

QT: Bart, thanks for the additional eyes.  This patch made additional safe guard to prevent use after free.  When the original IO arrive into QLA driver, QLA will bind scsi_cmnd & srb together.  When the kref reaches 0, we will unbind the 2 structures with NULL pointer.  Any attempt on 'use after free' will be block by null pointer.  We reserve additional scratch space at the end of scsi_cmnd (srb_private) to facilitate the bind and unbind.

879 qla_release_fcp_cmd_kref(struct kref *kref)
880 {
 881         struct srb *sp = container_of(kref, struct srb, cmd_kref);
 882         struct scsi_cmnd *cmd = GET_CMD_SP(sp);
 884         struct srb_private *priv = (struct srb_private*)(cmd + 1);
..
 888         spin_lock_irqsave(&priv->cmd_lock, flags);
 889         CMD_SP(cmd) = NULL;    <<<< unbind scsi_cmnd from srb.
 890         sp->u.scmd.cmd = NULL;
 891         spin_unlock_irqrestore(&priv->cmd_lock, flags);
}

1359 static int
1360 qla2xxx_eh_abort(struct scsi_cmnd *cmd)
1361 {
..
1370         struct srb_private *priv = (struct srb_private*)(cmd + 1);
..
1382 
1383         spin_lock_irqsave(&priv->cmd_lock, flags);
1384         sp = (srb_t *) CMD_SP(cmd);
1385         if (!sp || !sp->qpair ||
1386             (priv->cmd_id != sp->cmd_id) ||
1387             (sp->fcport && sp->fcport->deleted)) {
1388                 spin_unlock_irqrestore(&priv->cmd_lock, flags);
1389                 return SUCCESS;
1390         }
  


    - Only calling cmd->scsi-done(cmd) when the reference count reaches zero 
    involves a behavior change. If a command completion and a request to 
    abort a command race, this patch will report the command as aborted to 
    the SCSI mid-layer instead of as completed. This change has not been 
    mentioned in the patch description. Is this change perhaps unintentional?
QT:  I believe the original code has the same behavior.  We're trying to preserve precedent.  Will revisit to differentiate the 2 status code the described race.

    - I think that this patch does not address the memory leak that can be 
    triggered by aborting a command. If a command is aborted it will be 
    freed by qla_release_fcp_cmd_kref() calling qla2xxx_rel_qpair_sp() and 
    by qla2xxx_rel_qpair_sp() calling sp->free(sp). However, the 
    implementation of the free function for multiple SRB types is incomplete.

QT:  As for the other type SRB, I follow what you're pointing at.  We do have another patch(is) to address this other corner.  Will queue it up to go out also with the next window. It's currently being soak in our test group.

    - The approach for avoiding that qla2xxx_eh_abort() triggers a 
    use-after-free (the new srb_private structure) is interesting. However, 
    that approach does not look correct to me. The patch attached to the 
    previous e-mail inserts the following code in qla2xxx_eh_abort():
       'sp = CMD_SP(cmd); if (!sp || !sp->qpair || ...) return SUCCESS'
    As one can see the sp pointer is dereferenced although the memory it 
    points at could already have been freed and could have been reallocated 
    by another driver or another process. So I don't think the new code for 
    avoiding a use-after-free is correct.

QT:  OK. Will remove old defensive check that may still cause use after free.

----
    
    Thanks,
    
    Bart.
    





[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]

  Powered by Linux