On 6/17/19 5:01 PM, Quinn Tran wrote:
Attached is the clean-up patch that we held back from the series. > We felt it wasn't ready for wider audience because it needed additional
soak time with our test group.
We want to ahead and share it with you to let you know that we intent
to cleanup the duplicate atomic [ref_count|kref]. Once it has some
soak time in our test group, we'll submit it in the next RC window.
Hi Quinn,
Thank you for having shared that patch early. My comments about that
patch are as follows:
- The patch description is not correct. Today freeing of an SRB does not
happen when ref_count reaches zero but it happens when the firmware
reports a completion. That is why today the abort code can trigger a
use-after-free. ref_count is only useful today for the abort code to
detect atomically whether or not the firmware already reported that a
request completed.
- Only calling cmd->scsi-done(cmd) when the reference count reaches zero
involves a behavior change. If a command completion and a request to
abort a command race, this patch will report the command as aborted to
the SCSI mid-layer instead of as completed. This change has not been
mentioned in the patch description. Is this change perhaps unintentional?
- I think that this patch does not address the memory leak that can be
triggered by aborting a command. If a command is aborted it will be
freed by qla_release_fcp_cmd_kref() calling qla2xxx_rel_qpair_sp() and
by qla2xxx_rel_qpair_sp() calling sp->free(sp). However, the
implementation of the free function for multiple SRB types is incomplete.
- The approach for avoiding that qla2xxx_eh_abort() triggers a
use-after-free (the new srb_private structure) is interesting. However,
that approach does not look correct to me. The patch attached to the
previous e-mail inserts the following code in qla2xxx_eh_abort():
'sp = CMD_SP(cmd); if (!sp || !sp->qpair || ...) return SUCCESS'
As one can see the sp pointer is dereferenced although the memory it
points at could already have been freed and could have been reallocated
by another driver or another process. So I don't think the new code for
avoiding a use-after-free is correct.
Thanks,
Bart.