On 16/02/2016 15:31, Hannes Reinecke wrote:
On 02/16/2016 01:22 PM, John Garry wrote:
When TRANS_TX_CREDIT_TIMEOUT_ERR or
TRANS_TX_CLOSE_NORMAL_ERR errors occur for a
command, the command should be re-attempted.
Signed-off-by: John Garry <john.garry@xxxxxxxxxx>
---
drivers/scsi/hisi_sas/hisi_sas_v1_hw.c | 22 ++++++++++++++++++----
1 file changed, 18 insertions(+), 4 deletions(-)
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
index ce5f65d..34f71a1c 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
@@ -1118,9 +1118,8 @@ static int prep_ssp_v1_hw(struct hisi_hba *hisi_hba,
}
/* by default, task resp is complete */
-static void slot_err_v1_hw(struct hisi_hba *hisi_hba,
- struct sas_task *task,
- struct hisi_sas_slot *slot)
+static void slot_err_v1_hw(struct hisi_hba *hisi_hba, struct sas_task *task,
+ struct hisi_sas_slot *slot, int *abort_slot)
{
struct task_status_struct *ts = &task->task_status;
struct hisi_sas_err_record_v1 *err_record = slot->status_buffer;
@@ -1212,6 +1211,14 @@ static void slot_err_v1_hw(struct hisi_hba *hisi_hba,
ts->stat = SAS_NAK_R_ERR;
break;
}
+ case TRANS_TX_CREDIT_TIMEOUT_ERR:
+ case TRANS_TX_CLOSE_NORMAL_ERR:
+ {
+ /* This will request a retry */
+ ts->stat = SAS_QUEUE_FULL;
+ ++(*abort_slot);
+ break;
+ }
default:
{
ts->stat = SAM_STAT_CHECK_CONDITION;
@@ -1317,8 +1324,14 @@ static int slot_complete_v1_hw(struct hisi_hba *hisi_hba,
if (cmplt_hdr_data & CMPLT_HDR_ERR_RCRD_XFRD_MSK &&
!(cmplt_hdr_data & CMPLT_HDR_RSPNS_XFRD_MSK)) {
+ int abort_slot = 0;
- slot_err_v1_hw(hisi_hba, task, slot);
+ slot_err_v1_hw(hisi_hba, task, slot, &abort_slot);
+ if (unlikely(abort_slot)) {
+ queue_work(hisi_hba->wq, &slot->abort_slot);
+ sts = ts->stat;
+ goto out_1;
+ }
goto out;
}
What is the 'abort_slot' variable for?
Currently it's just a counter, no?
So why the weird pointer passing?
And it does feel weird. Apparently the driver does get a message,
but still has to abort the command. Why?
Isn't the message an indicator that the command has been aborted?
Cheers,
Hannes
I'll paste some more code for convenience and to help clarify:
static int slot_complete_v1_hw(struct hisi_hba *hisi_hba,
struct hisi_sas_slot *slot, int abort)
{
...
if (cmplt_hdr_data & CMPLT_HDR_ERR_RCRD_XFRD_MSK &&
!(cmplt_hdr_data & CMPLT_HDR_RSPNS_XFRD_MSK)) {
int abort_slot = 0;
slot_err_v1_hw(hisi_hba, task, slot, &abort_slot);
if (unlikely(abort_slot)) { /* check if we need to abort the task */
queue_work(hisi_hba->wq, &slot->abort_slot);
sts = ts->stat;
goto out_1;
}
goto out;
}
...
out:
if (sas_dev && sas_dev->running_req)
sas_dev->running_req--;
hisi_sas_slot_task_free(hisi_hba, task, slot);
sts = ts->stat;
if (task->task_done)
task->task_done(task);
out_1:
return sts;
}
The abort_slot variable is really a boolean flag which slot_err_v1_hw()
may set. It is set when a TRANS_TX_CREDIT_TIMEOUT_ERR or
TRANS_TX_CLOSE_NORMAL_ERR error occurs in the slot. In that case we
don't complete the task immediately (goto out, then call
hisi_sas_slot_task_free() and task->task_done()); instead we queue the
task to be aborted in the device before it is completed (call
queue_work() and then goto out_1).
When hisi_sas_slot_abort() [patch #2] runs in the workqueue for the
task, it first aborts the task in the device with a TMF, and then
completes the task. Finally, the status (SAS_QUEUE_FULL) is passed back
to the SCSI framework, which will request a retry of the SCSI command.
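To make the flow concrete, here is a minimal user-space sketch of the same
pattern. It is not the driver code: every demo_* name is hypothetical, the
flag is a plain bool rather than the int used in the patch, and the
workqueue and TMF are only modelled by setting fields in a struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's error codes and task status. */
enum demo_err {
	DEMO_ERR_NAK,
	DEMO_ERR_CREDIT_TIMEOUT,	/* models TRANS_TX_CREDIT_TIMEOUT_ERR */
	DEMO_ERR_CLOSE_NORMAL,		/* models TRANS_TX_CLOSE_NORMAL_ERR */
};

enum demo_stat { DEMO_STAT_NAK, DEMO_STAT_QUEUE_FULL };

struct demo_slot {
	enum demo_err err;
	enum demo_stat stat;
	bool abort_queued;	/* models a pending queue_work() item */
	bool completed;		/* models slot free + task_done() */
};

/* Error decode: sets the status and flags the slot for abort if needed. */
static void demo_slot_err(struct demo_slot *slot, bool *abort_slot)
{
	assert(slot != NULL && abort_slot != NULL);

	switch (slot->err) {
	case DEMO_ERR_CREDIT_TIMEOUT:
	case DEMO_ERR_CLOSE_NORMAL:
		/* This status asks the upper layer for a retry. */
		slot->stat = DEMO_STAT_QUEUE_FULL;
		*abort_slot = true;
		break;
	default:
		slot->stat = DEMO_STAT_NAK;
		break;
	}
}

/* Completion path: complete now, or defer completion to the abort worker. */
static enum demo_stat demo_slot_complete(struct demo_slot *slot)
{
	bool abort_slot = false;

	demo_slot_err(slot, &abort_slot);
	if (abort_slot) {
		slot->abort_queued = true;	/* the "goto out_1" path */
		return slot->stat;
	}

	slot->completed = true;			/* the "goto out" path */
	return slot->stat;
}

/* Deferred worker: the real one would issue a TMF before completing. */
static void demo_slot_abort_work(struct demo_slot *slot)
{
	slot->abort_queued = false;
	slot->completed = true;
}
```

In this sketch, a slot with DEMO_ERR_CREDIT_TIMEOUT returns
DEMO_STAT_QUEUE_FULL from demo_slot_complete() with completed still false,
and only demo_slot_abort_work() marks it completed. Any other error
completes the slot straight away.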
This is the method our hw people recommended to handle these types of
errors.
Hope this explains,
Cheers,
John