Re: [PATCH v4 1/3] cxlflash: Base error recovery support

"Matthew R. Ochs" <mrochs@xxxxxxxxxxxxxxxxxx> · Tue, 11 Aug 2015 16:45:10 -0500

> On Aug 10, 2015, at 10:41 PM, Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx> wrote:
> 
> On Mon, 2015-08-10 at 12:09 -0500, Matthew R. Ochs wrote:
>> Introduce support for enhanced I/O error handling.
>> 
>> Signed-off-by: Matthew R. Ochs <mrochs@xxxxxxxxxxxxxxxxxx>
>> Signed-off-by: Manoj N. Kumar <manoj@xxxxxxxxxxxxxxxxxx>
>> ---
> 
> So I'm not necessarily very qualified to review SCSI bits as I haven't
> done anything close to the Linux SCSI code for almost a decade but I
> have a couple of questions and nits…

Thanks for reviewing Ben!

> 
>> 	wait_queue_head_t tmf_waitq;
>> 	bool tmf_active;
>> -	u8 err_recovery_active:1;
>> +	wait_queue_head_t limbo_waitq;
>> +	enum cxlflash_state state;
>> };
>> 
>> struct afu_cmd {
>> diff --git a/drivers/scsi/cxlflash/main.c b/drivers/scsi/cxlflash/main.c
>> index 76a7286..18359d4 100644
>> --- a/drivers/scsi/cxlflash/main.c
>> +++ b/drivers/scsi/cxlflash/main.c
>> @@ -380,6 +380,18 @@ static int cxlflash_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scp)
>> 	}
>> 	spin_unlock_irqrestore(&cfg->tmf_waitq.lock, lock_flags);
>> 
>> +	switch (cfg->state) {
>> +	case STATE_LIMBO:
>> +		pr_debug_ratelimited("%s: device in limbo!\n", __func__);
>> +		rc = SCSI_MLQUEUE_HOST_BUSY;
>> +		goto out;
>> +	case STATE_FAILTERM:
>> +		pr_debug_ratelimited("%s: device has failed!\n", __func__);
>> +		goto error;
>> +	default:
>> +		break;
>> +	}
> 
> I noticed that you mostly read and write that new state outside of your
> tmf_waitq.lock. Is there any other lock or mutex protecting you ? In
> this specific case ?

No there isn’t.

> 
> I know in the old day queuecommand was called with a host lock, is that
> still the case ?

No, it’s no longer the case.

> Or you just don't care about an occasional spurrious
> SCSI_MLQUEUE_HOST_BUSY ?

This is pretty much true. =)

> 
>> 	cmd = cxlflash_cmd_checkout(afu);
>> 	if (unlikely(!cmd)) {
>> 		pr_err("%s: could not get a free command\n", __func__);
>> @@ -428,6 +440,10 @@ static int cxlflash_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scp)
>> 
>> out:
>> 	return rc;
>> +error:
>> +	scp->result = (DID_NO_CONNECT << 16);
>> +	scp->scsi_done(scp);
>> +	return 0;
>> }
>> 
>> /**
>> @@ -455,9 +471,21 @@ static int cxlflash_eh_device_reset_handler(struct scsi_cmnd *scp)
>> 		 get_unaligned_be32(&((u32 *)scp->cmnd)[2]),
>> 		 get_unaligned_be32(&((u32 *)scp->cmnd)[3]));
>> 
>> -	rcr = send_tmf(afu, scp, TMF_LUN_RESET);
>> -	if (unlikely(rcr))
>> +	switch (cfg->state) {
>> +	case STATE_NORMAL:
>> +		rcr = send_tmf(afu, scp, TMF_LUN_RESET);
>> +		if (unlikely(rcr))
>> +			rc = FAILED;
>> +		break;
>> +	case STATE_LIMBO:
>> +		wait_event(cfg->limbo_waitq, cfg->state != STATE_LIMBO);
>> +		if (cfg->state == STATE_NORMAL)
>> +			break;
>> +		/* fall through */
>> +	default:
>> 		rc = FAILED;
>> +		break;
>> +	}
> 
> Same here, since you are doing wait_event, I assume no lock is held
> (unless it's a mutex) and in subsequent places I am not listing.
> 
> As I said, it could well be that it all works fine but it would be worth
> mentioning in this case because it's not obvious from simply reading the
> code.

It has been working well, and yes, your assumption is correct. We can look at
adding more/better serialization as a future enhancement.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html