Re: [PATCH 3/3] scsi: handle zone resources errors

Damien Le Moal <Damien.LeMoal@xxxxxxx> · Thu, 10 Sep 2020 22:16:10 +0000

On 2020/09/11 2:54, Christoph Hellwig wrote:
> On Thu, Sep 10, 2020 at 04:39:52PM +0900, Damien Le Moal wrote:
>> +		case DATA_PROTECT:
>> +			sdev_printk(KERN_INFO, cmd->device,
>> +				    "asc/ascq = 0x%02x 0x%02x\n",
>> +				    sshdr.asc, sshdr.ascq);
>> +			action = ACTION_FAIL;
>> +			if ((sshdr.asc == 0x0C && sshdr.ascq == 0x12) ||
>> +			    (sshdr.asc == 0x55 &&
>> +			     (sshdr.ascq == 0x0E || sshdr.ascq == 0x0F))) {
>> +				/* Insufficient zone resources */
>> +				blk_stat = BLK_STS_DEV_RESOURCE;
> 
> BLK_STS_DEV_RESOURCE is a magic error code leading to a retry on the
> particular request_queue once it isn't busy any more.  Please don't
> abuse it for random other conditions.

Yes, but that is for the submission path, isn't it? This change is in the
completion path and action is set to ACTION_FAIL, so the request is terminated
right away without any retry (tested). More importantly, this leads to the block
layer returning -EBUSY, which allows the user to differentiate this
temporary/trivial error from the potentially more serious -EIO.
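
To illustrate what that buys the user, here is a rough userspace sketch (not
kernel code). The enum and helper names are made up for this example, and it
assumes zone resource exhaustion surfaces as EBUSY as this patch proposes:

```c
#include <errno.h>

/* Hypothetical verdicts for a failed zoned write; not an existing API. */
enum zone_io_verdict {
	ZONE_IO_RETRY_LATER,	/* recoverable: free zone resources, then resubmit */
	ZONE_IO_FATAL		/* treat like any other I/O failure */
};

/*
 * Classify a positive errno value reported for a failed write.
 * Assumption: insufficient zone resources is reported as EBUSY,
 * everything else (EIO in particular) is considered serious.
 */
static enum zone_io_verdict classify_zone_io_error(int err)
{
	if (err == EBUSY)
		return ZONE_IO_RETRY_LATER;
	return ZONE_IO_FATAL;
}
```

With everything reported as -EIO, an application would have no way to take the
first branch.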

Keith sent a patch for NVMe ZNS doing something similar, which will result in
the block layer returning -EBUSY for zone resource errors. I would like to unify
SCSI and NVMe behavior for these recoverable zone resource errors.

So should we define a new BLK_STS_BUSYERR status to differentiate from the
default BLK_STS_IOERR, rather than overloading BLK_STS_DEV_RESOURCE (or
BLK_STS_ZONE_RESOURCE)?
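
Something along these lines is what I have in mind, shown as a standalone
sketch rather than a patch. All SKETCH_* names, their numeric values and the
"device busy" string are placeholders for illustration, not the kernel's
actual blk_status_t definitions:

```c
#include <errno.h>

/* Placeholder status codes; values do not match the kernel's blk_status_t. */
enum sketch_status {
	SKETCH_STS_OK = 0,
	SKETCH_STS_IOERR,		/* default, serious failure */
	SKETCH_STS_DEV_RESOURCE,	/* submission path: retry when not busy */
	SKETCH_STS_BUSYERR,		/* proposed: completion-path "busy" failure */
	SKETCH_STS_COUNT
};

/* Status to errno table, one entry per status code. */
static const struct {
	int err;
	const char *name;
} sketch_errors[SKETCH_STS_COUNT] = {
	[SKETCH_STS_OK]			= { 0,		"" },
	[SKETCH_STS_IOERR]		= { -EIO,	"I/O" },
	[SKETCH_STS_DEV_RESOURCE]	= { -EBUSY,	"device resource" },
	[SKETCH_STS_BUSYERR]		= { -EBUSY,	"device busy" },
};

/* Convert a status to a negative errno; unknown values fall back to -EIO. */
static int sketch_status_to_errno(enum sketch_status sts)
{
	if ((unsigned int)sts >= SKETCH_STS_COUNT)
		return -EIO;
	return sketch_errors[sts].err;
}
```

The user still sees -EBUSY, but the completion path no longer borrows a status
that has retry semantics on the submission side.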

-- 
Damien Le Moal
Western Digital Research