Re: [PATCHv2]sd: Don't treat succeeded SYNC as error

Jinpu Wang <jinpu.wang@xxxxxxxxxxxxxxxx> · Wed, 4 May 2016 19:02:53 +0200

On Mon, May 2, 2016 at 3:44 PM, James Bottomley <jejb@xxxxxxxxxxxxxxxxxx> wrote:
> On Mon, 2016-05-02 at 12:05 +0200, Hannes Reinecke wrote:
>> On 04/29/2016 02:49 PM, Jinpu Wang wrote:
>> > Hi, all
>> >
>> > We hit IO error on fsync, it turns out was because sd treat
>> > succeeded
>> > SYNC as error. From what I checked in SBC spec there is no
>> > indication
>> > we should fail IO in this case, so we create this patch.
>> >
>> >
>> > Best Regards,
>> >
>> > Jack Wang
>> >
>> > v2:
>> > No change on patch itself, only resend in body as suggested by
>> > Bart,
>> > still keep the attachment in case mail client break the format.
>> >
>> > From 5d1f72d9643ce61cd9f3d312377378c43f171d0c Mon Sep 17 00:00:00
>> > 2001
>> > From: Jack Wang <jinpu.wang@xxxxxxxxxxxxxxxx>
>> > Date: Mon, 25 Apr 2016 12:05:22 +0200
>> > Subject: [PATCH] sd: Don't treat succeeded SYNC as error
>> >
>> > We hit IO error in our production on multipath devices during
>> > resize
>> > device on target side, the problem turns out sd driver passes up as
>> > IO
>> > error when sense data is UNIT_ATTENTION and ASC && ASCQ indicate
>> > Capacity data has changed, even storage side sync the data
>> > properly.
>> >
>> > In order to fix this check in sd_done, report success if condition
>> > matches.
>> >
>> > Sebastian Parschauer report/analyze the bug here:
>> > https://sourceforge.net/p/scst/mailman/message/34953416/
>> >
>> > Signed-off-by: Sebastian Parschauer <s.parschauer@xxxxxx>
>> > Signed-off-by: Jack Wang <jinpu.wang@xxxxxxxxxxxxxxxx>
>> > ---
>> >  drivers/scsi/sd.c | 13 +++++++++++++
>> >  1 file changed, 13 insertions(+)
>> >
>> Well.
>> Is there anything which guarantees us that 'capacity data has
>> changed' will be the only sense code which we'll be seeing as a
>> response to SYNCHRONIZE CACHE?
>> I sincerely doubt so.
>> So why don't you fall back to the default action (ie retry the
>> command) whenever you hit an UNIT ATTENTION?
>> This way we would cove any resulting sense code, _and_ would get rid
>> of the rather ugly special case here.
>
> Actually, why are we getting here at all?  should we be eating this
> unit attention once we've reported it in scsi_check_sense()?
>
> I also don't quite understand why the normal retry mechanism in
> scsi_io_completion() (called after drv->done()) isn't handling this.
>  We set retries on a flush command and we give sd_sync_cache three
> goes.  Any one of those should also cause the CC/UA to be ignored.
>
> James
>
>

Sorry for delay, I agree safer to retry this command.
I checked the code path, in scsi_io_completion, we call
__scsi_error_from_host_byte for FLUSH request,
and we set error to EIO by default, somehow the code report error
directly to user space without retry.
[  647.206270] sd 1:0:0:0: [sdb] tag#0 Send: scmd 0xffff8800b6558480
[  647.206422] sd 1:0:0:0: [sdb] tag#0 CDB: Synchronize Cache(10) 35
00 00 00 00 00 00 00 00 00
[  647.209748] sd 1:0:0:0: Capacity data has changed
[  647.209896] sd 1:0:0:0: [sdb] tag#0 Done: SUCCESS Result:
hostbyte=DID_OK driverbyte=DRIVER_OK
[  647.210157] sd 1:0:0:0: [sdb] tag#0 CDB: Synchronize Cache(10) 35
00 00 00 00 00 00 00 00 00
[  647.210419] sd 1:0:0:0: [sdb] tag#0 Sense Key : Unit Attention [current]
[  647.210567] sd 1:0:0:0: [sdb] tag#0 Add. Sense: Capacity data has changed
[  647.210741] sd 1:0:0:0: [sdb] tag#0 scsi host busy 1 failed 0
[  647.210888] sd 1:0:0:0: Notifying upper driver of completion (result 8000002)
[  647.211035] sd 1:0:0:0: [sdb] tag#0 sd_done: completed 0 of 0 bytes
[  647.211182] sd 1:0:0:0: [sdb] tag#0 0 sectors total, 0 bytes done, error -5
[  647.211330] blk_update_request: I/O error, dev sdb, sector 0

Will figure out why retry doesn't work.

Thanks James and Hannes for all your input.

Regards,
Jack
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html