On Mon, May 2, 2016 at 3:44 PM, James Bottomley <jejb@xxxxxxxxxxxxxxxxxx> wrote: > On Mon, 2016-05-02 at 12:05 +0200, Hannes Reinecke wrote: >> On 04/29/2016 02:49 PM, Jinpu Wang wrote: >> > Hi, all >> > >> > We hit IO error on fsync, it turns out was because sd treat >> > succeeded >> > SYNC as error. From what I checked in SBC spec there is no >> > indication >> > we should fail IO in this case, so we create this patch. >> > >> > >> > Best Regards, >> > >> > Jack Wang >> > >> > v2: >> > No change on patch itself, only resend in body as suggested by >> > Bart, >> > still keep the attachment in case mail client break the format. >> > >> > From 5d1f72d9643ce61cd9f3d312377378c43f171d0c Mon Sep 17 00:00:00 >> > 2001 >> > From: Jack Wang <jinpu.wang@xxxxxxxxxxxxxxxx> >> > Date: Mon, 25 Apr 2016 12:05:22 +0200 >> > Subject: [PATCH] sd: Don't treat succeeded SYNC as error >> > >> > We hit IO error in our production on multipath devices during >> > resize >> > device on target side, the problem turns out sd driver passes up as >> > IO >> > error when sense data is UNIT_ATTENTION and ASC && ASCQ indicate >> > Capacity data has changed, even storage side sync the data >> > properly. >> > >> > In order to fix this check in sd_done, report success if condition >> > matches. >> > >> > Sebastian Parschauer report/analyze the bug here: >> > https://sourceforge.net/p/scst/mailman/message/34953416/ >> > >> > Signed-off-by: Sebastian Parschauer <s.parschauer@xxxxxx> >> > Signed-off-by: Jack Wang <jinpu.wang@xxxxxxxxxxxxxxxx> >> > --- >> > drivers/scsi/sd.c | 13 +++++++++++++ >> > 1 file changed, 13 insertions(+) >> > >> Well. >> Is there anything which guarantees us that 'capacity data has >> changed' will be the only sense code which we'll be seeing as a >> response to SYNCHRONIZE CACHE? >> I sincerely doubt so. >> So why don't you fall back to the default action (ie retry the >> command) whenever you hit an UNIT ATTENTION? >> This way we would cove any resulting sense code, _and_ would get rid >> of the rather ugly special case here. > > Actually, why are we getting here at all? should we be eating this > unit attention once we've reported it in scsi_check_sense()? > > I also don't quite understand why the normal retry mechanism in > scsi_io_completion() (called after drv->done()) isn't handling this. > We set retries on a flush command and we give sd_sync_cache three > goes. Any one of those should also cause the CC/UA to be ignored. > > James > > Sorry for delay, I agree safer to retry this command. I checked the code path, in scsi_io_completion, we call __scsi_error_from_host_byte for FLUSH request, and we set error to EIO by default, somehow the code report error directly to user space without retry. [ 647.206270] sd 1:0:0:0: [sdb] tag#0 Send: scmd 0xffff8800b6558480 [ 647.206422] sd 1:0:0:0: [sdb] tag#0 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00 [ 647.209748] sd 1:0:0:0: Capacity data has changed [ 647.209896] sd 1:0:0:0: [sdb] tag#0 Done: SUCCESS Result: hostbyte=DID_OK driverbyte=DRIVER_OK [ 647.210157] sd 1:0:0:0: [sdb] tag#0 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00 [ 647.210419] sd 1:0:0:0: [sdb] tag#0 Sense Key : Unit Attention [current] [ 647.210567] sd 1:0:0:0: [sdb] tag#0 Add. Sense: Capacity data has changed [ 647.210741] sd 1:0:0:0: [sdb] tag#0 scsi host busy 1 failed 0 [ 647.210888] sd 1:0:0:0: Notifying upper driver of completion (result 8000002) [ 647.211035] sd 1:0:0:0: [sdb] tag#0 sd_done: completed 0 of 0 bytes [ 647.211182] sd 1:0:0:0: [sdb] tag#0 0 sectors total, 0 bytes done, error -5 [ 647.211330] blk_update_request: I/O error, dev sdb, sector 0 Will figure out why retry doesn't work. Thanks James and Hannes for all your input. Regards, Jack -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html