On Wed, 6 Sep 2017, Vincent Godin wrote:
> Hello,
>
> I'd like to understand the behaviour of an OSD daemon when an I/O
> error occurs while reading and while writing. We had some I/O errors
> while reading during deep-scrub on one OSD, and it led to all client
> requests being held.
> Ceph version: Jewel 10.2.6
> The faulty OSD is a RAID 0 on one SATA disk on an HP SL4540 host.
>
> Is there a normal process for handling an I/O error in Ceph, or is
> this problem linked to my hardware config? The corrupted sector does
> not seem to be handled by the hardware, so the error can re-occur
> many times on the same sector (maybe a problem with the RAID 0 layer
> between Ceph and the disk).
>
> In the dmesg of the host, we can see the error:
>
> sd 0:1:0:22: [sdw] tag#22 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> sd 0:1:0:22: [sdw] tag#22 Sense Key : Medium Error [current]
> sd 0:1:0:22: [sdw] tag#22 Add. Sense: Unrecovered read error
> sd 0:1:0:22: [sdw] tag#22 CDB: Read(16) 88 00 00 00 00 00 2e 15 24 e0 00 00 01 00 00 00
> blk_update_request: critical medium error, dev sdw, sector 773137632
> hpsa 0000:08:00.0: scsi 0:1:0:22: resetting logical Direct-Access HP LOGICAL VOLUME RAID-0 SSDSmartPathCap- En- Exp=1
> hpsa 0000:08:00.0: scsi 0:1:0:22: reset logical completed successfully Direct-Access HP LOGICAL VOLUME RAID-0 SSDSmartPathCap- En- Exp=1
> sd 0:1:0:22: [sdw] tag#10 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> sd 0:1:0:22: [sdw] tag#10 Sense Key : Medium Error [current]
> sd 0:1:0:22: [sdw] tag#10 Add. Sense: Unrecovered read error
> sd 0:1:0:22: [sdw] tag#10 CDB: Read(16) 88 00 00 00 00 01 e1 39 3c 00 00 00 01 00 00 00
> blk_update_request: critical medium error, dev sdw, sector 8073591808
>
> In the OSD log (with the standard logging level), we only see the
> number of slow requests rising (before the system alarm) and a lot of
> osd_op_tp thread timeouts; the OSD is then marked down by the others.
> There is nothing about the failed I/O.

It sounds like if the IO had returned sooner with an error (before we hit
the internal timeout) then we would have tried to do something, but in
this case we didn't survive long enough to get there.  During scrub we
note EIO and mark the PG as needing repair, and during read operations we
try to recover from another replica or EC shards.

sage
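
P.S. As a quick sanity check on the dmesg lines above: the starting LBA
encoded in each Read(16) CDB matches the failing sector reported by
blk_update_request (the logical block size here is 512 bytes, so the CDB
LBA and the kernel's sector count line up one to one). A minimal,
illustrative decoder, plain Python and nothing Ceph-specific, just to show
where those numbers come from:

    # Decode the starting LBA and transfer length from a SCSI Read(16) CDB.
    # The two CDBs below are copied from the dmesg output in this thread.
    def decode_read16(cdb_hex):
        cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
        assert cdb[0] == 0x88, "not a Read(16) opcode"
        lba = int.from_bytes(cdb[2:10], "big")       # bytes 2-9: 64-bit LBA
        nblocks = int.from_bytes(cdb[10:14], "big")  # bytes 10-13: transfer length
        return lba, nblocks

    for cdb in ("88 00 00 00 00 00 2e 15 24 e0 00 00 01 00 00 00",
                "88 00 00 00 00 01 e1 39 3c 00 00 00 01 00 00 00"):
        lba, nblocks = decode_read16(cdb)
        print("LBA %d, %d blocks" % (lba, nblocks))

    # Output:
    #   LBA 773137632, 256 blocks   -> matches "sector 773137632"
    #   LBA 8073591808, 256 blocks  -> matches "sector 8073591808"

Note the two failures are at two different LBAs, not the same sector. On
the Ceph side, once a scrub flags the PG, it typically shows up as
inconsistent in 'ceph health detail', and 'ceph pg repair <pgid>' is the
usual operator follow-up to rewrite the bad copy from a healthy replica.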