Re: [ceph-users] inconsistencies from read errors during scrub

On Thu, 21 Apr 2016, Dan van der Ster wrote:
> On Thu, Apr 21, 2016 at 1:23 PM, Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> > Hi cephalopods,
> >
> > In our couple of years of operating a large Ceph cluster, every single
> > inconsistency I can recall was caused by a failed read during
> > deep-scrub. In other words, deep scrub reads an object and the read
> > fails, with dmesg reporting "Sense Key : Medium Error [current]", "Add.
> > Sense: Unrecovered read error", and "blk_update_request: critical
> > medium error", but the ceph-osd keeps running and serving up data.
> 
> I forgot to mention that the OSD notices the read error. In jewel it prints:
> 
> <objectname>:head got -5 on read, read_error
> 
> So why no assert?

I think this should be controlled by a config option, similar to how it is 
handled on normal reads (filestore_fail_eio ... although we probably want a 
more generic option for that, too).
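
Purely to sketch the shape of it (this is not ceph-osd code, and the option 
name "osd_scrub_fail_on_eio" below is made up for the example), something 
like:

#include <cassert>
#include <cerrno>
#include <iostream>

// Result of deep-scrubbing a single object.
struct ScrubResult {
  bool read_error = false;   // object data could not be read
};

// Hypothetical option; in the OSD this would come from the config subsystem.
static const bool osd_scrub_fail_on_eio = false;

// Decide what to do when the deep-scrub read of an object returns an error.
void handle_scrub_read(int read_ret, ScrubResult &res)
{
  if (read_ret == -EIO) {
    if (osd_scrub_fail_on_eio) {
      // Fail fast: crash the OSD so the operator replaces the disk.
      assert(false && "deep-scrub read returned EIO");
    }
    // Current behaviour: record the error and keep serving data
    // ("<objectname>:head got -5 on read, read_error").
    res.read_error = true;
  }
}

int main()
{
  ScrubResult res;
  handle_scrub_read(-EIO, res);
  std::cout << "read_error flagged: " << res.read_error << std::endl;
  return 0;
}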

The danger would be that if we fail the whole OSD due to a single failed 
read, we might fail too many OSDs too quickly, and availability drops.  
Ideally, if we saw an EIO we would do a graceful offload (mark the osd out 
or reweight it to 0, drop primary_affinity, and then fail the osd once the 
data has migrated off).
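
As a rough illustration of that sequence driven from outside the OSD via 
librados mon commands (nothing like this exists in the tree, and the exact 
command JSON may need tweaking per release):

#include <rados/librados.hpp>
#include <iostream>
#include <string>

// Send one JSON mon command and report the result.
static int mon_cmd(librados::Rados &cluster, const std::string &cmd)
{
  librados::bufferlist inbl, outbl;
  std::string outs;
  int r = cluster.mon_command(cmd, inbl, &outbl, &outs);
  std::cout << cmd << " -> " << r << " " << outs << std::endl;
  return r;
}

int main(int argc, char **argv)
{
  if (argc < 2) {
    std::cerr << "usage: " << argv[0] << " <osd-id>" << std::endl;
    return 1;
  }
  std::string id = argv[1];

  librados::Rados cluster;
  cluster.init("admin");            // connect as client.admin
  cluster.conf_read_file(nullptr);  // default ceph.conf search path
  if (cluster.connect() < 0) {
    std::cerr << "could not connect to cluster" << std::endl;
    return 1;
  }

  // 1. Stop directing primary reads at the failing OSD.
  mon_cmd(cluster, "{\"prefix\": \"osd primary-affinity\", \"id\": " + id +
                   ", \"weight\": 0.0}");
  // 2. Mark it out (or reweight to 0) so its PGs migrate off gracefully.
  mon_cmd(cluster, "{\"prefix\": \"osd out\", \"ids\": [\"" + id + "\"]}");

  // 3. Once backfill finishes, the OSD can be stopped and the disk replaced.
  cluster.shutdown();
  return 0;
}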

sage


