On Thu, 21 Apr 2016, Dan van der Ster wrote:
> On Thu, Apr 21, 2016 at 1:23 PM, Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> > Hi cephalapods,
> >
> > In our couple years of operating a large Ceph cluster, every single
> > inconsistency I can recall was caused by a failed read during
> > deep-scrub. In other words, deep scrub reads an object, the read fails
> > with dmesg reporting "Sense Key : Medium Error [current]", "Add.
> > Sense: Unrecovered read error", "blk_update_request: critical medium
> > error", but the ceph-osd keeps on running and serving up data.
>
> I forgot to mention that the OSD notices the read error. In jewel it prints:
>
>    <objectname>:head got -5 on read, read_error
>
> So why no assert?

I think this should be controlled by a config option, similar to how it
is on read (filestore_fail_eio ... although we probably want a more
generic option for that, too).

The danger would be that if we fail the whole OSD due to a single failed
read, we might fail too many osds too quickly, and availability drops.
Ideally, if we saw an eio we would do a graceful offload (mark osd out or
reweight to 0, drop primary_affinity; and then fail osd when we are done).

sage
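
To make the graceful-offload idea concrete, here is a rough sketch of the
policy in Python. It is only an illustration, not existing Ceph code:
EioPolicy, fail_on_scrub_eio, max_offloads and window_seconds are
hypothetical names (the flag is an assumed scrub-side analogue of
filestore_fail_eio), and the rate limit is one possible way to avoid
failing too many osds too quickly. The commands it returns are the usual
"ceph osd primary-affinity" and "ceph osd out" CLI calls; actually
stopping the daemon once data has drained is left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EioPolicy:
    """Hypothetical policy object; none of these fields are real Ceph options."""
    fail_on_scrub_eio: bool = True    # assumed analogue of filestore_fail_eio
    max_offloads: int = 2             # at most this many OSDs offloaded per window
    window_seconds: float = 3600.0    # length of the rate-limit window
    _recent: List[float] = field(default_factory=list)

    def allow_offload(self, now: float) -> bool:
        """Return True if an OSD that hit EIO may be offloaded at time `now`."""
        # Forget offloads that have aged out of the window.
        self._recent = [t for t in self._recent if now - t < self.window_seconds]
        if not self.fail_on_scrub_eio:
            return False
        if len(self._recent) >= self.max_offloads:
            # Too many OSDs taken out recently; keep serving to protect availability.
            return False
        self._recent.append(now)
        return True


def offload_commands(osd_id: int) -> List[str]:
    """Commands for a graceful offload: stop acting as primary, then mark out."""
    return [
        f"ceph osd primary-affinity osd.{osd_id} 0",
        f"ceph osd out {osd_id}",
    ]
```

With max_offloads=1 and a 10 second window, a first EIO at t=0 would be
allowed to offload its osd, a second at t=5 would be refused, and one at
t=20 would be allowed again.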