Hi cephalopods,

In our couple of years of operating a large Ceph cluster, every single inconsistency I can recall was caused by a failed read during deep-scrub. In other words, deep-scrub reads an object, the read fails with dmesg reporting "Sense Key : Medium Error [current]", "Add. Sense: Unrecovered read error", and "blk_update_request: critical medium error", yet the ceph-osd keeps running and serving up data.

The incorrect solution to these inconsistencies would be to repair the PG -- in every case a subsequent SMART long test has shown that the drive is indeed failing. Instead, the correct solution is to stop the OSD, let Ceph backfill, then deep-scrub the affected PG (a rough sketch of the commands is in the P.S. below).

So I'm curious: why doesn't the OSD exit FAILED when a read fails during deep-scrub (or any time a read fails)? Failed writes certainly cause the OSD to exit -- why not reads?

Best Regards,

Dan van der Ster
CERN IT
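
P.S. For reference, this is roughly the procedure described above. The OSD id, PG id, and device name are placeholders, and the unit name assumes systemd-managed OSDs:

  # confirm the drive is actually failing (matches the medium errors in dmesg)
  smartctl -t long /dev/sdX

  # stop the OSD and mark it out so Ceph backfills its PGs from the surviving replicas
  systemctl stop ceph-osd@<osd-id>    # or: service ceph stop osd.<osd-id> on sysvinit hosts
  ceph osd out <osd-id>

  # after backfill completes, re-check the affected PG
  ceph pg deep-scrub <pg-id>
  ceph health detail                  # the PG should no longer be reported inconsistent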