inconsistencies from read errors during scrub

Hi cephalopods,

In our couple of years of operating a large Ceph cluster, every single
inconsistency I can recall was caused by a failed read during
deep-scrub. In other words, deep-scrub reads an object and the read
fails, with dmesg reporting "Sense Key : Medium Error [current]", "Add.
Sense: Unrecovered read error", and "blk_update_request: critical medium
error", but the ceph-osd keeps on running and serving up data.
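
As a rough illustration of spotting these failures, here is a minimal
Python sketch that scans kernel log lines for the medium-error message
quoted above. The ", dev <name>, sector <n>" tail of the pattern, the
function name find_medium_errors, and the sample lines are assumptions
made for the example, not taken from a real log.

```python
import re
from collections import defaultdict

# Pattern for the kernel message quoted above. The ", dev <name>, sector <n>"
# tail is an assumption about the full log line, not taken from the post.
MEDIUM_ERROR = re.compile(
    r"blk_update_request: critical medium error, "
    r"dev (?P<dev>\w+), sector (?P<sector>\d+)"
)


def find_medium_errors(lines):
    """Map each device name to the sorted list of sectors that failed to read."""
    hits = defaultdict(set)
    for line in lines:
        match = MEDIUM_ERROR.search(line)
        if match:
            hits[match.group("dev")].add(int(match.group("sector")))
    return {dev: sorted(sectors) for dev, sectors in hits.items()}


# Made-up sample lines, for illustration only.
sample = [
    "[1234.5] sd 0:0:3:0: [sdc] Sense Key : Medium Error [current]",
    "[1234.5] sd 0:0:3:0: [sdc] Add. Sense: Unrecovered read error",
    "[1234.5] blk_update_request: critical medium error, dev sdc, sector 2048",
    "[1300.1] blk_update_request: critical medium error, dev sdc, sector 4096",
]
print(find_medium_errors(sample))
```

In practice the lines could come from dmesg output or the kernel log
file; any device that shows up here is a candidate for a SMART long test.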

Repairing the PG would be the wrong fix for these inconsistencies: in
every case, a subsequent SMART long test shows that the drive is indeed
failing.

Instead, the correct solution is to stop the OSD, let Ceph backfill,
then deep-scrub the affected PG.
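
The procedure above can be sketched as an ordered list of commands. This
is only a sketch under stated assumptions: it assumes a systemd-managed
OSD, it adds an explicit "ceph osd out" so that backfill starts without
waiting for the cluster to mark the OSD out on its own, and the helper
name drain_plan, the OSD id 12, and the PG id 3.1f are hypothetical.

```python
def drain_plan(osd_id, pgid):
    """Return the shell commands, in order, for retiring a failing OSD."""
    return [
        # Stop the OSD that sits on the failing drive (assumes systemd units).
        f"systemctl stop ceph-osd@{osd_id}",
        # Mark it out so backfill starts right away (assumed extra step).
        f"ceph osd out {osd_id}",
        # Re-run until backfill has finished and the PGs are active+clean.
        "ceph -s",
        # Re-check the PG that was flagged inconsistent.
        f"ceph pg deep-scrub {pgid}",
    ]


# Hypothetical OSD and PG ids, for illustration only.
for command in drain_plan(osd_id=12, pgid="3.1f"):
    print(command)
```

The sketch only prints the commands rather than running them, since the
wait for backfill to complete needs a human (or a proper health check)
in the loop.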


So I'm curious, why doesn't the OSD exit FAILED when a read fails
during deep scrub (or any time a read fails)? Failed writes certainly
cause the OSD to exit -- why not reads?

Best Regards,
Dan van der Ster
CERN IT


