Possible data damage: 1 pg inconsistent

Frank Ritchie <frankaritchie@xxxxxxxxx> · Tue, 18 Dec 2018 22:48:15 -0500

Hi all,
I have been receiving alerts for:

Possible data damage: 1 pg inconsistent

almost daily for a few weeks now. When I check:

rados list-inconsistent-obj $PG --format=json-pretty

I always see a read_error. When I run a deep scrub on the PG, the log shows:

head candidate had a read error
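To make the check above repeatable, here is a minimal sketch that takes the JSON printed by `rados list-inconsistent-obj $PG --format=json-pretty` and lists which OSDs reported a read_error for each object. It assumes the report has an `inconsistents` list whose entries carry an `object` name and a `shards` list with `osd` and `errors` fields. The function name and the sample data are hypothetical and not taken from my cluster.

```python
import json


def osds_with_read_errors(report_text):
    """Map each inconsistent object name to the OSD ids whose shard lists read_error."""
    report = json.loads(report_text)
    result = {}
    for item in report.get("inconsistents", []):
        name = item.get("object", {}).get("name", "<unknown>")
        osds = [
            shard["osd"]
            for shard in item.get("shards", [])
            if "read_error" in shard.get("errors", [])
        ]
        if osds:
            result[name] = osds
    return result


# Hypothetical, heavily trimmed sample of the report (illustration only).
SAMPLE = """
{
  "epoch": 100,
  "inconsistents": [
    {
      "object": {"name": "rbd_data.example", "snap": "head"},
      "errors": [],
      "union_shard_errors": ["read_error"],
      "shards": [
        {"osd": 17, "errors": ["read_error"]},
        {"osd": 42, "errors": []},
        {"osd": 88, "errors": []}
      ]
    }
  ]
}
"""

print(osds_with_read_errors(SAMPLE))
```

Collecting this per PG over a few weeks would show whether the same OSDs keep turning up or whether the errors move around the cluster.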

When I check dmesg on the OSD node, I see:

blk_update_request: critical medium error, dev sdX, sector 123

I also see a few uncorrected read errors in the smartctl output.
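For anyone wanting to compare notes, a rough sketch of how the dmesg lines could be tallied per device, to see whether the errors cluster on a few drives or are spread across many. It assumes the kernel message format quoted above. The function name and the sample lines are hypothetical, and it only counts distinct sectors; it does not diagnose anything.

```python
import re
from collections import defaultdict

# Matches the kernel message quoted above; device name and sector are captured.
PATTERN = re.compile(
    r"blk_update_request: critical medium error, dev (\w+), sector (\d+)"
)


def medium_errors_by_device(lines):
    """Return {device: sorted list of distinct failing sectors} from dmesg lines."""
    sectors = defaultdict(set)
    for line in lines:
        match = PATTERN.search(line)
        if match:
            sectors[match.group(1)].add(int(match.group(2)))
    return {dev: sorted(found) for dev, found in sectors.items()}


# Hypothetical sample lines (illustration only).
SAMPLE_LINES = [
    "[100.0] blk_update_request: critical medium error, dev sdb, sector 123",
    "[200.0] blk_update_request: critical medium error, dev sdb, sector 123",
    "[300.0] blk_update_request: critical medium error, dev sdc, sector 456",
    "[400.0] EXT4-fs (sda1): mounted filesystem",
]

print(medium_errors_by_device(SAMPLE_LINES))
```

The input lines could come from saved dmesg output on each OSD node. A device that keeps reporting new sectors would look different from one that reports the same sector repeatedly.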

Info:
Ceph: ceph version 12.2.4-30.el7cp
OSD: Toshiba 1.8TB SAS 10K
120 OSDs total

Has anyone else seen these alerts occur almost daily? Could the errors be caused by deep scrubbing too aggressively?

I realize these errors indicate potentially failing drives, but I can't replace a drive every day.

Thanks,
Frank

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com