Hi Frank,
I am encountering exactly the same issue, with
the same disks as yours. Every day, after a batch of deep
scrubbing operations, there are generally between 1 and 3
inconsistent pgs, and on different OSDs each time.
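If you want to compare scrub scheduling, the settings in effect can be
read from an OSD's admin socket on its host; osd.78 below is just one of
mine, taken as an example:

# Scrub-related settings in effect on this OSD (run on the OSD's host):
$ sudo ceph daemon osd.78 config show | grep scrub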
This could point to a problem with these
disks, but:
- it concerns only the pgs of the rbd
pool, not those of the cephfs pools (the same disk model is used for both)
- I encountered this while running
12.2.5, not after upgrading to 12.2.8, but the problem appeared
again after the upgrade to 12.2.10
- on my side, smartctl and dmesg do not
show any media errors (the checks are sketched below), so I'm pretty
sure the physical media is not at fault...
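For reference, the checks I mean are along these lines; the device name
and the megaraid id below are examples, not my exact values:

# SMART data straight from a disk (the JBOD / pass-through case):
$ sudo smartctl -a /dev/sdb

# Behind a PERC/MegaRAID single-drive RAID0 the physical disk is hidden,
# so the pass-through device type is needed:
$ sudo smartctl -a -d megaraid,0 /dev/sdb

# Kernel-side I/O or medium errors:
$ dmesg -T | grep -iE 'i/o error|medium error|blk_update'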
One detail: each disk is
configured as a single-drive RAID0 on a PERC H740P. Is this also the
case for you, or are your disks in JBOD mode?
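On my side I check the controller with perccli; a sketch, assuming the
usual Dell install path (the syntax follows storcli, adapt if yours
differs):

# Controller summary, including the RAID level of each virtual disk:
$ sudo /opt/MegaRAID/perccli/perccli64 /c0 show

# Per-physical-drive state and media error counters, which the OS cannot
# see when each disk is wrapped in a single-drive RAID0:
$ sudo /opt/MegaRAID/perccli/perccli64 /c0 /eall /sall show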
Another question: in your case, is the
OSD involved in the inconsistent pgs always the same one, or is it a
different one every time?
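To answer that kind of question over several days, I simply append the
acting sets to a file after each occurrence (the log path is only an
example):

# Record today's inconsistent pgs together with their acting sets:
$ ( date; sudo ceph health detail | grep 'inconsistent, acting' ) >> ~/inconsistent-history.log

# The inconsistent pgs of a pool can also be listed directly
# (pool name 'rbd' assumed):
$ sudo rados list-inconsistent-pg rbd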
For information, so far a manual
'ceph pg repair' has worked well each time...
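If you want to see which shard actually carries the read_error before
repairing, a jq filter along these lines works on the json output (field
names as I see them on 12.2.10, adjust if yours differ):

$ sudo rados list-inconsistent-obj 9.27 --format=json-pretty \
    | jq -r '.inconsistents[] | .object.name as $o
             | .shards[] | select(.errors | length > 0)
             | "\($o) osd.\(.osd) \(.errors)"'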
Context: Luminous 12.2.10, BlueStore
OSDs with the data block on SATA disks and WAL/DB on NVMe, rbd
pool configured with replica 3/2.
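The 3/2 figure is just what the pool itself reports (pool name 'rbd'
assumed):

# Replica count, and minimum replicas required to serve I/O:
$ sudo ceph osd pool get rbd size
$ sudo ceph osd pool get rbd min_size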
Cheers,
rv

A few outputs:
$ sudo ceph -s
  cluster:
    id:     838506b7-e0c6-4022-9e17-2d1cf9458be6
    health: HEALTH_ERR
            3 scrub errors
            Possible data damage: 3 pgs inconsistent

  services:
    mon: 3 daemons, quorum inf-ceph-mon0,inf-ceph-mon1,inf-ceph-mon2
    mgr: inf-ceph-mon0(active), standbys: inf-ceph-mon1, inf-ceph-mon2
    mds: cephfs_home-2/2/2 up {0=inf-ceph-mon1=up:active,1=inf-ceph-mon0=up:active}, 1 up:standby
    osd: 126 osds: 126 up, 126 in

  data:
    pools:   3 pools, 4224 pgs
    objects: 23.35M objects, 20.9TiB
    usage:   64.9TiB used, 136TiB / 201TiB avail
    pgs:     4221 active+clean
             3    active+clean+inconsistent

  io:
    client: 2.62KiB/s rd, 2.25MiB/s wr, 0op/s rd, 118op/s wr

$ sudo ceph health detail
HEALTH_ERR 3 scrub errors; Possible data damage: 3 pgs inconsistent
OSD_SCRUB_ERRORS 3 scrub errors
PG_DAMAGED Possible data damage: 3 pgs inconsistent
    pg 9.27 is active+clean+inconsistent, acting [78,107,96]
    pg 9.260 is active+clean+inconsistent, acting [84,113,62]
    pg 9.6b9 is active+clean+inconsistent, acting [79,107,80]

$ sudo rados list-inconsistent-obj 9.27 --format=json-pretty | grep error
    "errors": [],
    "union_shard_errors": [
        "read_error"
            "errors": [
                "read_error"
            "errors": [],
            "errors": [],

$ sudo rados list-inconsistent-obj 9.260 --format=json-pretty | grep error
    "errors": [],
    "union_shard_errors": [
        "read_error"
            "errors": [],
            "errors": [],
            "errors": [
                "read_error"

$ sudo rados list-inconsistent-obj 9.6b9 --format=json-pretty | grep error
    "errors": [],
    "union_shard_errors": [
        "read_error"
            "errors": [
                "read_error"
            "errors": [],
            "errors": [],

$ sudo ceph pg repair 9.27
instructing pg 9.27 on osd.78 to repair
$ sudo ceph pg repair 9.260
instructing pg 9.260 on osd.84 to repair
$ sudo ceph pg repair 9.6b9
instructing pg 9.6b9 on osd.79 to repair

$ sudo ceph -s
  cluster:
    id:     838506b7-e0c6-4022-9e17-2d1cf9458be6
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum inf-ceph-mon0,inf-ceph-mon1,inf-ceph-mon2
    mgr: inf-ceph-mon0(active), standbys: inf-ceph-mon1, inf-ceph-mon2
    mds: cephfs_home-2/2/2 up {0=inf-ceph-mon1=up:active,1=inf-ceph-mon0=up:active}, 1 up:standby
    osd: 126 osds: 126 up, 126 in

  data:
    pools:   3 pools, 4224 pgs
    objects: 23.35M objects, 20.9TiB
    usage:   64.9TiB used, 136TiB / 201TiB avail
    pgs:     4224 active+clean

  io:
    client: 195KiB/s rd, 7.19MiB/s wr, 17op/s rd, 127op/s wr

On 19/12/2018 at 04:48, Frank Ritchie wrote: