Update to this.
The affected pg didn't seem inconsistent:
[root@admin-ceph1-qh2 ~]# ceph health detail
HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent
OSD_SCRUB_ERRORS 1 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent
pg 6.20 is active+clean+inconsistent, acting [114,26,44]
[root@admin-ceph1-qh2 ~]# rados list-inconsistent-obj 6.20 --format=json-pretty
{
"epoch": 210034,
"inconsistents": []
}
However, pg query showed that the primary's info.stats.stat_sum.num_bytes differed from the value on the peers.
A pg repair on 6.20 seems to have resolved the issue for now, but info.stats.stat_sum.num_bytes still differs, so presumably the pg will become inconsistent again the next time it scrubs.
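For anyone wanting to repeat the comparison, here is a minimal sketch of how the num_bytes values could be pulled out of saved `ceph pg 6.20 query` output. It assumes the peers appear under a top-level "peer_info" list, each entry carrying "peer" and the same stats.stat_sum layout as the primary; check that against your own output. The function names and the sample numbers are made up for illustration and are not taken from this cluster.

```python
import json


def num_bytes_by_osd(pg_query):
    """Return {'primary': n, '<peer id>': n, ...} from parsed pg query JSON."""
    sizes = {"primary": pg_query["info"]["stats"]["stat_sum"]["num_bytes"]}
    # Assumption: each peer entry mirrors the primary's stats layout.
    for peer in pg_query.get("peer_info", []):
        sizes[str(peer["peer"])] = peer["stats"]["stat_sum"]["num_bytes"]
    return sizes


def mismatched_peers(pg_query):
    """Return {peer id: peer num_bytes - primary num_bytes} for differing peers."""
    sizes = num_bytes_by_osd(pg_query)
    primary = sizes["primary"]
    return {
        osd: n - primary
        for osd, n in sizes.items()
        if osd != "primary" and n != primary
    }


# Placeholder data shaped like the assumed layout (values are invented).
sample = json.loads("""
{
  "info": {"stats": {"stat_sum": {"num_bytes": 1000}}},
  "peer_info": [
    {"peer": "26", "stats": {"stat_sum": {"num_bytes": 1008}}},
    {"peer": "44", "stats": {"stat_sum": {"num_bytes": 1008}}}
  ]
}
""")

print(mismatched_peers(sample))
```

With real data, the query output would be saved to a file first (for example `ceph pg 6.20 query > pg.json`) and loaded with `json.load` in place of the placeholder above.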
Adrian.
On Tue, Jun 5, 2018 at 12:09 PM, Adrian <aussieade@xxxxxxxxx> wrote:
Hi Cephers,

We recently upgraded one of our clusters from hammer to jewel and then to luminous (12.2.5, 5 mons/mgr, 21 storage nodes * 9 OSDs). After some deep-scrubs we have an inconsistent pg with a log message we've not seen before:

HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent
OSD_SCRUB_ERRORS 1 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent
pg 6.20 is active+clean+inconsistent, acting [114,26,44]

Ceph log shows

2018-06-03 06:53:35.467791 osd.114 osd.114 172.26.28.25:6825/40819 395 : cluster [ERR] 6.20 scrub stat mismatch, got 6526/6526 objects, 87/87 clones, 6526/6526 dirty, 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 0/0 whiteouts, 25952454144/25952462336 bytes, 0/0 hit_set_archive bytes.
2018-06-03 06:53:35.467799 osd.114 osd.114 172.26.28.25:6825/40819 396 : cluster [ERR] 6.20 scrub 1 errors
2018-06-03 06:53:40.701632 mon.mon1-ceph1-qh2 mon.0 172.26.28.8:6789/0 41298 : cluster [ERR] Health check failed: 1 scrub errors (OSD_SCRUB_ERRORS)
2018-06-03 06:53:40.701668 mon.mon1-ceph1-qh2 mon.0 172.26.28.8:6789/0 41299 : cluster [ERR] Health check failed: Possible data damage: 1 pg inconsistent (PG_DAMAGED)
2018-06-03 07:00:00.000137 mon.mon1-ceph1-qh2 mon.0 172.26.28.8:6789/0 41345 : cluster [ERR] overall HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent

There are no EC pools. It looks like it may be the same as https://tracker.ceph.com/issues/22656, although as in #7 this is not a cache pool.

Wondering if it is OK to issue a pg repair on 6.20, or if there's something else we should be looking at first?

Thanks in advance,
Adrian.
---
Adrian : aussieade@xxxxxxxxx
If violence doesn't solve your problem, you're not using enough of it.
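The scrub stat mismatch line in the message above lists each counter as a pair of numbers, and only one pair differs. As a rough sketch, the pairs can be split out to show which counter is off and by how much. The regular expression and the function name are illustrative assumptions, not part of any Ceph tool, and the sketch does not say which side of the pair is the scrubbed value and which is the recorded one.

```python
import re

# Message part of the osd.114 log line quoted in the thread.
LINE = (
    "6.20 scrub stat mismatch, got 6526/6526 objects, 87/87 clones, "
    "6526/6526 dirty, 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, "
    "0/0 whiteouts, 25952454144/25952462336 bytes, 0/0 hit_set_archive bytes."
)

# "<left>/<right> <label>" where the label runs up to the next comma or period.
PAIR = re.compile(r"(\d+)/(\d+) ([a-z_ ]+?)(?=[,.])")


def mismatched_counters(line):
    """Return {label: (left, right, right - left)} for pairs that differ."""
    # Only look at the text after "got " so timestamps and addresses are skipped.
    _, _, counters = line.partition("got ")
    result = {}
    for left, right, label in PAIR.findall(counters):
        if left != right:
            result[label] = (int(left), int(right), int(right) - int(left))
    return result


print(mismatched_counters(LINE))
# {'bytes': (25952454144, 25952462336, 8192)}
```

For this log line, the object, clone, and dirty counts agree and only the byte count differs, by 8192 bytes.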
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com