Hi,

Our clusters were upgraded to v10.2.9 from ~v10.2.7 (actually a local git
snapshot that was not quite 10.2.7), and since then we're seeing a LOT more
scrub errors than previously.

The digest logging on the scrub errors is, in some cases, also now
maddeningly short: it doesn't contain ANY information on what the mismatch
was. Many of the errors also seem to be 3-way mismatches in the digest :-(.

I'm wondering if other people have seen a similar rise in scrub errors after
the upgrade, and/or the lack of digest output. I did hear one anecdotal
report that 10.2.9 seemed much more likely to fail out marginal disks.

The only two changesets I can spot in Jewel that I think might be related are
these:
1. http://tracker.ceph.com/issues/20089
   https://github.com/ceph/ceph/pull/15416
2. http://tracker.ceph.com/issues/19404
   https://github.com/ceph/ceph/pull/14204

Two example PGs that are inconsistent (chosen because they don't contain any
private information, so I didn't have to redact anything except IPs):

$ sudo ceph health detail |grep -e 5.3d40 -e 5.f1c0
pg 5.3d40 is active+clean+inconsistent, acting [1322,990,655]
pg 5.f1c0 is active+clean+inconsistent, acting [631,1327,91]

$ fgrep 5.3d40 /var/log/ceph/ceph.log
2017-09-07 19:50:16.231523 osd.1322 [REDACTED::8861]:6808/3479303 1736 : cluster [INF] osd.1322 pg 5.3d40 Deep scrub errors, upgrading scrub to deep-scrub
2017-09-07 19:50:16.231862 osd.1322 [REDACTED::8861]:6808/3479303 1737 : cluster [INF] 5.3d40 deep-scrub starts
2017-09-07 19:54:38.631232 osd.1322 [REDACTED::8861]:6808/3479303 1738 : cluster [ERR] 5.3d40 shard 655: soid 5:02bc4def:::.dir.default.64449186.344176:head omap_digest 0x3242b04e != omap_digest 0x337cf025 from auth oi 5:02bc4def:::.dir.default.64449186.344176:head(1177700'1180639 osd.1322.0:537914 dirty|omap|data_digest|omap_digest s 0 uv 1177199 dd ffffffff od 337cf025 alloc_hint [0 0])
2017-09-07 19:54:38.631332 osd.1322 [REDACTED::8861]:6808/3479303 1739 : cluster [ERR] 5.3d40 shard 1322: soid 5:02bc4def:::.dir.default.64449186.344176:head omap_digest 0xc90d06a8 != omap_digest 0x3242b04e from shard 655, omap_digest 0xc90d06a8 != omap_digest 0x337cf025 from auth oi 5:02bc4def:::.dir.default.64449186.344176:head(1177700'1180639 osd.1322.0:537914 dirty|omap|data_digest|omap_digest s 0 uv 1177199 dd ffffffff od 337cf025 alloc_hint [0 0])
2017-09-07 20:03:54.721681 osd.1322 [REDACTED::8861]:6808/3479303 1740 : cluster [ERR] 5.3d40 deep-scrub 0 missing, 1 inconsistent objects
2017-09-07 20:03:54.721687 osd.1322 [REDACTED::8861]:6808/3479303 1741 : cluster [ERR] 5.3d40 deep-scrub 3 errors

$ fgrep 5.f1c0 /var/log/ceph/ceph.log
2017-09-07 11:11:36.773986 osd.631 [REDACTED::8877]:6813/4036028 4234 : cluster [INF] osd.631 pg 5.f1c0 Deep scrub errors, upgrading scrub to deep-scrub
2017-09-07 11:11:36.774127 osd.631 [REDACTED::8877]:6813/4036028 4235 : cluster [INF] 5.f1c0 deep-scrub starts
2017-09-07 11:25:26.231502 osd.631 [REDACTED::8877]:6813/4036028 4236 : cluster [ERR] 5.f1c0 deep-scrub 0 missing, 1 inconsistent objects
2017-09-07 11:25:26.231508 osd.631 [REDACTED::8877]:6813/4036028 4237 : cluster [ERR] 5.f1c0 deep-scrub 1 errors

--
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Asst. Treasurer
E-Mail   : robbat2@xxxxxxxxxx
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com