In my experience, inconsistencies caused by IO errors always have a SCSI
Medium Error showing up in the kernel logs (dmesg, journalctl -k,
/v/l/messages, ...). The one exception is a very bad non-enterprise SMR
drive I run at home, not at work.

-- dan

On Fri, Dec 4, 2020 at 11:03 AM Hans van den Bogert <hansbogert@xxxxxxxxx> wrote:
>
> Interesting, your comment implies that it is a replication issue, which
> does not stem from a faulty disk. But couldn't the disk have had a bit
> flip? Or would you argue that this would have shown up as a disk read
> error somewhere (because of ECC on the disk)?
>
> On 12/4/20 10:51 AM, Dan van der Ster wrote:
> > Note that in this case the inconsistencies are not coming from object
> > reads, but from comparing the omap digests of an rgw index shard.
> > This seems to be a result of a replication issue sometime in the past
> > on this cluster.
> >
> > On Fri, Dec 4, 2020 at 10:32 AM Eugen Block <eblock@xxxxxx> wrote:
> >>
> >> Hi,
> >>
> >> this is not necessarily, but most likely, a hint of a (slowly) failing
> >> disk. Check all OSDs for this PG for disk errors in dmesg and smartctl.
> >>
> >> Regards,
> >> Eugen
> >>
> >>
> >> Quoting "Szabo, Istvan (Agoda)" <Istvan.Szabo@xxxxxxxxx>:
> >>
> >>> Hi,
> >>>
> >>> Not sure if it is related to my 15.2.7 update, but today I got this
> >>> issue many times:
> >>>
> >>> 2020-12-04T15:14:23.910799+0700 osd.40 (osd.40) 11 : cluster [DBG]
> >>> 11.2 deep-scrub starts
> >>> 2020-12-04T15:14:23.947255+0700 osd.40 (osd.40) 12 : cluster [ERR]
> >>> 11.2 soid
> >>> 11:434f049b:::.dir.75333f99-93d0-4238-91a4-ba833a0edd24.1744118.372.1:head :
> >>> omap_digest 0x48532c00 != omap_digest 0x8a18f5d7 from shard 40
> >>> 2020-12-04T15:14:23.977138+0700 mgr.hk-cephmon-2s02 (mgr.2120884)
> >>> 4330 : cluster [DBG] pgmap v4338: 209 pgs: 209 active+clean; 2.8 GiB
> >>> data, 21 TiB used, 513 TiB / 534 TiB avail; 32 KiB/s rd, 32 op/s
> >>> 2020-12-04T15:14:24.030888+0700 osd.40 (osd.40) 13 : cluster [ERR]
> >>> 11.2 soid
> >>> 11:4b86603b:::.dir.75333f99-93d0-4238-91a4-ba833a0edd24.1744118.197.3:head :
> >>> omap_digest 0xcb62779b != omap_digest 0xefef7471 from shard 40
> >>> 2020-12-04T15:14:24.229000+0700 osd.40 (osd.40) 14 : cluster [ERR]
> >>> 11.2 deep-scrub 0 missing, 2 inconsistent objects
> >>> 2020-12-04T15:14:24.229003+0700 osd.40 (osd.40) 15 : cluster [ERR]
> >>> 11.2 deep-scrub 2 errors
> >>> 2020-12-04T15:14:25.978189+0700 mgr.hk-cephmon-2s02 (mgr.2120884)
> >>> 4331 : cluster [DBG] pgmap v4339: 209 pgs: 1
> >>> active+clean+scrubbing+deep, 208 active+clean; 2.8 GiB data, 21 TiB
> >>> used, 513 TiB / 534 TiB avail; 55 KiB/s rd, 0 B/s wr, 61 op/s
> >>> 2020-12-04T15:14:27.978588+0700 mgr.hk-cephmon-2s02 (mgr.2120884)
> >>> 4332 : cluster [DBG] pgmap v4340: 209 pgs: 1
> >>> active+clean+scrubbing+deep, 208 active+clean; 2.8 GiB data, 21 TiB
> >>> used, 513 TiB / 534 TiB avail; 43 KiB/s rd, 0 B/s wr, 49 op/s
> >>> 2020-12-04T15:14:30.293180+0700 mon.hk-cephmon-2s01 (mon.0) 4475 :
> >>> cluster [ERR] Health check failed: 2 scrub errors (OSD_SCRUB_ERRORS)
> >>> 2020-12-04T15:14:30.293196+0700 mon.hk-cephmon-2s01
> >>> (mon.0) 4476 :
> >>> cluster [ERR] Health check failed: Possible data damage: 1 pg
> >>> inconsistent (PG_DAMAGED)
> >>>
> >>> I had to repair the PG and it worked fine, but I'm not sure where
> >>> this comes from. This is all I have in the log :/
> >>>
> >>> Thank you.
> >>>
> >>> _______________________________________________
> >>> ceph-users mailing list -- ceph-users@xxxxxxx
> >>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
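
For anyone who wants to pull the omap_digest mismatches quoted in this
thread out of a longer cluster log, here is a minimal sketch. It assumes
the [ERR] line layout shown above and that each log entry has been joined
back onto a single line (the mail client wrapped them). The function name
find_omap_mismatches and the group names "left" and "right" are made up
for this example; they only record which digest appears on which side of
the "!=" and say nothing about which copy is the good one.

```python
import re

# Matches lines of the form quoted in this thread:
#   ... [ERR] <pg> soid <object> : omap_digest 0x.. != omap_digest 0x.. from shard <n>
# "left"/"right" are hypothetical labels for the two digests as printed.
OMAP_MISMATCH = re.compile(
    r"\[ERR\]\s+(?P<pg>\d+\.[0-9a-f]+)\s+soid\s+(?P<soid>\S+)\s*:\s*"
    r"omap_digest\s+(?P<left>0x[0-9a-f]+)\s+!=\s+"
    r"omap_digest\s+(?P<right>0x[0-9a-f]+)\s+from\s+shard\s+(?P<shard>\d+)"
)


def find_omap_mismatches(log_text):
    """Return one dict per omap_digest mismatch line found in log_text."""
    hits = []
    for line in log_text.splitlines():
        match = OMAP_MISMATCH.search(line)
        if match:
            hits.append(match.groupdict())
    return hits


# Two entries taken from the log in this thread, joined onto one line each.
sample = (
    "2020-12-04T15:14:23.947255+0700 osd.40 (osd.40) 12 : cluster [ERR] "
    "11.2 soid 11:434f049b:::.dir.75333f99-93d0-4238-91a4-ba833a0edd24"
    ".1744118.372.1:head : "
    "omap_digest 0x48532c00 != omap_digest 0x8a18f5d7 from shard 40\n"
    "2020-12-04T15:14:24.229000+0700 osd.40 (osd.40) 14 : cluster [ERR] "
    "11.2 deep-scrub 0 missing, 2 inconsistent objects\n"
)

for hit in find_omap_mismatches(sample):
    print(hit["pg"], "shard", hit["shard"], hit["left"], "!=", hit["right"])
```

This only summarises what the log already says. To see what the OSDs
themselves report for a PG, `rados list-inconsistent-obj <pgid>` is the
more direct source, and checking dmesg and smartctl on the affected OSD
hosts, as suggested above, still applies.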