Re: 1 pg inconsistent and does not recover

Hi Alvaro,

> Can you post the entire Ceph status output?

Pasting it here since it is short:

    cluster:
      id:     d9000ec0-93c2-479f-bd5d-94ae9673e347
      health: HEALTH_ERR
              1 scrub errors
              Possible data damage: 1 pg inconsistent

    services:
      mon: 3 daemons, quorum node-4,node-5,node-6 (age 52m)
      mgr: node-5(active, since 7d), standbys: node-6, node-4
      mds: 1/1 daemons up, 2 standby
      osd: 36 osds: 36 up (since 5d), 36 in (since 6d)

    data:
      volumes: 1/1 healthy
      pools:   3 pools, 832 pgs
      objects: 506.83M objects, 67 TiB
      usage:   207 TiB used, 232 TiB / 439 TiB avail
      pgs:     826 active+clean
               5   active+clean+scrubbing+deep
               1   active+clean+inconsistent

    io:
      client:   18 MiB/s wr, 0 op/s rd, 5 op/s wr
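
For anyone following along: the PG behind the "1 pg inconsistent" warning (2.87 in my case, see the logs below) can be found like this; the pool name is just a placeholder:

    # shows the inconsistent PG and its acting set
    ceph health detail
    # alternatively, per pool
    rados list-inconsistent-pg <pool>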


> sometimes list-inconsistent-obj throws that error if a scrub job is still running.

This would surprise me: I already replaced the broken disk of OSD 2 seven days ago, and "list-inconsistent-obj" has not worked at any point since then.
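
For reference, this is roughly what I have been running (a sketch; the PG ID 2.87 is the one from the OSD log below):

    # the query that has not worked since the disk replacement
    rados list-inconsistent-obj 2.87 --format=json-pretty
    # check whether PG 2.87 is currently being scrubbed
    ceph pg dump pgs_brief | grep '^2\.87 '
    # if it is not, re-trigger a deep scrub so the report gets regenerated
    ceph pg deep-scrub 2.87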

grep -Hn 'ERR' /var/log/ceph/ceph-osd.33.log

    /var/log/ceph/ceph-osd.33.log:8005229:2023-06-16T16:29:57.704+0000 7f9a985e5640 -1 log_channel(cluster) log [ERR] : 2.87 shard 2 soid 2:e18c2025:::1001c78d046.00000000:head : candidate had a read error
    /var/log/ceph/ceph-osd.33.log:8018716:2023-06-16T20:03:26.923+0000 7f9a985e5640 -1 log_channel(cluster) log [ERR] : 2.87 deep-scrub 0 missing, 1 inconsistent objects
    /var/log/ceph/ceph-osd.33.log:8018717:2023-06-16T20:03:26.923+0000 7f9a985e5640 -1 log_channel(cluster) log [ERR] : 2.87 deep-scrub 1 errors

The timestamp "2023-06-16T16:29:57" above is when the disk behind OSD 2 broke; OSD 2's log from around that time shows:

    /var/log/ceph/ceph-osd.2.log:7855741:2023-06-16T16:29:57.690+0000 7fbae3cf7640 -1 bdev(0x7fbaeef6c400 /var/lib/ceph/osd/ceph-2/block) _aio_thread got r=-5 ((5) Input/output error)
    /var/log/ceph/ceph-osd.2.log:7855743:2023-06-16T16:29:57.690+0000 7fba62863640 -1 log_channel(cluster) log [ERR] : 2.b1 missing primary copy of 2:8df449f9:::10016e7a962.00000000:head, will try copies on 19,32
    /var/log/ceph/ceph-osd.2.log:7855747:2023-06-16T16:29:57.691+0000 7fba63064640 -1 log_channel(cluster) log [ERR] : 2.a6 missing primary copy of 2:65bd8cda:::10016ea4e67.00000000:head, will try copies on 17,28
    -- note time jump by 3 days --
    /var/log/ceph/ceph-osd.2.log:8096330:2023-06-19T06:42:48.712+0000 7fba62863640 -1 log_channel(cluster) log [ERR] : 2.b1 missing primary copy of 2:8d51be04:::1001d7b8447.00000334:head, will try copies on 19,32
    /var/log/ceph/ceph-osd.2.log:8108684: -1867> 2023-06-19T06:42:48.712+0000 7fba62863640 -1 log_channel(cluster) log [ERR] : 2.b1 missing primary copy of 2:8d51be04:::1001d7b8447.00000334:head, will try copies on 19,32
    /var/log/ceph/ceph-osd.2.log:8108766: -1785> 2023-06-19T06:42:49.035+0000 7fba6d879640 10 log_client  will send 2023-06-19T06:42:48.713712+0000 osd.2 (osd.2) 179 : cluster [ERR] 2.b1 missing primary copy of 2:8d51be04:::1001d7b8447.00000334:head, will try copies on 19,32
    /var/log/ceph/ceph-osd.2.log:8108770: -1781> 2023-06-19T06:42:49.525+0000 7fba7787f640 10 log_client  logged 2023-06-19T06:42:48.713712+0000 osd.2 (osd.2) 179 : cluster [ERR] 2.b1 missing primary copy of 2:8d51be04:::1001d7b8447.00000334:head, will try copies on 19,32
    /var/log/ceph/ceph-osd.2.log:8111339:2023-06-19T06:51:13.940+0000 7fb1518126c0 -1  ** ERROR: osd init failed: (5) Input/output error
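
(For reference, the set of OSDs backing PG 2.87, which I refer to below, can be confirmed like this:)

    # prints the up and acting OSD sets of the PG ([33,2,20] in this case)
    ceph pg map 2.87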

Does "candidate had a read error" on OSD "33" mean that a BlueStore checksum error was detected on OSD "33" at the same time as the OSD "2" disk failed?
If yes, maybe that is the explanation:

* pg 2.87 is backed by OSDs [33,2,20]; OSD 2's hardware broke during the scrub, OSD 33 detected a checksum error during the scrub, and thus we have 2 OSDs left (33 and 20) whose checksums disagree.

I am just guessing this, though.
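
One thing I could check myself is whether osd.33's log also contains BlueStore checksum complaints around that time (a sketch; I am not sure of the exact message wording):

    # BlueStore logs checksum failures separately from the scrub [ERR] lines
    grep -iE 'csum|checksum' /var/log/ceph/ceph-osd.33.log
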
Also, if this is correct, the next question would be: what about OSD 20?
Since no error at all is reported for OSD 20, I assume that its checksum agrees with its data.
Now, can I find out whether OSD 20's checksum also agrees with OSD 33's data, i.e. whether the two remaining copies actually match?
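
If there is no built-in way, I imagine the two remaining copies could be compared offline with ceph-objectstore-tool, roughly like the untested sketch below (the respective OSD has to be stopped while the tool runs; the object name is the one from the scrub error above):

    # on the node hosting osd.20, with osd.20 stopped:
    # find the exact JSON object spec for the object from the scrub error
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-20 \
        --op list --pgid 2.87 | grep 1001c78d046.00000000
    # dump that object's bytes (paste the JSON spec printed above)
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-20 \
        --pgid 2.87 '<json-spec-from-above>' get-bytes > /tmp/obj.osd20
    # repeat on the node hosting osd.33, then compare the two dumps
    sha256sum /tmp/obj.osd20 /tmp/obj.osd33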

(Side note: The disk of OSD 33 looks fine in smartctl.)

Thanks,
Niklas
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


