Inconsistent PGs after upgrade to Pacific

Hi all,

I am currently battling inconsistent PGs after a far-reaching mistake during the upgrade from Octopus to Pacific. While otherwise following the upgrade guide, I restarted the Ceph MDS daemons (which brought up the Pacific daemons) without first reducing the number of active ranks from 2 to 1.

This resulted in the daemons failing to come up and reporting inconsistencies.
After later reducing the ranks and bringing the MDS back up (I did not record every step, as this was an emergency situation), we started seeing health errors on every scrub. The rank reduction itself follows the standard sequence, sketched below.
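
For reference, a minimal sketch of that rank reduction (the filesystem name is a placeholder):

# Cap the file system at a single active MDS rank:
ceph fs set <fs_name> max_mds 1

# Wait until only rank 0 remains active before restarting daemons:
ceph fs status <fs_name>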

Now, after three weeks, while our CephFS is still working fine and we haven't noticed any data damage, we have realized that every single PG of the CephFS metadata pool is affected. Below you can find the current cluster status and a detailed inspection of one of the affected PGs. Note that in the inspection, all three shards agree on an omap_digest of 0xffffffff, while the selected object info records 0xe5211a9e; that mismatch is what the omap_digest_mismatch_info error refers to. I am happy to provide any other information that could be useful, of course.
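
The inconsistent-object dump further below was gathered roughly along these lines (the pool name and PG id are placeholders; our metadata pool has pool id 7):

# List the PGs of the metadata pool currently flagged inconsistent:
rados list-inconsistent-pg <metadata_pool>

# Dump the scrub findings for one of them:
rados list-inconsistent-obj <pgid> --format=json-pretty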

A repair of the affected PGs does not resolve the issue; the per-PG sequence we have been trying is sketched below.
Does anyone here have an idea what we could try, apart from copying all the data to a new CephFS pool?
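
Roughly, per affected PG (the PG id is a placeholder):

# Attempt a repair of the PG:
ceph pg repair <pgid>

# Re-scrub to verify; in our case the same errors reappear
# (or the repair fails outright, hence the failed_repair PGs above):
ceph pg deep-scrub <pgid>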



Thank you!

Pascal




root@srv02:~# ceph status
  cluster:
    id:     f0d6d4d0-8c17-471a-9f95-ebc80f1fee78
    health: HEALTH_ERR
            insufficient standby MDS daemons available
            69262 scrub errors
            Too many repaired reads on 2 OSDs
            Possible data damage: 64 pgs inconsistent

  services:
    mon: 3 daemons, quorum srv02,srv03,srv01 (age 3w)
    mgr: srv03(active, since 3w), standbys: srv01, srv02
    mds: 2/2 daemons up, 1 hot standby
    osd: 44 osds: 44 up (since 3w), 44 in (since 10M)

  data:
    volumes: 2/2 healthy
    pools:   13 pools, 1217 pgs
    objects: 75.72M objects, 26 TiB
    usage:   80 TiB used, 42 TiB / 122 TiB avail
    pgs:     1153 active+clean
             55   active+clean+inconsistent
             9    active+clean+inconsistent+failed_repair

  io:
    client:   2.0 MiB/s rd, 21 MiB/s wr, 240 op/s rd, 1.75k op/s wr


{
  "epoch": 4962617,
  "inconsistents": [
    {
      "object": {
        "name": "1000000cc8e.00000000",
        "nspace": "",
        "locator": "",
        "snap": 1,
        "version": 4253817
      },
      "errors": [],
      "union_shard_errors": [
        "omap_digest_mismatch_info"
      ],
      "selected_object_info": {
        "oid": {
          "oid": "1000000cc8e.00000000",
          "key": "",
          "snapid": 1,
          "hash": 1369745244,
          "max": 0,
          "pool": 7,
          "namespace": ""
        },
        "version": "4962847'6209730",
        "prior_version": "3916665'4306116",
        "last_reqid": "osd.27.0:757107407",
        "user_version": 4253817,
        "size": 0,
        "mtime": "2022-02-26T12:56:55.612420+0100",
        "local_mtime": "2022-02-26T12:56:55.614429+0100",
        "lost": 0,
        "flags": [
          "dirty",
          "omap",
          "data_digest",
          "omap_digest"
        ],
        "truncate_seq": 0,
        "truncate_size": 0,
        "data_digest": "0xffffffff",
        "omap_digest": "0xe5211a9e",
        "expected_object_size": 0,
        "expected_write_size": 0,
        "alloc_hint_flags": 0,
        "manifest": {
          "type": 0
        },
        "watchers": {}
      },
      "shards": [
        {
          "osd": 20,
          "primary": false,
          "errors": [
            "omap_digest_mismatch_info"
          ],
          "size": 0,
          "omap_digest": "0xffffffff",
          "data_digest": "0xffffffff"
        },
        {
          "osd": 27,
          "primary": true,
          "errors": [
            "omap_digest_mismatch_info"
          ],
          "size": 0,
          "omap_digest": "0xffffffff",
          "data_digest": "0xffffffff"
        },
        {
          "osd": 43,
          "primary": false,
          "errors": [
            "omap_digest_mismatch_info"
          ],
          "size": 0,
          "omap_digest": "0xffffffff",
          "data_digest": "0xffffffff"
        }
      ]
    },
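
To double-check that the omap data itself is still readable, the object's omap can also be dumped directly. A sketch, with the pool name as a placeholder (note this reads the head object, not the snap shown in the listing above):

rados -p <metadata_pool> listomapvals 1000000cc8e.00000000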



