Inconsistent PGs after upgrade to Pacific

Hi all,

I am currently battling inconsistent PGs after a far-reaching mistake during the upgrade from Octopus to Pacific. While otherwise following the upgrade guide, I restarted the Ceph MDS daemons (which brought up the Pacific daemons) without first reducing the number of active ranks from 2 to 1.

This resulted in the daemons failing to come up and reporting inconsistencies.
After later reducing the ranks and bringing the MDS back up (I did not record every step, as this was an emergency situation), we started seeing health errors on every scrub. The rank reduction itself follows the standard sequence, sketched below.
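
For reference, a minimal sketch of that rank reduction (the filesystem name is a placeholder):

# Cap the file system at a single active MDS rank:
ceph fs set <fs_name> max_mds 1

# Wait until only rank 0 remains active before restarting daemons:
ceph fs status <fs_name>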

Now, after three weeks, while our CephFS is still working fine and we haven't noticed any data damage, we have realized that every single PG of the CephFS metadata pool is affected. Below you can find the current cluster status and a detailed inspection of one of the affected PGs. Note that in the inspection, all three shards agree on an omap_digest of 0xffffffff, while the selected object info records 0xe5211a9e; that mismatch is what the omap_digest_mismatch_info error refers to. I am happy to provide any other information that could be useful, of course.
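
The inconsistent-object dump further below was gathered roughly along these lines (the pool name and PG id are placeholders; our metadata pool has pool id 7):

# List the PGs of the metadata pool currently flagged inconsistent:
rados list-inconsistent-pg <metadata_pool>

# Dump the scrub findings for one of them:
rados list-inconsistent-obj <pgid> --format=json-pretty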

A repair of the affected PGs does not resolve the issue; the per-PG sequence we have been trying is sketched below.
Does anyone here have an idea what we could try, apart from copying all the data to a new CephFS pool?
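
Roughly, per affected PG (the PG id is a placeholder):

# Attempt a repair of the PG:
ceph pg repair <pgid>

# Re-scrub to verify; in our case the same errors reappear
# (or the repair fails outright, hence the failed_repair PGs above):
ceph pg deep-scrub <pgid>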



Thank you!

Pascal




root@srv02:~# ceph status
  cluster:
    id:     f0d6d4d0-8c17-471a-9f95-ebc80f1fee78
    health: HEALTH_ERR
            insufficient standby MDS daemons available
            69262 scrub errors
            Too many repaired reads on 2 OSDs
            Possible data damage: 64 pgs inconsistent

  services:
    mon: 3 daemons, quorum srv02,srv03,srv01 (age 3w)
    mgr: srv03(active, since 3w), standbys: srv01, srv02
    mds: 2/2 daemons up, 1 hot standby
    osd: 44 osds: 44 up (since 3w), 44 in (since 10M)

  data:
    volumes: 2/2 healthy
    pools:   13 pools, 1217 pgs
    objects: 75.72M objects, 26 TiB
    usage:   80 TiB used, 42 TiB / 122 TiB avail
    pgs:     1153 active+clean
             55   active+clean+inconsistent
             9    active+clean+inconsistent+failed_repair

  io:
    client:   2.0 MiB/s rd, 21 MiB/s wr, 240 op/s rd, 1.75k op/s wr


{
  "epoch": 4962617,
  "inconsistents": [
    {
      "object": {
        "name": "1000000cc8e.00000000",
        "nspace": "",
        "locator": "",
        "snap": 1,
        "version": 4253817
      },
      "errors": [],
      "union_shard_errors": [
        "omap_digest_mismatch_info"
      ],
      "selected_object_info": {
        "oid": {
          "oid": "1000000cc8e.00000000",
          "key": "",
          "snapid": 1,
          "hash": 1369745244,
          "max": 0,
          "pool": 7,
          "namespace": ""
        },
        "version": "4962847'6209730",
        "prior_version": "3916665'4306116",
        "last_reqid": "osd.27.0:757107407",
        "user_version": 4253817,
        "size": 0,
        "mtime": "2022-02-26T12:56:55.612420+0100",
        "local_mtime": "2022-02-26T12:56:55.614429+0100",
        "lost": 0,
        "flags": [
          "dirty",
          "omap",
          "data_digest",
          "omap_digest"
        ],
        "truncate_seq": 0,
        "truncate_size": 0,
        "data_digest": "0xffffffff",
        "omap_digest": "0xe5211a9e",
        "expected_object_size": 0,
        "expected_write_size": 0,
        "alloc_hint_flags": 0,
        "manifest": {
          "type": 0
        },
        "watchers": {}
      },
      "shards": [
        {
          "osd": 20,
          "primary": false,
          "errors": [
            "omap_digest_mismatch_info"
          ],
          "size": 0,
          "omap_digest": "0xffffffff",
          "data_digest": "0xffffffff"
        },
        {
          "osd": 27,
          "primary": true,
          "errors": [
            "omap_digest_mismatch_info"
          ],
          "size": 0,
          "omap_digest": "0xffffffff",
          "data_digest": "0xffffffff"
        },
        {
          "osd": 43,
          "primary": false,
          "errors": [
            "omap_digest_mismatch_info"
          ],
          "size": 0,
          "omap_digest": "0xffffffff",
          "data_digest": "0xffffffff"
        }
      ]
    },
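
To double-check that the omap data itself is still readable, the object's omap can also be dumped directly. A sketch, with the pool name as a placeholder (note this reads the head object, not the snap shown in the listing above):

rados -p <metadata_pool> listomapvals 1000000cc8e.00000000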



