Re: Inconsistent PGs after upgrade to Pacific

Hi,

I have now tried "ceph osd pool rmsnap $POOL beforefixes", but it says the snapshot could not be found, although I definitely ran "ceph osd pool mksnap $POOL beforefixes" about three weeks ago. When running rados list-inconsistent-obj $PG on one of the affected PGs, all of the objects returned have "snap" set to 1:

root@srv01:~# for i in $(rados list-inconsistent-pg $POOL | jq -er .[]); do rados list-inconsistent-obj $i | jq -er .inconsistents[].object; done
[..]
{
  "name": "200020744f4.00000000",
  "nspace": "",
  "locator": "",
  "snap": 1,
  "version": 5704208
}
{
  "name": "200021aeb16.00000000",
  "nspace": "",
  "locator": "",
  "snap": 1,
  "version": 6189078
}
[..]
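
For what it's worth, a small variation of the loop above should tally the snap ids across all inconsistent entries (untested in exactly this form, same $POOL as above):

for pg in $(rados list-inconsistent-pg $POOL | jq -er .[]); do
  rados list-inconsistent-obj $pg | jq -r '.inconsistents[].object.snap'
done | sort -n | uniq -c

If that only ever prints snap id 1, it would fit the theory that the pool snapshot is to blame.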

Running listsnaps on any of them then looks like this:

root@srv01:~# rados listsnaps 200020744f4.00000000 -p $POOL
200020744f4.00000000:
cloneid    snaps    size    overlap
1    1    0    []
head    -    0
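
To see whether snap id 1 still corresponds to a registered pool snapshot at all, I would expect something like the following to help (a sketch; I am not sure how a half-removed pool snapshot shows up here):

# list pool-level snapshots known to the cluster (id, name, creation time)
rados lssnap -p $POOL
# and compare with the pool's line in the osd map
ceph osd dump | grep "'$POOL'"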


Is it safe to assume that these objects belong to a somewhat broken snapshot and can be removed without causing further damage?


Thanks,

Pascal



Ansgar Jazdzewski wrote on 23.06.22 20:36:
Hi,

We could identify the RBD images that were affected and did an export beforehand, but in the case of the CephFS metadata pool I have no plan that I know will work.

Can you try to delete the snapshot?
Also, if the filesystem can be shut down, try to take a backup of the metadata pool first.
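
Untested, but a minimal sketch of such a backup (assuming the filesystem is called $FS and /backup has enough space) could be:

# mark the filesystem down so the MDS ranks stop cleanly
ceph fs set $FS down true
# raw, object-level copy of the metadata pool
rados -p $POOL export /backup/${POOL}-before-fix.bin

I have not done this for a CephFS metadata pool myself, so treat it as a starting point only.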

I hope you will have some luck. Let me know if I can help,
Ansgar

Pascal Ehlert <pascal@xxxxxxxxxxxx> wrote on Thu, 23 Jun 2022 at 16:45:

    Hi Ansgar,

    Thank you very much for the response.
    Running your first command to obtain inconsistent objects, I retrieve a
    total of 23114, only some of which are snaps.

    Your mentioning snapshots did remind me, however, that I created a
    snapshot on the CephFS metadata pool via "ceph osd pool mksnap $POOL"
    before I reduced the number of ranks.
    Maybe that has caused the inconsistencies and would explain why the
    actual file system appears unaffected?

    Is there any way to validate that theory? I am a bit hesitant to just
    run "rmsnap". Could that cause inconsistent data to be written
    back to
    the actual objects?


    Best regards,

    Pascal



    Ansgar Jazdzewski wrote on 23.06.22 16:11:
    > Hi Pascal,
    >
    > We just had a similar situation on our RBD and found some bad data in
    > RADOS. Here is how we did it:
    >
    > for i in $(rados list-inconsistent-pg $POOL | jq -er .[]); do \
    >     rados list-inconsistent-obj $i | jq -er .inconsistents[].object.name | \
    >     awk -F'.' '{print $2}'; done
    >
    > We then found inconsistent snaps on the object:
    >
    > rados list-inconsistent-snapset $PG --format=json-pretty | jq .inconsistents[].name
    >
    > List the data on the OSDs (ceph pg map $PG shows which OSDs hold the PG):
    >
    > ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-${OSD}/ \
    >     --op list ${OBJ} --pgid ${PG}
    >
    > and finally remove the object, like:
    >
    > ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-459/ \
    >     --op list rbd_data.762a94d768c04d.000000000036b7ac --pgid 2.704
    >
    > ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-459/ \
    >     '["2.704",{"oid":"rbd_data.801e1d1d9c719d.0000000000044943","key":"","snapid":125458,"hash":4136961796,"max":0,"pool":2,"namespace":"","max":0}]' \
    >     remove
    >
    > We had to do this for every OSD, one after the other (with the OSD
    > stopped while ceph-objectstore-tool runs); after that, a 'pg repair' worked.
    >
    > I hope it will help.
    > Ansgar
    >
    > On Thu, 23 Jun 2022 at 15:02, Dan van der Ster <dvanders@xxxxxxxxx> wrote:
    >> Hi Pascal,
    >>
    >> It's not clear to me how the upgrade procedure you described would
    >> lead to inconsistent PGs.
    >>
    >> Even if you didn't record every step, do you have the ceph.log, the
    >> mds logs, perhaps some osd logs from this time?
    >> And which versions did you upgrade from / to ?
    >>
    >> Cheers, Dan
    >>
    >> On Wed, Jun 22, 2022 at 7:41 PM Pascal Ehlert <pascal@xxxxxxxxxxxx> wrote:
    >>> Hi all,
    >>>
    >>> I am currently battling inconsistent PGs after a far-reaching
    mistake
    >>> during the upgrade from Octopus to Pacific.
    >>> While otherwise following the guide, I restarted the Ceph MDS
    daemons
    >>> (and this started the Pacific daemons) without previously
    reducing the
    >>> ranks to 1 (from 2).
    >>>
    >>> This resulted in daemons not coming up and reporting
    inconsistencies.
    >>> After later reducing the ranks and bringing the MDS back up (I
    did not
    >>> record every step as this was an emergency situation), we
    started seeing
    >>> health errors on every scrub.
    >>>
    >>> Now after three weeks, while our CephFS is still working fine
    and we
    >>> haven't noticed any data damage, we realized that every single
    PG of the
    >>> cephfs metadata pool is affected.
    >>> Below you can find some information on the actual status and a
    detailed
    >>> inspection of one of the affected pgs. I am happy to provide
    any other
    >>> information that could be useful of course.
    >>>
    >>> A repair of the affected PGs does not resolve the issue.
    >>> Does anyone else here have an idea what we could try apart
    from copying
    >>> all the data to a new CephFS pool?
    >>>
    >>>
    >>>
    >>> Thank you!
    >>>
    >>> Pascal
    >>>
    >>>
    >>>
    >>>
    >>> root@srv02:~# ceph status
    >>>     cluster:
    >>>       id:     f0d6d4d0-8c17-471a-9f95-ebc80f1fee78
    >>>       health: HEALTH_ERR
    >>>               insufficient standby MDS daemons available
    >>>               69262 scrub errors
    >>>               Too many repaired reads on 2 OSDs
    >>>               Possible data damage: 64 pgs inconsistent
    >>>
    >>>     services:
    >>>       mon: 3 daemons, quorum srv02,srv03,srv01 (age 3w)
    >>>       mgr: srv03(active, since 3w), standbys: srv01, srv02
    >>>       mds: 2/2 daemons up, 1 hot standby
    >>>       osd: 44 osds: 44 up (since 3w), 44 in (since 10M)
    >>>
    >>>     data:
    >>>       volumes: 2/2 healthy
    >>>       pools:   13 pools, 1217 pgs
    >>>       objects: 75.72M objects, 26 TiB
    >>>       usage:   80 TiB used, 42 TiB / 122 TiB avail
    >>>       pgs:     1153 active+clean
    >>>                55   active+clean+inconsistent
    >>>                9    active+clean+inconsistent+failed_repair
    >>>
    >>>     io:
    >>>       client:   2.0 MiB/s rd, 21 MiB/s wr, 240 op/s rd, 1.75k
    op/s wr
    >>>
    >>>
    >>> {
    >>>     "epoch": 4962617,
    >>>     "inconsistents": [
    >>>       {
    >>>         "object": {
    >>>           "name": "1000000cc8e.00000000",
    >>>           "nspace": "",
    >>>           "locator": "",
    >>>           "snap": 1,
    >>>           "version": 4253817
    >>>         },
    >>>         "errors": [],
    >>>         "union_shard_errors": [
    >>>           "omap_digest_mismatch_info"
    >>>         ],
    >>>         "selected_object_info": {
    >>>           "oid": {
    >>>             "oid": "1000000cc8e.00000000",
    >>>             "key": "",
    >>>             "snapid": 1,
    >>>             "hash": 1369745244,
    >>>             "max": 0,
    >>>             "pool": 7,
    >>>             "namespace": ""
    >>>           },
    >>>           "version": "4962847'6209730",
    >>>           "prior_version": "3916665'4306116",
    >>>           "last_reqid": "osd.27.0:757107407",
    >>>           "user_version": 4253817,
    >>>           "size": 0,
    >>>           "mtime": "2022-02-26T12:56:55.612420+0100",
    >>>           "local_mtime": "2022-02-26T12:56:55.614429+0100",
    >>>           "lost": 0,
    >>>           "flags": [
    >>>             "dirty",
    >>>             "omap",
    >>>             "data_digest",
    >>>             "omap_digest"
    >>>           ],
    >>>           "truncate_seq": 0,
    >>>           "truncate_size": 0,
    >>>           "data_digest": "0xffffffff",
    >>>           "omap_digest": "0xe5211a9e",
    >>>           "expected_object_size": 0,
    >>>           "expected_write_size": 0,
    >>>           "alloc_hint_flags": 0,
    >>>           "manifest": {
    >>>             "type": 0
    >>>           },
    >>>           "watchers": {}
    >>>         },
    >>>         "shards": [
    >>>           {
    >>>             "osd": 20,
    >>>             "primary": false,
    >>>             "errors": [
    >>>               "omap_digest_mismatch_info"
    >>>             ],
    >>>             "size": 0,
    >>>             "omap_digest": "0xffffffff",
    >>>             "data_digest": "0xffffffff"
    >>>           },
    >>>           {
    >>>             "osd": 27,
    >>>             "primary": true,
    >>>             "errors": [
    >>>               "omap_digest_mismatch_info"
    >>>             ],
    >>>             "size": 0,
    >>>             "omap_digest": "0xffffffff",
    >>>             "data_digest": "0xffffffff"
    >>>           },
    >>>           {
    >>>             "osd": 43,
    >>>             "primary": false,
    >>>             "errors": [
    >>>               "omap_digest_mismatch_info"
    >>>             ],
    >>>             "size": 0,
    >>>             "omap_digest": "0xffffffff",
    >>>             "data_digest": "0xffffffff"
    >>>           }
    >>>         ]
    >>>       },
    >>>
    >>>
    >>>
    >>>


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



