Hi, It's trivial to reproduce. Running 16.2.9 with max_mds=2, take a pool snapshot of the meta pool, then decrease to max_mds=1, then deep-scrub each meta PG. In my test I could list and remove the pool snap, and deep-scrubbing again cleared the inconsistencies. https://tracker.ceph.com/issues/56386 Cheers, Dan On Fri, Jun 24, 2022 at 8:41 AM Ansgar Jazdzewski <a.jazdzewski@xxxxxxxxxxxxxx> wrote: > > Hi, > > I would say yes, but it would be nice if other people could confirm it too. > > Also, can you create a test cluster and do the same tasks: > * create it with octopus > * create snapshot > * reduce rank to 1 > * upgrade to pacific > > and then try to fix the PG, assuming that you will have the same > issues in your test cluster. > > Cheers, > Ansgar > > On Thu, 23 Jun 2022 at 22:12, Pascal Ehlert <pascal@xxxxxxxxxxxx> wrote: > > > > Hi, > > > > I have now tried "ceph osd pool rmsnap $POOL beforefixes" and it says the snapshot could not be found, although I have definitely run "ceph osd pool mksnap $POOL beforefixes" about three weeks ago. > > When running rados list-inconsistent-obj $PG on one of the affected PGs, all of the objects returned have "snap" set to 1: > > > > root@srv01:~# for i in $(rados list-inconsistent-pg $POOL | jq -er .[]); do rados list-inconsistent-obj $i | jq -er .inconsistents[].object; done > > [..] > > { > > "name": "200020744f4.00000000", > > "nspace": "", > > "locator": "", > > "snap": 1, > > "version": 5704208 > > } > > { > > "name": "200021aeb16.00000000", > > "nspace": "", > > "locator": "", > > "snap": 1, > > "version": 6189078 > > } > > [..] > > > > Running listsnaps on any of them then looks like this: > > > > root@srv01:~# rados listsnaps 200020744f4.00000000 -p $POOL > > 200020744f4.00000000: > > cloneid snaps size overlap > > 1 1 0 [] > > head - 0 > > > > > > Is it safe to assume that these objects belong to a somewhat broken snapshot and can be removed without causing further damage? > > > > > > Thanks, > > > > Pascal > > > > > > > > Ansgar Jazdzewski wrote on 23.06.22 20:36: > > > > Hi, > > > > we could identify the RBD images that were affected and did an export beforehand, but in the case of CephFS metadata I have no plan that will work. > > > > Can you try to delete the snapshot? > > Also, if the filesystem can be shut down, try to take a backup of the metadata pool. > > > > Hope you will have some luck; let me know if I can help, > > Ansgar > > > > Pascal Ehlert <pascal@xxxxxxxxxxxx> wrote on Thu, 23 Jun 2022 at 16:45: > >> > >> Hi Ansgar, > >> > >> Thank you very much for the response. > >> Running your first command to obtain inconsistent objects, I retrieve a > >> total of 23114, only some of which are snaps. > >> > >> Your mentioning snapshots did remind me of the fact, however, that I > >> created a snapshot on the Ceph metadata pool via "ceph osd pool mksnap > >> $POOL" before I reduced the number of ranks. > >> Maybe that has caused the inconsistencies and would explain why the > >> actual file system appears unaffected? > >> > >> Is there any way to validate that theory? I am a bit hesitant to just > >> run "rmsnap". Could that cause inconsistent data to be written back to > >> the actual objects? > >> > >> Best regards, > >> > >> Pascal
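For reference, here is a minimal sketch of how that theory could be checked and, following Dan's note at the top of the thread, cleared. It assumes the CephFS metadata pool name is in $POOL and reuses the snapshot name "beforefixes" from this thread; whatever name "rados lssnap" actually reports is authoritative:

# List the pool-level snapshots the cluster still knows about on the metadata pool
rados -p $POOL lssnap

# Spot-check one inconsistent object: a clone with snap id 1 and size 0
# suggests it exists only because of the pool snapshot
rados -p $POOL listsnaps 200020744f4.00000000

# Remove the pool snapshot under the name lssnap reported
ceph osd pool rmsnap $POOL beforefixes

# Re-run a deep scrub on each previously inconsistent PG, then re-check health
for pg in $(rados list-inconsistent-pg $POOL | jq -er .[]); do
    ceph pg deep-scrub $pg
done
ceph health detail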
> >> Ansgar Jazdzewski wrote on 23.06.22 16:11: > >> > Hi Pascal, > >> > > >> > We just had a similar situation on our RBD pool and found some bad data > >> > in RADOS. Here is how we did it: > >> > > >> > for i in $(rados list-inconsistent-pg $POOL | jq -er .[]); do rados > >> > list-inconsistent-obj $i | jq -er .inconsistents[].object.name | awk > >> > -F'.' '{print $2}'; done > >> > > >> > We then found inconsistent snaps on the object: > >> > > >> > rados list-inconsistent-snapset $PG --format=json-pretty | jq > >> > .inconsistents[].name > >> > > >> > List the data on the OSDs (ceph pg map $PG): > >> > > >> > ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-${OSD}/ --op > >> > list ${OBJ} --pgid ${PG} > >> > > >> > and finally remove the object, like: > >> > ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-459/ --op > >> > list rbd_data.762a94d768c04d.000000000036b7ac --pgid 2.704 > >> > ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-459/ > >> > '["2.704",{"oid":"rbd_data.801e1d1d9c719d.0000000000044943","key":"","snapid":125458,"hash":4136961796,"max":0,"pool":2,"namespace":"","max":0}]' > >> > remove > >> > > >> > We had to do it for all OSDs, one after the other; after this a 'pg repair' worked. > >> > > >> > I hope it will help. > >> > Ansgar > >> > > >> > On Thu, 23 Jun 2022 at 15:02, Dan van der Ster > >> > <dvanders@xxxxxxxxx> wrote: > >> >> Hi Pascal, > >> >> > >> >> It's not clear to me how the upgrade procedure you described would > >> >> lead to inconsistent PGs. > >> >> > >> >> Even if you didn't record every step, do you have the ceph.log, the > >> >> MDS logs, perhaps some OSD logs from this time? > >> >> And which versions did you upgrade from / to? > >> >> > >> >> Cheers, Dan > >> >> > >> >> On Wed, Jun 22, 2022 at 7:41 PM Pascal Ehlert <pascal@xxxxxxxxxxxx> wrote: > >> >>> Hi all, > >> >>> > >> >>> I am currently battling inconsistent PGs after a far-reaching mistake > >> >>> during the upgrade from Octopus to Pacific. > >> >>> While otherwise following the guide, I restarted the Ceph MDS daemons > >> >>> (and this started the Pacific daemons) without previously reducing the > >> >>> ranks to 1 (from 2). > >> >>> > >> >>> This resulted in daemons not coming up and reporting inconsistencies. > >> >>> After later reducing the ranks and bringing the MDS back up (I did not > >> >>> record every step as this was an emergency situation), we started seeing > >> >>> health errors on every scrub. > >> >>> > >> >>> Now, after three weeks, while our CephFS is still working fine and we > >> >>> haven't noticed any data damage, we realized that every single PG of the > >> >>> CephFS metadata pool is affected. > >> >>> Below you can find some information on the actual status and a detailed > >> >>> inspection of one of the affected PGs. I am happy to provide any other > >> >>> information that could be useful, of course. > >> >>> > >> >>> A repair of the affected PGs does not resolve the issue. > >> >>> Does anyone else here have an idea what we could try apart from copying > >> >>> all the data to a new CephFS pool? > >> >>> > >> >>> > >> >>> > >> >>> Thank you!
> >> >>> > >> >>> Pascal > >> >>> > >> >>> > >> >>> > >> >>> > >> >>> root@srv02:~# ceph status > >> >>> cluster: > >> >>> id: f0d6d4d0-8c17-471a-9f95-ebc80f1fee78 > >> >>> health: HEALTH_ERR > >> >>> insufficient standby MDS daemons available > >> >>> 69262 scrub errors > >> >>> Too many repaired reads on 2 OSDs > >> >>> Possible data damage: 64 pgs inconsistent > >> >>> > >> >>> services: > >> >>> mon: 3 daemons, quorum srv02,srv03,srv01 (age 3w) > >> >>> mgr: srv03(active, since 3w), standbys: srv01, srv02 > >> >>> mds: 2/2 daemons up, 1 hot standby > >> >>> osd: 44 osds: 44 up (since 3w), 44 in (since 10M) > >> >>> > >> >>> data: > >> >>> volumes: 2/2 healthy > >> >>> pools: 13 pools, 1217 pgs > >> >>> objects: 75.72M objects, 26 TiB > >> >>> usage: 80 TiB used, 42 TiB / 122 TiB avail > >> >>> pgs: 1153 active+clean > >> >>> 55 active+clean+inconsistent > >> >>> 9 active+clean+inconsistent+failed_repair > >> >>> > >> >>> io: > >> >>> client: 2.0 MiB/s rd, 21 MiB/s wr, 240 op/s rd, 1.75k op/s wr > >> >>> > >> >>> > >> >>> { > >> >>> "epoch": 4962617, > >> >>> "inconsistents": [ > >> >>> { > >> >>> "object": { > >> >>> "name": "1000000cc8e.00000000", > >> >>> "nspace": "", > >> >>> "locator": "", > >> >>> "snap": 1, > >> >>> "version": 4253817 > >> >>> }, > >> >>> "errors": [], > >> >>> "union_shard_errors": [ > >> >>> "omap_digest_mismatch_info" > >> >>> ], > >> >>> "selected_object_info": { > >> >>> "oid": { > >> >>> "oid": "1000000cc8e.00000000", > >> >>> "key": "", > >> >>> "snapid": 1, > >> >>> "hash": 1369745244, > >> >>> "max": 0, > >> >>> "pool": 7, > >> >>> "namespace": "" > >> >>> }, > >> >>> "version": "4962847'6209730", > >> >>> "prior_version": "3916665'4306116", > >> >>> "last_reqid": "osd.27.0:757107407", > >> >>> "user_version": 4253817, > >> >>> "size": 0, > >> >>> "mtime": "2022-02-26T12:56:55.612420+0100", > >> >>> "local_mtime": "2022-02-26T12:56:55.614429+0100", > >> >>> "lost": 0, > >> >>> "flags": [ > >> >>> "dirty", > >> >>> "omap", > >> >>> "data_digest", > >> >>> "omap_digest" > >> >>> ], > >> >>> "truncate_seq": 0, > >> >>> "truncate_size": 0, > >> >>> "data_digest": "0xffffffff", > >> >>> "omap_digest": "0xe5211a9e", > >> >>> "expected_object_size": 0, > >> >>> "expected_write_size": 0, > >> >>> "alloc_hint_flags": 0, > >> >>> "manifest": { > >> >>> "type": 0 > >> >>> }, > >> >>> "watchers": {} > >> >>> }, > >> >>> "shards": [ > >> >>> { > >> >>> "osd": 20, > >> >>> "primary": false, > >> >>> "errors": [ > >> >>> "omap_digest_mismatch_info" > >> >>> ], > >> >>> "size": 0, > >> >>> "omap_digest": "0xffffffff", > >> >>> "data_digest": "0xffffffff" > >> >>> }, > >> >>> { > >> >>> "osd": 27, > >> >>> "primary": true, > >> >>> "errors": [ > >> >>> "omap_digest_mismatch_info" > >> >>> ], > >> >>> "size": 0, > >> >>> "omap_digest": "0xffffffff", > >> >>> "data_digest": "0xffffffff" > >> >>> }, > >> >>> { > >> >>> "osd": 43, > >> >>> "primary": false, > >> >>> "errors": [ > >> >>> "omap_digest_mismatch_info" > >> >>> ], > >> >>> "size": 0, > >> >>> "omap_digest": "0xffffffff", > >> >>> "data_digest": "0xffffffff" > >> >>> } > >> >>> ] > >> >>> }, > >> >>> > >> >>> > >> >>> > >> >>> > >> >>> _______________________________________________ > >> >>> ceph-users mailing list -- ceph-users@xxxxxxx > >> >>> To unsubscribe send an email to ceph-users-leave@xxxxxxx > >> >> _______________________________________________ > >> >> ceph-users mailing list -- ceph-users@xxxxxxx > >> >> To unsubscribe send an email to ceph-users-leave@xxxxxxx > >> > > > 
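For reference, a consolidated sketch of the per-OSD cleanup workflow Ansgar describes in his 16:11 mail above, with placeholder variables ($POOL, $PG, $OBJ, $OSD) to fill in. ceph-objectstore-tool works on an offline OSD, so each OSD has to be stopped while it is edited; the systemctl unit name assumes a package-based (non-containerized) deployment:

# Find the inconsistent PGs and the objects/snapsets flagged in them
rados list-inconsistent-pg $POOL
rados list-inconsistent-snapset $PG --format=json-pretty | jq .inconsistents[].name

# Find which OSDs hold the PG
ceph pg map $PG

# On each of those OSDs in turn: stop the OSD, locate the object, remove the bad clone
systemctl stop ceph-osd@$OSD
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-$OSD/ --op list $OBJ --pgid $PG
# the previous command prints a JSON object spec; pass it back verbatim to remove that clone
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-$OSD/ '<JSON object spec from the list output>' remove
systemctl start ceph-osd@$OSD

# Once every replica has been cleaned up
ceph pg repair $PG

In Pascal's case, though, removing the leftover pool snapshot (per Dan's reply at the top of the thread) may make this per-object surgery unnecessary.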
_______________________________________________ ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an email to ceph-users-leave@xxxxxxx