Hi, It's trivial to reproduce. Running 16.2.9 with max_mds=2, take a pool snapshot of the meta pool, then decrease to max_mds=1, then deep-scrub each meta PG. In my test I could list and remove the pool snap, and deep-scrubbing again cleared the inconsistencies. https://tracker.ceph.com/issues/56386 Cheers, Dan On Fri, Jun 24, 2022 at 8:41 AM Ansgar Jazdzewski <a.jazdzewski@xxxxxxxxxxxxxx> wrote: > > Hi, > > I would say yes, but it would be nice if other people could confirm it too. > > Also, can you create a test cluster and do the same tasks: > * create it with octopus > * create snapshot > * reduce rank to 1 > * upgrade to pacific > > and then try to fix the PG, assuming that you will have the same > issues in your test cluster. > > Cheers, > Ansgar > > On Thu, 23 Jun 2022 at 22:12, Pascal Ehlert <pascal@xxxxxxxxxxxx> wrote: > > > > Hi, > > > > I have now tried "ceph osd pool rmsnap $POOL beforefixes" and it says the snapshot could not be found, although I have definitely run "ceph osd pool mksnap $POOL beforefixes" about three weeks ago. > > When running rados list-inconsistent-obj $PG on one of the affected PGs, all of the objects returned have "snap" set to 1: > > > > root@srv01:~# for i in $(rados list-inconsistent-pg $POOL | jq -er .[]); do rados list-inconsistent-obj $i | jq -er .inconsistents[].object; done > > [..] > > { > > "name": "200020744f4.00000000", > > "nspace": "", > > "locator": "", > > "snap": 1, > > "version": 5704208 > > } > > { > > "name": "200021aeb16.00000000", > > "nspace": "", > > "locator": "", > > "snap": 1, > > "version": 6189078 > > } > > [..] > > > > Running listsnaps on any of them then looks like this: > > > > root@srv01:~# rados listsnaps 200020744f4.00000000 -p $POOL > > 200020744f4.00000000: > > cloneid snaps size overlap > > 1 1 0 [] > > head - 0 > > > > > > Is it safe to assume that these objects belong to a somewhat broken snapshot and can be removed without causing further damage? > > > > > > Thanks, > > > > Pascal > > > > > > > > Ansgar Jazdzewski wrote on 23.06.22 20:36: > > > > Hi, > > > > we could identify the RBD images that were affected and did an export beforehand, but in the case of CephFS metadata I have no plan that will work. > > > > Can you try to delete the snapshot? > > Also, if the filesystem can be shut down, try to take a backup of the metadata pool. > > > > Hope you will have some luck; let me know if I can help, > > Ansgar > > > > Pascal Ehlert <pascal@xxxxxxxxxxxx> wrote on Thu, 23 Jun 2022 at 16:45: > >> > >> Hi Ansgar, > >> > >> Thank you very much for the response. > >> Running your first command to obtain inconsistent objects, I retrieve a > >> total of 23114, only some of which are snaps. > >> > >> Your mentioning snapshots did remind me of the fact, however, that I > >> created a snapshot on the Ceph metadata pool via "ceph osd pool mksnap > >> $POOL" before I reduced the number of ranks. > >> Maybe that has caused the inconsistencies and would explain why the > >> actual file system appears unaffected? > >> > >> Is there any way to validate that theory? I am a bit hesitant to just > >> run "rmsnap". Could that cause inconsistent data to be written back to > >> the actual objects? > >> > >> Best regards, > >> > >> Pascal
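For reference, here is a minimal sketch of how that theory could be checked and, following Dan's note at the top of the thread, cleared. It assumes the CephFS metadata pool name is in $POOL and reuses the snapshot name "beforefixes" from this thread; whatever name "rados lssnap" actually reports is authoritative:

# List the pool-level snapshots the cluster still knows about on the metadata pool
rados -p $POOL lssnap

# Spot-check one inconsistent object: a clone with snap id 1 and size 0
# suggests it exists only because of the pool snapshot
rados -p $POOL listsnaps 200020744f4.00000000

# Remove the pool snapshot under the name lssnap reported
ceph osd pool rmsnap $POOL beforefixes

# Re-run a deep scrub on each previously inconsistent PG, then re-check health
for pg in $(rados list-inconsistent-pg $POOL | jq -er .[]); do
    ceph pg deep-scrub $pg
done
ceph health detail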
> >> Ansgar Jazdzewski wrote on 23.06.22 16:11: > >> > Hi Pascal, > >> > > >> > We just had a similar situation on our RBD pool and found some bad data > >> > in RADOS. Here is how we did it: > >> > > >> > for i in $(rados list-inconsistent-pg $POOL | jq -er .[]); do rados > >> > list-inconsistent-obj $i | jq -er .inconsistents[].object.name | awk > >> > -F'.' '{print $2}'; done > >> > > >> > We then found inconsistent snaps on the object: > >> > > >> > rados list-inconsistent-snapset $PG --format=json-pretty | jq > >> > .inconsistents[].name > >> > > >> > List the data on the OSDs (ceph pg map $PG): > >> > > >> > ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-${OSD}/ --op > >> > list ${OBJ} --pgid ${PG} > >> > > >> > and finally remove the object, like: > >> > ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-459/ --op > >> > list rbd_data.762a94d768c04d.000000000036b7ac --pgid 2.704 > >> > ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-459/ > >> > '["2.704",{"oid":"rbd_data.801e1d1d9c719d.0000000000044943","key":"","snapid":125458,"hash":4136961796,"max":0,"pool":2,"namespace":"","max":0}]' > >> > remove > >> > > >> > We had to do it for all OSDs, one after the other; after this a 'pg repair' worked. > >> > > >> > I hope it will help. > >> > Ansgar > >> > > >> > On Thu, 23 Jun 2022 at 15:02, Dan van der Ster > >> > <dvanders@xxxxxxxxx> wrote: > >> >> Hi Pascal, > >> >> > >> >> It's not clear to me how the upgrade procedure you described would > >> >> lead to inconsistent PGs. > >> >> > >> >> Even if you didn't record every step, do you have the ceph.log, the > >> >> MDS logs, perhaps some OSD logs from this time? > >> >> And which versions did you upgrade from / to? > >> >> > >> >> Cheers, Dan > >> >> > >> >> On Wed, Jun 22, 2022 at 7:41 PM Pascal Ehlert <pascal@xxxxxxxxxxxx> wrote: > >> >>> Hi all, > >> >>> > >> >>> I am currently battling inconsistent PGs after a far-reaching mistake > >> >>> during the upgrade from Octopus to Pacific. > >> >>> While otherwise following the guide, I restarted the Ceph MDS daemons > >> >>> (and this started the Pacific daemons) without previously reducing the > >> >>> ranks to 1 (from 2). > >> >>> > >> >>> This resulted in daemons not coming up and reporting inconsistencies. > >> >>> After later reducing the ranks and bringing the MDS back up (I did not > >> >>> record every step as this was an emergency situation), we started seeing > >> >>> health errors on every scrub. > >> >>> > >> >>> Now, after three weeks, while our CephFS is still working fine and we > >> >>> haven't noticed any data damage, we realized that every single PG of the > >> >>> CephFS metadata pool is affected. > >> >>> Below you can find some information on the actual status and a detailed > >> >>> inspection of one of the affected PGs. I am happy to provide any other > >> >>> information that could be useful, of course. > >> >>> > >> >>> A repair of the affected PGs does not resolve the issue. > >> >>> Does anyone else here have an idea what we could try apart from copying > >> >>> all the data to a new CephFS pool? > >> >>> > >> >>> > >> >>> > >> >>> Thank you!
> >> >>> > >> >>> Pascal > >> >>> > >> >>> > >> >>> > >> >>> > >> >>> root@srv02:~# ceph status > >> >>> cluster: > >> >>> id: f0d6d4d0-8c17-471a-9f95-ebc80f1fee78 > >> >>> health: HEALTH_ERR > >> >>> insufficient standby MDS daemons available > >> >>> 69262 scrub errors > >> >>> Too many repaired reads on 2 OSDs > >> >>> Possible data damage: 64 pgs inconsistent > >> >>> > >> >>> services: > >> >>> mon: 3 daemons, quorum srv02,srv03,srv01 (age 3w) > >> >>> mgr: srv03(active, since 3w), standbys: srv01, srv02 > >> >>> mds: 2/2 daemons up, 1 hot standby > >> >>> osd: 44 osds: 44 up (since 3w), 44 in (since 10M) > >> >>> > >> >>> data: > >> >>> volumes: 2/2 healthy > >> >>> pools: 13 pools, 1217 pgs > >> >>> objects: 75.72M objects, 26 TiB > >> >>> usage: 80 TiB used, 42 TiB / 122 TiB avail > >> >>> pgs: 1153 active+clean > >> >>> 55 active+clean+inconsistent > >> >>> 9 active+clean+inconsistent+failed_repair > >> >>> > >> >>> io: > >> >>> client: 2.0 MiB/s rd, 21 MiB/s wr, 240 op/s rd, 1.75k op/s wr > >> >>> > >> >>> > >> >>> { > >> >>> "epoch": 4962617, > >> >>> "inconsistents": [ > >> >>> { > >> >>> "object": { > >> >>> "name": "1000000cc8e.00000000", > >> >>> "nspace": "", > >> >>> "locator": "", > >> >>> "snap": 1, > >> >>> "version": 4253817 > >> >>> }, > >> >>> "errors": [], > >> >>> "union_shard_errors": [ > >> >>> "omap_digest_mismatch_info" > >> >>> ], > >> >>> "selected_object_info": { > >> >>> "oid": { > >> >>> "oid": "1000000cc8e.00000000", > >> >>> "key": "", > >> >>> "snapid": 1, > >> >>> "hash": 1369745244, > >> >>> "max": 0, > >> >>> "pool": 7, > >> >>> "namespace": "" > >> >>> }, > >> >>> "version": "4962847'6209730", > >> >>> "prior_version": "3916665'4306116", > >> >>> "last_reqid": "osd.27.0:757107407", > >> >>> "user_version": 4253817, > >> >>> "size": 0, > >> >>> "mtime": "2022-02-26T12:56:55.612420+0100", > >> >>> "local_mtime": "2022-02-26T12:56:55.614429+0100", > >> >>> "lost": 0, > >> >>> "flags": [ > >> >>> "dirty", > >> >>> "omap", > >> >>> "data_digest", > >> >>> "omap_digest" > >> >>> ], > >> >>> "truncate_seq": 0, > >> >>> "truncate_size": 0, > >> >>> "data_digest": "0xffffffff", > >> >>> "omap_digest": "0xe5211a9e", > >> >>> "expected_object_size": 0, > >> >>> "expected_write_size": 0, > >> >>> "alloc_hint_flags": 0, > >> >>> "manifest": { > >> >>> "type": 0 > >> >>> }, > >> >>> "watchers": {} > >> >>> }, > >> >>> "shards": [ > >> >>> { > >> >>> "osd": 20, > >> >>> "primary": false, > >> >>> "errors": [ > >> >>> "omap_digest_mismatch_info" > >> >>> ], > >> >>> "size": 0, > >> >>> "omap_digest": "0xffffffff", > >> >>> "data_digest": "0xffffffff" > >> >>> }, > >> >>> { > >> >>> "osd": 27, > >> >>> "primary": true, > >> >>> "errors": [ > >> >>> "omap_digest_mismatch_info" > >> >>> ], > >> >>> "size": 0, > >> >>> "omap_digest": "0xffffffff", > >> >>> "data_digest": "0xffffffff" > >> >>> }, > >> >>> { > >> >>> "osd": 43, > >> >>> "primary": false, > >> >>> "errors": [ > >> >>> "omap_digest_mismatch_info" > >> >>> ], > >> >>> "size": 0, > >> >>> "omap_digest": "0xffffffff", > >> >>> "data_digest": "0xffffffff" > >> >>> } > >> >>> ] > >> >>> }, > >> >>> > >> >>> > >> >>> > >> >>> > >> >>> _______________________________________________ > >> >>> ceph-users mailing list -- ceph-users@xxxxxxx > >> >>> To unsubscribe send an email to ceph-users-leave@xxxxxxx > >> >> _______________________________________________ > >> >> ceph-users mailing list -- ceph-users@xxxxxxx > >> >> To unsubscribe send an email to ceph-users-leave@xxxxxxx > >> > > > 
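For reference, a consolidated sketch of the per-OSD cleanup workflow Ansgar describes in his 16:11 mail above, with placeholder variables ($POOL, $PG, $OBJ, $OSD) to fill in. ceph-objectstore-tool works on an offline OSD, so each OSD has to be stopped while it is edited; the systemctl unit name assumes a package-based (non-containerized) deployment:

# Find the inconsistent PGs and the objects/snapsets flagged in them
rados list-inconsistent-pg $POOL
rados list-inconsistent-snapset $PG --format=json-pretty | jq .inconsistents[].name

# Find which OSDs hold the PG
ceph pg map $PG

# On each of those OSDs in turn: stop the OSD, locate the object, remove the bad clone
systemctl stop ceph-osd@$OSD
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-$OSD/ --op list $OBJ --pgid $PG
# the previous command prints a JSON object spec; pass it back verbatim to remove that clone
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-$OSD/ '<JSON object spec from the list output>' remove
systemctl start ceph-osd@$OSD

# Once every replica has been cleaned up
ceph pg repair $PG

In Pascal's case, though, removing the leftover pool snapshot (per Dan's reply at the top of the thread) may make this per-object surgery unnecessary.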
_______________________________________________ ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an email to ceph-users-leave@xxxxxxx