Re: Inconsistent PGs after upgrade to Pacific

Hi,

I would say yes, but it would be nice if other people could confirm it too.

Also, can you create a test cluster and go through the same steps:
* create it with Octopus
* create the pool snapshot
* reduce the ranks to 1
* upgrade to Pacific

and then try to fix the PGs, assuming you will hit the same
issues in your test cluster (a rough sketch of those steps is below).
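
For example, something along these lines -- the pool and filesystem names
("cephfs_metadata", "cephfs") are placeholders, and how you bootstrap and
upgrade the test cluster depends on your deployment tooling:

  # on the Octopus test cluster, mirroring what happened in production
  ceph osd pool mksnap cephfs_metadata beforefixes   # pool-level snapshot
  ceph fs set cephfs max_mds 1                       # reduce active ranks to 1
  # upgrade mon/mgr/osd/mds to Pacific and restart the daemons
  ceph versions                                      # confirm everything runs Pacific
  ceph osd pool ls detail                            # check the pool snapshot is still there
  ceph osd pool deep-scrub cephfs_metadata           # see if the same scrub errors appear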

cheers,
Ansgar

Am Do., 23. Juni 2022 um 22:12 Uhr schrieb Pascal Ehlert <pascal@xxxxxxxxxxxx>:
>
> Hi,
>
> I have now tried to run "ceph osd pool rmsnap $POOL beforefixes", but it says the snapshot could not be found, although I definitely ran "ceph osd pool mksnap $POOL beforefixes" about three weeks ago.
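>
> (For what it's worth, the pool snapshots RADOS still knows about can be
> listed with something like the following, assuming $POOL holds the
> metadata pool name:
>
>   rados -p $POOL lssnap     # snap id, name and creation time per pool snapshot
>   ceph osd pool ls detail   # pool details also list pool snaps / removed_snaps
>
> which should show whether "beforefixes" still exists from the cluster's
> point of view.)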
> When running rados list-inconsistent-obj $PG on one of the affected PGs, all of the objects returned have "snap" set to 1:
>
> root@srv01:~# for i in $(rados list-inconsistent-pg $POOL | jq -er .[]); do rados list-inconsistent-obj $i | jq -er .inconsistents[].object; done
> [..]
> {
>   "name": "200020744f4.00000000",
>   "nspace": "",
>   "locator": "",
>   "snap": 1,
>   "version": 5704208
> }
> {
>   "name": "200021aeb16.00000000",
>   "nspace": "",
>   "locator": "",
>   "snap": 1,
>   "version": 6189078
> }
> [..]
>
> Running listsnaps on any of them then looks like this:
>
> root@srv01:~# rados listsnaps 200020744f4.00000000 -p $POOL
> 200020744f4.00000000:
> cloneid    snaps    size    overlap
> 1    1    0    []
> head    -    0
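>
> (To double-check that every inconsistent object really is a snap clone
> rather than a head object, a quick tally over the "snap" field should
> work, with $POOL set as above:
>
>   for i in $(rados list-inconsistent-pg $POOL | jq -er .[]); do
>     rados list-inconsistent-obj $i | jq -er '.inconsistents[].object.snap'
>   done | sort | uniq -c
>
> anything other than snap id 1 or "head" would stand out immediately.)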
>
>
> Is it safe to assume that these objects belong to a somewhat broken snapshot and can be removed without causing further damage?
>
>
> Thanks,
>
> Pascal
>
>
>
> Ansgar Jazdzewski wrote on 23.06.22 20:36:
>
> Hi,
>
> we could identify the RBD images that were affected and exported them beforehand, but in the case of the CephFS metadata pool I have no plan that will work.
>
> can you try to delete the snapshot?
> Also, if the filesystem can be shut down, try to take a backup of the metadata pool first (see the sketch below).
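>
> (A minimal way to do that backup, assuming $POOL is the metadata pool and
> the MDS daemons are stopped so the pool is quiescent:
>
>   rados export -p $POOL /root/cephfs-metadata-backup.bin
>
> the resulting file can later be restored with "rados import" if it ever
> comes to that.)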
>
> hope you will have some luck, let me know if I can help,
> Ansgar
>
> Pascal Ehlert <pascal@xxxxxxxxxxxx> schrieb am Do., 23. Juni 2022, 16:45:
>>
>> Hi Ansgar,
>>
>> Thank you very much for the response.
>> Running your first command to obtain the inconsistent objects, I retrieve a
>> total of 23114, only some of which are snaps.
>>
>> Your mentioning snapshots did remind me, however, that I created a
>> snapshot on the Ceph metadata pool via "ceph osd pool mksnap $POOL"
>> before I reduced the number of ranks.
>> Maybe that caused the inconsistencies, which would explain why the
>> actual file system appears unaffected?
>>
>> Is there any way to validate that theory? I am a bit hesitant to just
>> run "rmsnap". Could that cause inconsistent data to be written back to
>> the actual objects?
>>
>>
>> Best regards,
>>
>> Pascal
>>
>>
>>
>> Ansgar Jazdzewski wrote on 23.06.22 16:11:
>> > Hi Pascal,
>> >
>> > We just had a similar situation on our RBD pool and found some bad data
>> > in RADOS. Here is how we did it:
>> >
>> > for i in $(rados list-inconsistent-pg $POOL | jq -er .[]); do rados
>> > list-inconsistent-obj $i | jq -er .inconsistents[].object.name | awk
>> > -F'.' '{print $2}'; done
>> >
>> > we then found inconsistent snaps on the objects:
>> >
>> > rados list-inconsistent-snapset $PG --format=json-pretty | jq
>> > .inconsistents[].name
>> >
>> > List the data on the OSDs (use 'ceph pg map $PG' to see which OSDs hold the PG):
>> >
>> > ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-${OSD}/ --op
>> > list ${OBJ} --pgid ${PG}
>> >
>> > and finally remove the object, like:
>> >
>> > ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-459/ --op
>> > list rbd_data.762a94d768c04d.000000000036b7ac --pgid 2.704
>> >
>> > ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-459/
>> > '["2.704",{"oid":"rbd_data.801e1d1d9c719d.0000000000044943","key":"","snapid":125458,"hash":4136961796,"max":0,"pool":2,"namespace":"","max":0}]'
>> > remove
>> >
>> > we had to do this for each OSD, one after the other; after that a 'pg repair' worked
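>> >
>> > As a rough per-OSD sketch (note that ceph-objectstore-tool needs the OSD
>> > to be stopped while it works on its data path; $OSD, $PG and $OBJ_JSON
>> > stand for the values found with the commands above, and the systemctl
>> > unit names assume a package-based, non-containerized deployment):
>> >
>> > systemctl stop ceph-osd@${OSD}
>> > ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-${OSD}/ \
>> >     "${OBJ_JSON}" remove
>> > systemctl start ceph-osd@${OSD}
>> > # once every replica holding the bad clone is cleaned up:
>> > ceph pg repair ${PG}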
>> >
>> > I hope it helps,
>> > Ansgar
>> >
>> > Am Do., 23. Juni 2022 um 15:02 Uhr schrieb Dan van der Ster
>> > <dvanders@xxxxxxxxx>:
>> >> Hi Pascal,
>> >>
>> >> It's not clear to me how the upgrade procedure you described would
>> >> lead to inconsistent PGs.
>> >>
>> >> Even if you didn't record every step, do you have the ceph.log, the
>> >> mds logs, perhaps some osd logs from this time?
>> >> And which versions did you upgrade from / to?
>> >>
>> >> Cheers, Dan
>> >>
>> >> On Wed, Jun 22, 2022 at 7:41 PM Pascal Ehlert <pascal@xxxxxxxxxxxx> wrote:
>> >>> Hi all,
>> >>>
>> >>> I am currently battling inconsistent PGs after a far-reaching mistake
>> >>> during the upgrade from Octopus to Pacific.
>> >>> While otherwise following the guide, I restarted the Ceph MDS daemons
>> >>> (and this started the Pacific daemons) without previously reducing the
>> >>> ranks to 1 (from 2).
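>> >>>
>> >>> (For reference, the pre-upgrade step that was skipped here boils down
>> >>> to something like:
>> >>>
>> >>> ceph fs set <fs_name> max_mds 1
>> >>> ceph fs status <fs_name>    # wait until only rank 0 remains active
>> >>>
>> >>> before restarting the MDS daemons on the new version.)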
>> >>>
>> >>> This resulted in daemons not coming up and reporting inconsistencies.
>> >>> After later reducing the ranks and bringing the MDS back up (I did not
>> >>> record every step as this was an emergency situation), we started seeing
>> >>> health errors on every scrub.
>> >>>
>> >>> Now after three weeks, while our CephFS is still working fine and we
>> >>> haven't noticed any data damage, we realized that every single PG of the
>> >>> cephfs metadata pool is affected.
>> >>> Below you can find some information on the actual status and a detailed
>> >>> inspection of one of the affected pgs. I am happy to provide any other
>> >>> information that could be useful of course.
>> >>>
>> >>> A repair of the affected PGs does not resolve the issue.
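>> >>>
>> >>> (Repairs are triggered per PG with 'ceph pg repair <pgid>'; looping
>> >>> over the inconsistent PGs looks roughly like:
>> >>>
>> >>> for pg in $(rados list-inconsistent-pg $POOL | jq -er .[]); do
>> >>>   ceph pg repair $pg
>> >>> done
>> >>>
>> >>> and some of those PGs end up in the failed_repair state shown below.)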
>> >>> Does anyone else here have an idea what we could try apart from copying
>> >>> all the data to a new CephFS pool?
>> >>>
>> >>>
>> >>>
>> >>> Thank you!
>> >>>
>> >>> Pascal
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> root@srv02:~# ceph status
>> >>>     cluster:
>> >>>       id:     f0d6d4d0-8c17-471a-9f95-ebc80f1fee78
>> >>>       health: HEALTH_ERR
>> >>>               insufficient standby MDS daemons available
>> >>>               69262 scrub errors
>> >>>               Too many repaired reads on 2 OSDs
>> >>>               Possible data damage: 64 pgs inconsistent
>> >>>
>> >>>     services:
>> >>>       mon: 3 daemons, quorum srv02,srv03,srv01 (age 3w)
>> >>>       mgr: srv03(active, since 3w), standbys: srv01, srv02
>> >>>       mds: 2/2 daemons up, 1 hot standby
>> >>>       osd: 44 osds: 44 up (since 3w), 44 in (since 10M)
>> >>>
>> >>>     data:
>> >>>       volumes: 2/2 healthy
>> >>>       pools:   13 pools, 1217 pgs
>> >>>       objects: 75.72M objects, 26 TiB
>> >>>       usage:   80 TiB used, 42 TiB / 122 TiB avail
>> >>>       pgs:     1153 active+clean
>> >>>                55   active+clean+inconsistent
>> >>>                9    active+clean+inconsistent+failed_repair
>> >>>
>> >>>     io:
>> >>>       client:   2.0 MiB/s rd, 21 MiB/s wr, 240 op/s rd, 1.75k op/s wr
>> >>>
>> >>>
>> >>> {
>> >>>     "epoch": 4962617,
>> >>>     "inconsistents": [
>> >>>       {
>> >>>         "object": {
>> >>>           "name": "1000000cc8e.00000000",
>> >>>           "nspace": "",
>> >>>           "locator": "",
>> >>>           "snap": 1,
>> >>>           "version": 4253817
>> >>>         },
>> >>>         "errors": [],
>> >>>         "union_shard_errors": [
>> >>>           "omap_digest_mismatch_info"
>> >>>         ],
>> >>>         "selected_object_info": {
>> >>>           "oid": {
>> >>>             "oid": "1000000cc8e.00000000",
>> >>>             "key": "",
>> >>>             "snapid": 1,
>> >>>             "hash": 1369745244,
>> >>>             "max": 0,
>> >>>             "pool": 7,
>> >>>             "namespace": ""
>> >>>           },
>> >>>           "version": "4962847'6209730",
>> >>>           "prior_version": "3916665'4306116",
>> >>>           "last_reqid": "osd.27.0:757107407",
>> >>>           "user_version": 4253817,
>> >>>           "size": 0,
>> >>>           "mtime": "2022-02-26T12:56:55.612420+0100",
>> >>>           "local_mtime": "2022-02-26T12:56:55.614429+0100",
>> >>>           "lost": 0,
>> >>>           "flags": [
>> >>>             "dirty",
>> >>>             "omap",
>> >>>             "data_digest",
>> >>>             "omap_digest"
>> >>>           ],
>> >>>           "truncate_seq": 0,
>> >>>           "truncate_size": 0,
>> >>>           "data_digest": "0xffffffff",
>> >>>           "omap_digest": "0xe5211a9e",
>> >>>           "expected_object_size": 0,
>> >>>           "expected_write_size": 0,
>> >>>           "alloc_hint_flags": 0,
>> >>>           "manifest": {
>> >>>             "type": 0
>> >>>           },
>> >>>           "watchers": {}
>> >>>         },
>> >>>         "shards": [
>> >>>           {
>> >>>             "osd": 20,
>> >>>             "primary": false,
>> >>>             "errors": [
>> >>>               "omap_digest_mismatch_info"
>> >>>             ],
>> >>>             "size": 0,
>> >>>             "omap_digest": "0xffffffff",
>> >>>             "data_digest": "0xffffffff"
>> >>>           },
>> >>>           {
>> >>>             "osd": 27,
>> >>>             "primary": true,
>> >>>             "errors": [
>> >>>               "omap_digest_mismatch_info"
>> >>>             ],
>> >>>             "size": 0,
>> >>>             "omap_digest": "0xffffffff",
>> >>>             "data_digest": "0xffffffff"
>> >>>           },
>> >>>           {
>> >>>             "osd": 43,
>> >>>             "primary": false,
>> >>>             "errors": [
>> >>>               "omap_digest_mismatch_info"
>> >>>             ],
>> >>>             "size": 0,
>> >>>             "omap_digest": "0xffffffff",
>> >>>             "data_digest": "0xffffffff"
>> >>>           }
>> >>>         ]
>> >>>       },
>> >>>
>> >>>
>> >>>
>> >>>
>>
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


