Re: Difficulty with fixing an inconsistent PG/object

Hi!

Just try to google data_digest_mismatch_oi.
There are a couple of threads in the old mailing list archives with the same problem.
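
Roughly what those threads end up suggesting (a sketch from memory, not re-verified on Jewel, so keep a copy of the object first): both of your replicas hold identical data and only the digest recorded in the object info looks stale ("failed to pick suitable auth object" is why a plain repair has nothing to choose between), so rewriting the object should make the primary record a fresh data_digest:

```
# Sketch only: <pool> stands for the name of the pool with id 37,
# and the object name is taken from your scrub log.
# Keep a backup copy of the object before writing anything.
rados -p <pool> get isqPpJMKYY4.000000000000001e /root/isqPpJMKYY4.bak

# A full overwrite should make the OSD store a fresh data_digest in the
# object info, which is the value the deep scrub compares against.
rados -p <pool> put isqPpJMKYY4.000000000000001e /root/isqPpJMKYY4.bak

# Re-run the deep scrub on the PG and check that the errors clear.
ceph pg deep-scrub 37.189
ceph health detail
```

If the rewrite alone does not clear it, a ceph pg repair 37.189 afterwards was the other step I remember being suggested.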


k
Sent from my iPhone

> On 29 Jun 2022, at 13:54, Lennart van Gijtenbeek | Routz <lennart.vangijtenbeek@xxxxxxxx> wrote:
> 
> Hello Ceph community,
> 
> 
> I hope you could help me with an issue we are experiencing on our backup cluster.
> 
> The Ceph version we are running here is 10.2.10 (Jewel), and we are using Filestore.
> The PG is part of a replicated pool with size=2.
> 
> 
> Getting the following error:
> ```
> 
> root@cephmon0:~# ceph health detail
> HEALTH_ERR 1 pgs inconsistent; 2 scrub errors
> pg 37.189 is active+clean+inconsistent, acting [144,170]
> 2 scrub errors
> ```
> 
> ```
> root@cephmon0:~# grep 37.189 /var/log/ceph/ceph.log
> 2022-06-29 11:11:27.782920 osd.144 10.129.160.22:6800/2810 7598 : cluster [INF] osd.144 pg 37.189 Deep scrub errors, upgrading scrub to deep-scrub
> 2022-06-29 11:11:27.884628 osd.144 10.129.160.22:6800/2810 7599 : cluster [INF] 37.189 deep-scrub starts
> 2022-06-29 11:13:07.124841 osd.144 10.129.160.22:6800/2810 7600 : cluster [ERR] 37.189 shard 144: soid 37:9193d307:::isqPpJMKYY4.000000000000001e:head data_digest 0x50007bd9 != data_digest 0x885fabcc from auth oi 37:9193d307:::isqPpJMKYY4.000000000000001e:head(7211'173457 osd.71.0:397191 dirty|data_digest|omap_digest s 4194304 uv 39699 dd 885fabcc od ffffffff alloc_hint [0 0])
> 2022-06-29 11:13:07.124849 osd.144 10.129.160.22:6800/2810 7601 : cluster [ERR] 37.189 shard 170: soid 37:9193d307:::isqPpJMKYY4.000000000000001e:head data_digest 0x50007bd9 != data_digest 0x885fabcc from auth oi 37:9193d307:::isqPpJMKYY4.000000000000001e:head(7211'173457 osd.71.0:397191 dirty|data_digest|omap_digest s 4194304 uv 39699 dd 885fabcc od ffffffff alloc_hint [0 0])
> 2022-06-29 11:13:07.124853 osd.144 10.129.160.22:6800/2810 7602 : cluster [ERR] 37.189 soid 37:9193d307:::isqPpJMKYY4.000000000000001e:head: failed to pick suitable auth object
> 2022-06-29 11:20:46.459906 osd.144 10.129.160.22:6800/2810 7603 : cluster [ERR] 37.189 deep-scrub 2 errors
> ```
> 
> The PG has already been transferred from 2 other OSDs. That is, the same error occurred when the PG was stored on two different OSDs. So it seems this is not a disk issue. There seems to be something wrong with the object "isqPpJMKYY4.000000000000001e".
> However, when looking at the md5sum of the object, it is the same on both OSDs.
> 
> 
> ```
> 
> root@ceph12:/var/lib/ceph/osd/ceph-144/current/37.189_head/DIR_9/DIR_8/DIR_9/DIR_C# ls -l isqPpJMKYY4.000000000000001e__head_E0CBC989__25
> 
> -rw-r--r-- 1 ceph ceph 4194304 Jun  3 09:56 isqPpJMKYY4.000000000000001e__head_E0CBC989__25
> 
> root@ceph12:/var/lib/ceph/osd/ceph-144/current/37.189_head/DIR_9/DIR_8/DIR_9/DIR_C# md5sum isqPpJMKYY4.000000000000001e__head_E0CBC989__25
> 96d702072cd441f2d0af60783e8db248  isqPpJMKYY4.000000000000001e__head_E0CBC989__25
> ```
> 
> ```
> root@ceph15:/var/lib/ceph/osd/ceph-170/current/37.189_head/DIR_9/DIR_8/DIR_9/DIR_C# ls -l isqPpJMKYY4.000000000000001e__head_E0CBC989__25
> -rw-r--r-- 1 ceph ceph 4194304 Jun 23 16:41 isqPpJMKYY4.000000000000001e__head_E0CBC989__25
> 
> root@ceph15:/var/lib/ceph/osd/ceph-170/current/37.189_head/DIR_9/DIR_8/DIR_9/DIR_C# md5sum isqPpJMKYY4.000000000000001e__head_E0CBC989__25
> 96d702072cd441f2d0af60783e8db248  isqPpJMKYY4.000000000000001e__head_E0CBC989__25
> ```
> 
> ```
> root@cephmon0:~# rados list-inconsistent-obj 37.189 --format=json-pretty
> {
>    "epoch": 167653,
>    "inconsistents": [
>        {
>            "object": {
>                "name": "isqPpJMKYY4.000000000000001e",
>                "nspace": "",
>                "locator": "",
>                "snap": "head",
>                "version": 39699
>            },
>            "errors": [],
>            "union_shard_errors": [
>                "data_digest_mismatch_oi"
>            ],
>            "selected_object_info": "37:9193d307:::isqPpJMKYY4.000000000000001e:head(7211'173457 osd.71.0:397191 dirty|data_digest|omap_digest s 4194304 uv 39699 dd 885fabcc od ffffffff alloc_hint [0 0])",
>            "shards": [
>                {
>                    "osd": 144,
>                    "errors": [
>                        "data_digest_mismatch_oi"
>                    ],
>                    "size": 4194304,
>                    "omap_digest": "0xffffffff",
>                    "data_digest": "0x50007bd9"
>                },
>                {
>                    "osd": 170,
>                    "errors": [
>                        "data_digest_mismatch_oi"
>                    ],
>                    "size": 4194304,
>                    "omap_digest": "0xffffffff",
>                    "data_digest": "0x50007bd9"
>                }
>            ]
>        }
>    ]
> }
> ```
> 
> I don't understand why there is a "data_digest_mismatch_oi" error, since the checksums seem to match.
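> 
> As far as I can tell from the report above, both shards even compute the same data_digest (0x50007bd9); the only value that differs is the "dd 885fabcc" recorded inside the selected_object_info. To put the two side by side (jq is assumed to be available here; the field names are exactly those in the JSON above):
> 
> ```
> # Show the recorded object info next to the digest each shard computed,
> # instead of reading through the full pretty-printed report.
> rados list-inconsistent-obj 37.189 --format=json | \
>   jq '.inconsistents[] | {oi: .selected_object_info, shards: [.shards[] | {osd, data_digest}]}'
> ```
> 
> So the mismatch appears to be between the object data on disk and the digest stored in the object info, not between the two copies themselves.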
> 
> Does anyone have any idea on how to fix this?
> Your input would be very much appreciated. Please let me know if you need additional info.
> 
> Thank you.
> 
> Best regards,
> Lennart van Gijtenbeek
> 
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



