Hi Manuel,
could you please elaborate a bit on the reproduction steps in 16.2.6:
1) Do you just put an object with this name into a replicated pool with the
rados tool, and subsequent deep scrubs then report the error? Or are there
other steps involved? (See the sketch after this list.)
2) Do you have an all-BlueStore setup on that Pacific cluster, or is there
a mixture of BlueStore and FileStore OSDs?
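For clarity, this is roughly the sequence I have in mind for 1); the pool
name and payload file are placeholders, the PG id is taken from your logs
below:

  # put the problematic object into a replicated test pool, then force a deep scrub
  rados -p testpool put c76c7ac2014adb9f0f0837ac1e85fd1e241af225908b6a0c3d3a44d6b866e732_00400000 ./payload.bin
  ceph pg deep-scrub 1.7fff
  # and for 2), the backend of a given OSD can be checked with
  ceph osd metadata <osd-id> | grep osd_objectstore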
Thanks,
Igor
On 2/10/2022 12:06 PM, Manuel Lausch wrote:
Okay, the issue is triggered by a specific object name:
->
c76c7ac2014adb9f0f0837ac1e85fd1e241af225908b6a0c3d3a44d6b866e732_00400000
With this name I could trigger at least the scrub issues
on Ceph Pacific 16.2.6 as well.
I opened a bug ticket for this issue:
https://tracker.ceph.com/issues/54226
On Tue, 8 Feb 2022 14:35:58 +0100
Manuel Lausch <manuel.lausch@xxxxxxxx> wrote:
Okay, I definitely need some help here.
The crashing OSD moved with the PG, so the PG itself seems to have the issue.
I moved (via upmaps) all 4 replicas to FileStore OSDs. After this the
error seemed to be resolved; no OSD crashed after that.
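For reference, a sketch of the kind of command involved in the moves,
assuming pg-upmap-items was used; the OSD ids are placeholders:

  # pin a replica of PG 1.7fff from one OSD to another (repeated for each of the 4 replicas)
  ceph osd pg-upmap-items 1.7fff <from-osd> <to-osd>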
A deep-scrub of the PG didn't throw any errors, so I moved the first
shard back to a BlueStore OSD. This worked flawlessly as well.
A deep scrub after this showed one object missing: the same one that was
evidently the cause of the prior crashes.
A repair seemed to fix the object, but a further deep-scrub brought back
the same error.
Even putting the object again with rados put didn't help. Now I have
two "missing" objects (the head and the snapshot from the overwrite).
Here are the scrub error and the repair from the OSD log:
2022-02-08 14:04:43.751 7f600dfec700 -1 log_channel(cluster) log [ERR] : 1.7fff shard 3 1:ffffffff:::c76c7ac2014adb9f0f0837ac1e85fd1e241af225908b6a0c3d3a44d6b866e732_00400000:head : missing
2022-02-08 14:04:43.751 7f600dfec700 -1 log_channel(cluster) log [ERR] : 1.7fff deep-scrub 1 missing, 0 inconsistent objects
2022-02-08 14:04:43.751 7f600dfec700 -1 log_channel(cluster) log [ERR] : 1.7fff deep-scrub 1 errors
2022-02-08 13:52:09.111 7f600dfec700 -1 log_channel(cluster) log [ERR] : 1.7fff shard 3 1:ffffffff:::c76c7ac2014adb9f0f0837ac1e85fd1e241af225908b6a0c3d3a44d6b866e732_00400000:head : missing
2022-02-08 13:52:09.111 7f600dfec700 -1 log_channel(cluster) log [ERR] : 1.7fff repair 1 missing, 0 inconsistent objects
2022-02-08 13:52:09.111 7f600dfec700 -1 log_channel(cluster) log [ERR] : 1.7fff repair 1 errors, 1 fixed
And here is the new scrub error with the two missing objects:
2022-02-08 14:19:10.990 7f600dfec700 0 log_channel(cluster) log [DBG] : 1.7fff deep-scrub starts
2022-02-08 14:25:17.749 7f600dfec700 -1 log_channel(cluster) log [ERR] : 1.7fff shard 3 1:ffffffff:::c76c7ac2014adb9f0f0837ac1e85fd1e241af225908b6a0c3d3a44d6b866e732_00400000:974 : missing
2022-02-08 14:25:17.749 7f600dfec700 -1 log_channel(cluster) log [ERR] : 1.7fff shard 3 1:ffffffff:::c76c7ac2014adb9f0f0837ac1e85fd1e241af225908b6a0c3d3a44d6b866e732_00400000:head : missing
2022-02-08 14:25:17.750 7f600dfec700 -1 log_channel(cluster) log [ERR] : 1.7fff deep-scrub 2 missing, 0 inconsistent objects
2022-02-08 14:25:17.750 7f600dfec700 -1 log_channel(cluster) log [ERR] : 1.7fff deep-scrub 2 errors
Can someone help me here? I don't have any clue.
Regards
Manuel
--
Igor Fedotov
Ceph Lead Developer
Looking for help with your Ceph cluster? Contact us at https://croit.io
croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx