Okay, I definitely need some help here. The crashing OSD moved with the PG, so the PG itself seems to be the one with the issue. I moved (via upmaps) all 4 replicas to filestore OSDs. After this the error seemed to be solved: no OSD crashed anymore, and a deep-scrub of the PG didn't throw any error. So I moved the first shard back to a bluestore OSD. The move itself worked flawlessly as well, but a deep-scrub afterwards showed one object missing, the same object which was apparently the cause of the prior crashes. A repair seemed to fix the object, but a further deep-scrub brings back the same error. Even putting the object again with "rados put" didn't help. Now I have two "missing" objects (the head and the snapshot created by the overwrite).

Here are the scrub error and the repair from the OSD log:

2022-02-08 14:04:43.751 7f600dfec700 -1 log_channel(cluster) log [ERR] : 1.7fff shard 3 1:ffffffff:::c76c7ac2014adb9f0f0837ac1e85fd1e241af225908b6a0c3d3a44d6b866e732_00400000:head : missing
2022-02-08 14:04:43.751 7f600dfec700 -1 log_channel(cluster) log [ERR] : 1.7fff deep-scrub 1 missing, 0 inconsistent objects
2022-02-08 14:04:43.751 7f600dfec700 -1 log_channel(cluster) log [ERR] : 1.7fff deep-scrub 1 errors

2022-02-08 13:52:09.111 7f600dfec700 -1 log_channel(cluster) log [ERR] : 1.7fff shard 3 1:ffffffff:::c76c7ac2014adb9f0f0837ac1e85fd1e241af225908b6a0c3d3a44d6b866e732_00400000:head : missing
2022-02-08 13:52:09.111 7f600dfec700 -1 log_channel(cluster) log [ERR] : 1.7fff repair 1 missing, 0 inconsistent objects
2022-02-08 13:52:09.111 7f600dfec700 -1 log_channel(cluster) log [ERR] : 1.7fff repair 1 errors, 1 fixed

And here the new scrub error with the two missing objects:

2022-02-08 14:19:10.990 7f600dfec700  0 log_channel(cluster) log [DBG] : 1.7fff deep-scrub starts
2022-02-08 14:25:17.749 7f600dfec700 -1 log_channel(cluster) log [ERR] : 1.7fff shard 3 1:ffffffff:::c76c7ac2014adb9f0f0837ac1e85fd1e241af225908b6a0c3d3a44d6b866e732_00400000:974 : missing
2022-02-08 14:25:17.749 7f600dfec700 -1 log_channel(cluster) log [ERR] : 1.7fff shard 3 1:ffffffff:::c76c7ac2014adb9f0f0837ac1e85fd1e241af225908b6a0c3d3a44d6b866e732_00400000:head : missing
2022-02-08 14:25:17.750 7f600dfec700 -1 log_channel(cluster) log [ERR] : 1.7fff deep-scrub 2 missing, 0 inconsistent objects
2022-02-08 14:25:17.750 7f600dfec700 -1 log_channel(cluster) log [ERR] : 1.7fff deep-scrub 2 errors
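For reference, these are roughly the commands I used for the moves and checks described above. The OSD IDs in the upmap line and the pool/object names in the rados call are only placeholders, not my real ones:

  # pin all four replicas of PG 1.7fff to filestore OSDs via upmap
  # (each pair is <from-osd> <to-osd>; all IDs here are placeholders)
  ceph osd pg-upmap-items 1.7fff 10 20 11 21 12 22 13 23

  # scrub / repair the PG and inspect the result
  ceph pg deep-scrub 1.7fff
  ceph pg repair 1.7fff
  rados list-inconsistent-obj 1.7fff --format=json-pretty

  # re-upload the affected object (pool and names are placeholders)
  rados -p <pool> put <object-name> <local-file>

The upmap entries can be removed again with "ceph osd rm-pg-upmap-items 1.7fff" once the shards are back on their original OSDs.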
Can someone help me here? I don't have any clue.

Regards
Manuel


On Mon, 7 Feb 2022 16:51:16 +0100
Manuel Lausch <manuel.lausch@xxxxxxxx> wrote:

> Hi,
>
> I am migrating from filestore to bluestore (the workflow is: drain the
> OSD, then reformat it with bluestore).
>
> Now I have two OSDs which crash at the same time with the following
> error. Restarting the OSDs works for some time until they crash again.
>
> -40> 2022-02-07 16:28:20.489 7f550723a700 20 bluestore(/var/lib/ceph/osd/ceph-410).collection(1.7fff_head 0x564161314600) r 0 v.len 512
> -39> 2022-02-07 16:28:20.489 7f550723a700 15 bluestore(/var/lib/ceph/osd/ceph-410) getattrs 1.7fff_head #1:ffffffeb:::9b6886fa3639e64c892813ba7c9da9f4411f0a5fb73c89517b5f3f68acdaa388_00400000:head#
> -38> 2022-02-07 16:28:20.489 7f550723a700 10 bluestore(/var/lib/ceph/osd/ceph-410) getattrs 1.7fff_head #1:ffffffeb:::9b6886fa3639e64c892813ba7c9da9f4411f0a5fb73c89517b5f3f68acdaa388_00400000:head# = 0
> -37> 2022-02-07 16:28:20.489 7f550723a700 10 bluestore(/var/lib/ceph/osd/ceph-410) stat 1.7fff_head #1:ffffffef:::bda22ca861e6999694841deb44bce5d37d7c35d0ffc9387d649d80acf818c341_0014f39d:head#
> -36> 2022-02-07 16:28:20.489 7f550723a700 20 bluestore(/var/lib/ceph/osd/ceph-410).collection(1.7fff_head 0x564161314600) get_onode oid #1:ffffffef:::bda22ca861e6999694841deb44bce5d37d7c35d0ffc9387d649d80acf818c341_0014f39d:head# key 0x7f8000000000000001ffffffef216264'a22ca861e6999694841deb44bce5d37d7c35d0ffc9387d649d80acf818c341_0014f39d!='0xfffffffffffffffeffffffffffffffff'o'
> -35> 2022-02-07 16:28:20.489 7f550723a700 20 bluestore(/var/lib/ceph/osd/ceph-410).collection(1.7fff_head 0x564161314600) r 0 v.len 843
> -34> 2022-02-07 16:28:20.489 7f550723a700 15 bluestore(/var/lib/ceph/osd/ceph-410) getattrs 1.7fff_head #1:ffffffef:::bda22ca861e6999694841deb44bce5d37d7c35d0ffc9387d649d80acf818c341_0014f39d:head#
> -33> 2022-02-07 16:28:20.489 7f550723a700 10 bluestore(/var/lib/ceph/osd/ceph-410) getattrs 1.7fff_head #1:ffffffef:::bda22ca861e6999694841deb44bce5d37d7c35d0ffc9387d649d80acf818c341_0014f39d:head# = 0
> -32> 2022-02-07 16:28:20.489 7f550723a700 10 bluestore(/var/lib/ceph/osd/ceph-410) stat 1.7fff_head #1:fffffffb:::98c8a3708cceb042f5ec0d5dd49416968adc95cf6019796fdf6ae1a1f7fd929d_00400000:head#
> -31> 2022-02-07 16:28:20.489 7f550723a700 20 bluestore(/var/lib/ceph/osd/ceph-410).collection(1.7fff_head 0x564161314600) get_onode oid #1:fffffffb:::98c8a3708cceb042f5ec0d5dd49416968adc95cf6019796fdf6ae1a1f7fd929d_00400000:head# key 0x7f8000000000000001fffffffb213938'c8a3708cceb042f5ec0d5dd49416968adc95cf6019796fdf6ae1a1f7fd929d_00400000!='0xfffffffffffffffeffffffffffffffff'o'
> -30> 2022-02-07 16:28:20.489 7f550723a700 20 bluestore(/var/lib/ceph/osd/ceph-410).collection(1.7fff_head 0x564161314600) r 0 v.len 512
> -29> 2022-02-07 16:28:20.489 7f550723a700 15 bluestore(/var/lib/ceph/osd/ceph-410) getattrs 1.7fff_head #1:fffffffb:::98c8a3708cceb042f5ec0d5dd49416968adc95cf6019796fdf6ae1a1f7fd929d_00400000:head#
> -28> 2022-02-07 16:28:20.489 7f550723a700 10 bluestore(/var/lib/ceph/osd/ceph-410) getattrs 1.7fff_head #1:fffffffb:::98c8a3708cceb042f5ec0d5dd49416968adc95cf6019796fdf6ae1a1f7fd929d_00400000:head# = 0
> -27> 2022-02-07 16:28:20.494 7f550723a700 15 bluestore(/var/lib/ceph/osd/ceph-410) collection_list 1.7fff_head start #1:ffffffff:::c76c7ac2014adb9f0f0837ac1e85fd1e241af225908b6a0c3d3a44d6b866e732_00400000:0# end #MAX# max 2147483647
> -26> 2022-02-07 16:28:20.494 7f550723a700 20 bluestore(/var/lib/ceph/osd/ceph-410) _collection_list range #-3:fffe0000::::0#0 to #-3:ffffffff::::0#0 and #1:fffe0000::::0#0 to #1:ffffffff::::0#0 start #1:ffffffff:::c76c7ac2014adb9f0f0837ac1e85fd1e241af225908b6a0c3d3a44d6b866e732_00400000:0#
> -25> 2022-02-07 16:28:20.506 7f550723a700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/14.2.22/rpm/el8/BUILD/ceph-14.2.22/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::_collection_list(BlueStore::Collection*, const ghobject_t&, const ghobject_t&, int, bool, std::vector<ghobject_t>*, ghobject_t*)' thread 7f550723a700 time 2022-02-07 16:28:20.495642
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/14.2.22/rpm/el8/BUILD/ceph-14.2.22/src/os/bluestore/BlueStore.cc: 10157: FAILED ceph_assert(start >= coll_range_start && start < coll_range_end)
>
> Please let me know if you need more log lines.
>
> The pool is replicated with size 4. The two problematic OSDs are
> running on bluestore; the other two replicas are running on filestore.
>
> What is happening here and how can I fix it?
>
> Ceph Version: 14.2.22

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx