Ceph OSDs take 10+ minutes to start on reboot

Hi,

I'm having an issue on one of my nodes where all of its OSDs take a long
time to come back online (between 10 and 15 minutes). In the Ceph log, each
OSD sits on:

bluestore(/var/lib/ceph/osd/ceph-8) _open_db_and_around read-only:0 repair:0

It stays there until eventually something changes and the startup continues.
This happens for every OSD on the node. Here's a full sample log:

2022-03-16T13:39:56.591+0000 7fbc7ace5f00  0 load: jerasure load: lrc load: isa
2022-03-16T13:39:57.207+0000 7fbc7ace5f00  0 osd.8:0.OSDShard using op scheduler ClassedOpQueueScheduler(queue=WeightedPriorityQueue, cutoff=196)
2022-03-16T13:39:57.527+0000 7fbc7ace5f00  0 osd.8:1.OSDShard using op scheduler ClassedOpQueueScheduler(queue=WeightedPriorityQueue, cutoff=196)
2022-03-16T13:39:57.847+0000 7fbc7ace5f00  0 osd.8:2.OSDShard using op scheduler ClassedOpQueueScheduler(queue=WeightedPriorityQueue, cutoff=196)
2022-03-16T13:39:58.159+0000 7fbc7ace5f00  0 osd.8:3.OSDShard using op scheduler ClassedOpQueueScheduler(queue=WeightedPriorityQueue, cutoff=196)
2022-03-16T13:39:58.479+0000 7fbc7ace5f00  0 osd.8:4.OSDShard using op scheduler ClassedOpQueueScheduler(queue=WeightedPriorityQueue, cutoff=196)
2022-03-16T13:39:58.803+0000 7fbc7ace5f00  0 osd.8:5.OSDShard using op scheduler ClassedOpQueueScheduler(queue=WeightedPriorityQueue, cutoff=196)
2022-03-16T13:39:59.131+0000 7fbc7ace5f00  0 osd.8:6.OSDShard using op scheduler ClassedOpQueueScheduler(queue=WeightedPriorityQueue, cutoff=196)
2022-03-16T13:39:59.443+0000 7fbc7ace5f00  0 osd.8:7.OSDShard using op scheduler ClassedOpQueueScheduler(queue=WeightedPriorityQueue, cutoff=196)
2022-03-16T13:39:59.443+0000 7fbc7ace5f00  0 bluestore(/var/lib/ceph/osd/ceph-8) _open_db_and_around read-only:0 repair:0
2022-03-16T13:52:51.108+0000 7fbc7ace5f00  0 _get_class not permitted to load sdk
2022-03-16T13:52:51.112+0000 7fbc7ace5f00  0 _get_class not permitted to load lua
2022-03-16T13:52:51.116+0000 7fbc7ace5f00  0 _get_class not permitted to load kvs
2022-03-16T13:52:51.116+0000 7fbc7ace5f00  0 <cls> ./src/cls/hello/cls_hello.cc:316: loading cls_hello
2022-03-16T13:52:51.116+0000 7fbc7ace5f00  0 <cls> ./src/cls/cephfs/cls_cephfs.cc:201: loading cephfs
2022-03-16T13:52:51.116+0000 7fbc7ace5f00  0 osd.8 2537 crush map has features 432629239337189376, adjusting msgr requires for clients
2022-03-16T13:52:51.116+0000 7fbc7ace5f00  0 osd.8 2537 crush map has features 432629239337189376 was 8705, adjusting msgr requires for mons
2022-03-16T13:52:51.116+0000 7fbc7ace5f00  0 osd.8 2537 crush map has features 3314933000854323200, adjusting msgr requires for osds
2022-03-16T13:52:52.064+0000 7fbc7ace5f00  0 osd.8 2537 load_pgs
2022-03-16T13:52:56.908+0000 7fbc7ace5f00  0 osd.8 2537 load_pgs opened 141 pgs
2022-03-16T13:52:56.912+0000 7fbc7ace5f00 -1 osd.8 2537 log_to_monitors {default=true}
2022-03-16T13:52:57.477+0000 7fbc7ace5f00  0 osd.8 2537 done with init, starting boot process
2022-03-16T13:52:57.481+0000 7fbc731ee700 -1 osd.8 2537 set_numa_affinity unable to identify public interface '' numa node: (2) No such file or directory
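Note the ~13 minute gap between the _open_db_and_around line and the next
message. If it would help, I can raise the debug levels for this OSD and
capture a more detailed log on the next restart (commands below are what I
plan to run; osd.8 is just the example OSD from the log above):

```shell
# Temporarily raise BlueStore/BlueFS/RocksDB logging for osd.8
ceph config set osd.8 debug_bluestore 5/5
ceph config set osd.8 debug_bluefs 5/5
ceph config set osd.8 debug_rocksdb 5/5

# Restart the OSD and watch the startup
systemctl restart ceph-osd@8

# Revert to defaults afterwards
ceph config rm osd.8 debug_bluestore
ceph config rm osd.8 debug_bluefs
ceph config rm osd.8 debug_rocksdb
```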

I hope someone can help!

Thanks,
Chris.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
