Hi all,

I have a strange problem on a cluster running the latest Octopus release. We had a couple of SSD OSDs down for a while and brought them back up today. For some reason, these OSDs do not come up and instead flood the log with messages like

osd.1004 2971464 failed to load OSD map for epoch 2898146, got 0 bytes

These messages cycle through the same epochs over and over again. I did not find much help. There is an old thread about a similar or the same problem on a home lab cluster, though with new OSDs, I believe, and I could not find any useful information. The OSDs seem to boot fine and then end up in something like a death loop. Below are some snippets from the OSD log.

Any hints appreciated.

Thanks and best regards,
Frank

After OSD start, everything looks normal up to here:

2024-10-21T17:41:39.136+0200 7fad73cf6f00 0 osd.1004 2971464 load_pgs opened 205 pgs
2024-10-21T17:41:39.140+0200 7fad73cf6f00 -1 osd.1004 2971464 log_to_monitors {default=true}
2024-10-21T17:41:39.150+0200 7fad73cf6f00 -1 osd.1004 2971464 mon_cmd_maybe_osd_create fail: 'osd.1004 has already bound to class 'fs_meta', can not reset class to 'ssd'; use 'ceph osd crush rm-device-class <id>' to remove old class first': (16) Device or resource busy
2024-10-21T17:41:39.155+0200 7fad519a3700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898132, got 0 bytes
2024-10-21T17:41:39.155+0200 7fad511a2700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898132, got 0 bytes
2024-10-21T17:41:39.155+0200 7fad511a2700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898133, got 0 bytes
2024-10-21T17:41:39.155+0200 7fad511a2700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898134, got 0 bytes
2024-10-21T17:41:39.155+0200 7fad511a2700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898135, got 0 bytes
2024-10-21T17:41:39.155+0200 7fad511a2700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898136, got 0 bytes
2024-10-21T17:41:39.155+0200 7fad4f99f700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898132, got 0 bytes
2024-10-21T17:41:39.155+0200 7fad4f99f700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898133, got 0 bytes
2024-10-21T17:41:39.155+0200 7fad4b196700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898132, got 0 bytes
2024-10-21T17:41:39.155+0200 7fad4b196700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898133, got 0 bytes
2024-10-21T17:41:39.155+0200 7fad4b196700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898134, got 0 bytes
2024-10-21T17:41:39.155+0200 7fad4b196700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898135, got 0 bytes
2024-10-21T17:41:39.155+0200 7fad4b196700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898136, got 0 bytes
2024-10-21T17:41:39.155+0200 7fad4b196700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898137, got 0 bytes
2024-10-21T17:41:39.155+0200 7fad73cf6f00 0 osd.1004 2971464 done with init, starting boot process
2024-10-21T17:41:39.155+0200 7fad73cf6f00 1 osd.1004 2971464 start_boot
2024-10-21T17:41:39.155+0200 7fad4b196700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898138, got 0 bytes
2024-10-21T17:41:39.155+0200 7fad4b196700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898139, got 0 bytes
2024-10-21T17:41:39.155+0200 7fad4b196700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898140, got 0 bytes
2024-10-21T17:41:39.155+0200 7fad4b196700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898141, got 0 bytes
2024-10-21T17:41:39.155+0200 7fad4b196700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898142, got 0 bytes
2024-10-21T17:41:39.156+0200 7fad4b196700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898143, got 0 bytes
2024-10-21T17:41:39.156+0200 7fad4b196700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898144, got 0 bytes
2024-10-21T17:41:39.156+0200 7fad4b196700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898145, got 0 bytes
2024-10-21T17:41:39.156+0200 7fad4b196700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898146, got 0 bytes

These messages repeat over and over again, with some others of this form showing up every now and then:

2024-10-21T17:41:39.476+0200 7fad651ca700 4 rocksdb: [db/compaction_job.cc:1332] [default] [JOB 12] Generated table #82879: 76571 keys, 67866714 bytes
2024-10-21T17:41:39.688+0200 7fad651ca700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1729525299690000, "cf_name": "default", "job": 12, "event": "table_file_creation", "file_number": 82879, "file_size": 67866714, "table_properties": {"data_size": 67111697, "index_size": 562601, "filter_size": 191557, "raw_key_size": 4823973, "raw_average_key_size": 63, "raw_value_size": 62631087, "raw_average_value_size": 817, "num_data_blocks": 15644, "num_entries": 76571, "filter_policy_name": "rocksdb.BuiltinBloomFilter"}}

And another occasion:

2024-10-21T17:41:40.520+0200 7fad651ca700 4 rocksdb: [db/compaction_job.cc:1332] [default] [JOB 12] Generated table #82880: 76774 keys, 67868330 bytes
2024-10-21T17:41:40.520+0200 7fad501a0700 -1 osd.1004 2971464 failed to load OSD map for epoch 2899234, got 0 bytes
2024-10-21T17:41:40.520+0200 7fad501a0700 -1 osd.1004 2971464 failed to load OSD map for epoch 2899235, got 0 bytes
2024-10-21T17:41:40.520+0200 7fad501a0700 -1 osd.1004 2971464 failed to load OSD map for epoch 2899236, got 0 bytes
2024-10-21T17:41:40.520+0200 7fad651ca700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1729525300521403, "cf_name": "default", "job": 12, "event": "table_file_creation", "file_number": 82880, "file_size": 67868330, "table_properties": {"data_size": 67113021, "index_size": 562509, "filter_size": 191941, "raw_key_size": 4836742, "raw_average_key_size": 62, "raw_value_size": 62623274, "raw_average_value_size": 815, "num_data_blocks": 15630, "num_entries": 76774, "filter_policy_name": "rocksdb.BuiltinBloomFilter"}}

=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
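
P.S. To check which epochs the "failed to load OSD map" messages cover and how often each one recurs, something like the following might help. It is only a minimal Python sketch: the function names are hypothetical, the regular expression is derived solely from the log lines quoted in this message, and the sample lines are copied from that log. It is not a Ceph tool and does not talk to the cluster.

```python
import re
from collections import Counter

# Pattern taken from the OSD log lines quoted in this message (an assumption
# that all relevant lines follow exactly this wording).
FAILED_MAP = re.compile(r"failed to load OSD map for epoch (\d+), got 0 bytes")


def count_failed_epochs(lines):
    """Return a Counter mapping epoch -> number of 'failed to load' messages."""
    counts = Counter()
    for line in lines:
        match = FAILED_MAP.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def contiguous_ranges(epochs):
    """Collapse epochs into sorted, inclusive (start, end) ranges."""
    ranges = []
    for epoch in sorted(set(epochs)):
        if ranges and epoch == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], epoch)  # extend the current range
        else:
            ranges.append((epoch, epoch))        # start a new range
    return ranges


# A few lines copied from the log above, used here only as sample input.
SAMPLE = [
    "2024-10-21T17:41:39.155+0200 7fad519a3700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898132, got 0 bytes",
    "2024-10-21T17:41:39.155+0200 7fad511a2700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898132, got 0 bytes",
    "2024-10-21T17:41:39.155+0200 7fad511a2700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898133, got 0 bytes",
    "2024-10-21T17:41:39.155+0200 7fad73cf6f00 1 osd.1004 2971464 start_boot",
    "2024-10-21T17:41:40.520+0200 7fad501a0700 -1 osd.1004 2971464 failed to load OSD map for epoch 2899234, got 0 bytes",
]

counts = count_failed_epochs(SAMPLE)
print("epoch ranges:", contiguous_ranges(counts))
print("epochs seen more than once:", sorted(e for e, n in counts.items() if n > 1))
```

To run it against a real log, replace SAMPLE with the lines of the OSD log file, for example by passing an open file object to count_failed_epochs.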