Hello everybody!

I am fascinated by Ceph, but right now I am living through moments of terror and despair. I am running Ceph 18.2.2 (reef), and we need to import 4 OSDs from an old cluster (which was destroyed by accident). In short, we suspect that both the cause and the solution for this case lie in the OSD log below (all OSD daemons repeat the same information). After importing the OSDs, "journalctl -u ceph-osd@2" shows the following:

---
Jul 17 20:09:55 pxm3 ceph-osd[647313]: in thread 702342c006c0 thread_name:ms_dispatch
Jul 17 20:09:55 pxm3 ceph-osd[647313]: ceph version 18.2.2 (e9fe820e7fffd1b7cde143a9f77653b73fcec748) reef (stable)
Jul 17 20:09:55 pxm3 ceph-osd[647313]: 1: /lib/x86_64-linux-gnu/libc.so.6(+0x3c050) [0x70235785b050]
Jul 17 20:09:55 pxm3 ceph-osd[647313]: 2: /lib/x86_64-linux-gnu/libc.so.6(+0x8ae2c) [0x7023578a9e2c]
Jul 17 20:09:55 pxm3 ceph-osd[647313]: 3: gsignal()
Jul 17 20:09:55 pxm3 ceph-osd[647313]: 4: abort()
Jul 17 20:09:55 pxm3 ceph-osd[647313]: 5: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x18a) [0x62edaeab47>
Jul 17 20:09:55 pxm3 ceph-osd[647313]: 6: (OSD::handle_osd_map(MOSDMap*)+0x384a) [0x62edaec0aeca]
Jul 17 20:09:55 pxm3 ceph-osd[647313]: 7: (OSD::ms_dispatch(Message*)+0x62) [0x62edaec0b332]
Jul 17 20:09:55 pxm3 ceph-osd[647313]: 8: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0xc1) [0x62edaf5eea51]
Jul 17 20:09:55 pxm3 ceph-osd[647313]: 9: (DispatchQueue::entry()+0x6cf) [0x62edaf5ed53f]
Jul 17 20:09:55 pxm3 ceph-osd[647313]: 10: (DispatchQueue::DispatchThread::entry()+0xd) [0x62edaf40fd5d]
Jul 17 20:09:55 pxm3 ceph-osd[647313]: 11: /lib/x86_64-linux-gnu/libc.so.6(+0x89134) [0x7023578a8134]
Jul 17 20:09:55 pxm3 ceph-osd[647313]: 12: /lib/x86_64-linux-gnu/libc.so.6(+0x1097dc) [0x7023579287dc]
Jul 17 20:09:55 pxm3 ceph-osd[647313]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Jul 17 20:09:55 pxm3 ceph-osd[647313]: -142> 2024-07-17T20:09:54.845-0300 702356d616c0 -1 osd.2 71 log_to_monitors true
Jul 17 20:09:55 pxm3 ceph-osd[647313]: -2> 2024-07-17T20:09:55.362-0300 702342c006c0 -1 osd.2 71 ERROR: bad fsid? i have 5514a69a-46ba-4a44-bb56-8d3109c6c9e0 and inc has f4466e33-b57d-4d68-9909-346>
Jul 17 20:09:55 pxm3 ceph-osd[647313]: -1> 2024-07-17T20:09:55.366-0300 702342c006c0 -1 ./src/osd/OSD.cc: In function 'void OSD::handle_osd_map(MOSDMap*)' thread 702342c006c0 time 2024-07-17T20:09:5>
Jul 17 20:09:55 pxm3 ceph-osd[647313]: ./src/osd/OSD.cc: 8098: ceph_abort_msg("bad fsid")
Jul 17 20:09:55 pxm3 ceph-osd[647313]: ceph version 18.2.2 (e9fe820e7fffd1b7cde143a9f77653b73fcec748) reef (stable)
---

In the OSD daemon log above we see the "bad fsid" error, which indicates that the OSDs are trying to join a different Ceph cluster than the one they were originally created for. Each Ceph cluster has a unique identifier, the fsid, and when an OSD receives an osdmap carrying a different fsid it aborts instead of joining the wrong cluster.

Note: I believe this is both the root cause of the problem and where the solution lies. Problem: the OSDs carry the fsid of the old cluster. Would the solution be to change the ceph_fsid of the OSDs to the fsid of the new cluster, or to recreate the cluster using the fsid recorded in the OSD metadata, i.e. rebuild the cluster with the old cluster's fsid? Would either approach be possible?
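To make the first option concrete, this is the kind of thing I have in mind, purely as a sketch: it assumes the OSD service is stopped and uses osd.2 on pxm3 as the example. I have NOT run it, because I do not know whether rewriting the BlueStore label (and the ceph_fsid file) is sufficient or even safe, since the "i have ..." fsid in the abort appears to come from the OSD superblock rather than from the label:

Bash:
# SKETCH ONLY - not executed. Assumes the OSD is stopped and that rewriting
# the label is a valid way to repoint the OSD at the new cluster, which is
# exactly what I am asking about.
OSD_ID=2
NEW_FSID=f4466e33-b57d-4d68-9909-3468afd9e5c2     # fsid of the new cluster
DEV=/var/lib/ceph/osd/ceph-$OSD_ID/block          # symlink to the OSD's LV

systemctl stop ceph-osd@$OSD_ID

# rewrite the ceph_fsid key in the BlueStore label ...
ceph-bluestore-tool set-label-key --dev "$DEV" -k ceph_fsid -v "$NEW_FSID"

# ... and the plain-text copy in the OSD data directory
echo "$NEW_FSID" > /var/lib/ceph/osd/ceph-$OSD_ID/ceph_fsid

# verify
ceph-bluestore-tool show-label --dev "$DEV"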
I also opened a thread about this case on the Proxmox forum and posted a lot of additional data about the scenario there: https://forum.proxmox.com/threads/ceph-cluster-rebuild-import-bluestore-osds-from-old-cluster-bad-fsid-osd-dont-start-he-only-stays-in-down-state.151349/

-----

~# cat /etc/ceph/ceph.conf
[global]
    auth_client_required = cephx
    auth_cluster_required = cephx
    auth_service_required = cephx
    cluster_network = 192.168.0.2/24
    fsid = f4466e33-b57d-4d68-9909-3468afd9e5c2
    mon_allow_pool_delete = true
    mon_host = 192.168.0.2 192.168.0.3 192.168.0.1
    ms_bind_ipv4 = true
    ms_bind_ipv6 = false
    osd_pool_default_min_size = 2
    osd_pool_default_size = 3
    public_network = 192.168.0.0/24

[client]
    # keyring = /etc/pve/priv/$cluster.$name.keyring
    keyring = /etc/pve/priv/ceph.client.admin.keyring

#[client.crash]
#    keyring = /etc/pve/ceph/$cluster.$name.keyring

[client.crash]
    key = AQAl95NmlvL0HRAAovpivsfHqqokmO0vqIR5Lg==

[client.admin]
    key = AQAk95NmSjMdORAAiAHkTSSMquKkBAGpALjwQA==
    caps mds = "allow *"
    caps mgr = "allow *"
    caps mon = "allow *"
    caps osd = "allow *"

[mon.pxm1]
    public_addr = 192.168.0.1

[mon.pxm2]
    public_addr = 192.168.0.2

[mon.pxm3]
    public_addr = 192.168.0.3

---

root@pxm3:~# ceph fsid
f4466e33-b57d-4d68-9909-3468afd9e5c2

---

~# ceph -s
  cluster:
    id:     f4466e33-b57d-4d68-9909-3468afd9e5c2
    health: HEALTH_WARN
            mon pxm1 is low on available space
            4 osds down
            2 hosts (4 osds) down
            1 root (4 osds) down
            170 daemons have recently crashed

  services:
    mon: 3 daemons, quorum pxm2,pxm3,pxm1 (age 3h)
    mgr: pxm2(active, since 3h), standbys: pxm1, pxm3
    osd: 4 osds: 0 up, 4 in (since 2h)

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:

---

~# cephadm ls
[
    {
        "style": "legacy",
        "name": "osd.0",
        "fsid": "5514a69a-46ba-4a44-bb56-8d3109c6c9e0",
        "systemd_unit": "ceph-osd@0",
        "enabled": true,
        "state": "error",
        "host_version": "18.2.2"
    },
    {
        "style": "legacy",
        "name": "osd.1",
        "fsid": "5514a69a-46ba-4a44-bb56-8d3109c6c9e0",
        "systemd_unit": "ceph-osd@1",
        "enabled": true,
        "state": "error",
        "host_version": "18.2.2"
    },
    {
        "style": "legacy",
        "name": "mon.pxm2",
        "fsid": "f4466e33-b57d-4d68-9909-3468afd9e5c2",
        "systemd_unit": "ceph-mon@pxm2",
        "enabled": true,
        "state": "running",
        "host_version": "18.2.2"
    },
    {
        "style": "legacy",
        "name": "mgr.pxm2",
        "fsid": "f4466e33-b57d-4d68-9909-3468afd9e5c2",
        "systemd_unit": "ceph-mgr@pxm2",
        "enabled": true,
        "state": "running",
        "host_version": "18.2.2"
    }
]

root@pxm3:~# cephadm ls
[
    {
        "style": "legacy",
        "name": "osd.3",
        "fsid": "5514a69a-46ba-4a44-bb56-8d3109c6c9e0",
        "systemd_unit": "ceph-osd@3",
        "enabled": true,
        "state": "error",
        "host_version": "18.2.2"
    },
    {
        "style": "legacy",
        "name": "osd.2",
        "fsid": "5514a69a-46ba-4a44-bb56-8d3109c6c9e0",
        "systemd_unit": "ceph-osd@2",
        "enabled": true,
        "state": "error",
        "host_version": "18.2.2"
    },
    {
        "style": "legacy",
        "name": "mon.pxm3",
        "fsid": "f4466e33-b57d-4d68-9909-3468afd9e5c2",
        "systemd_unit": "ceph-mon@pxm3",
        "enabled": true,
        "state": "running",
        "host_version": "18.2.2"
    },
    {
        "style": "legacy",
        "name": "mgr.pxm3",
        "fsid": "f4466e33-b57d-4d68-9909-3468afd9e5c2",
        "systemd_unit": "ceph-mgr@pxm3",
        "enabled": true,
        "state": "running",
        "host_version": "18.2.2"
    }
]

---

To import the OSDs, we first increased the osdmap epoch of the new cluster, repeating the commands below until the new cluster's epoch was higher than the epoch stored on the OSDs:

Bash:
ceph osd set noin
ceph osd set noout
ceph osd set noup
ceph osd set nodown
ceph osd set norebalance
ceph osd set nobackfill
ceph osd unset noin
ceph osd unset noout
ceph osd unset noup
ceph osd unset nodown
ceph osd unset norebalance
ceph osd unset nobackfill
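For reference, the repetition can be expressed as a small loop. This is only a sketch of what we did by hand, and TARGET_EPOCH is a placeholder, not the real epoch number from the old OSDs:

Bash:
# Sketch of the epoch-bump loop; TARGET_EPOCH is a hypothetical placeholder.
TARGET_EPOCH=200

current_epoch() { ceph osd dump | awk '/^epoch/ {print $2}'; }

while [ "$(current_epoch)" -le "$TARGET_EPOCH" ]; do
    for flag in noin noout noup nodown norebalance nobackfill; do
        ceph osd set "$flag"      # every flag change creates a new osdmap epoch
    done
    for flag in noin noout noup nodown norebalance nobackfill; do
        ceph osd unset "$flag"
    done
    echo "osdmap epoch is now $(current_epoch)"
done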
---

OSD volumes:

Code:
pxm2:
-> OSD.0: /dev/ceph-1740d41a-2ae7-4c4d-820f-ec3702e3ba90/osd-block-39f9b32f-c6e7-4b3f-b7f0-9b11a5832aaa
-> OSD.1: /dev/ceph-ad425d70-4aa3-419a-997f-f3a4082c9904/osd-block-bb4df480-2b9b-4604-a44d-6151d5c0cb33
pxm3:
-> OSD.2: /dev/ceph-94682b88-d09c-4eab-9170-c6d31eac79e6/osd-block-3f6756d6-e64b-4c60-9ac2-305c0e71cc51
-> OSD.3: /dev/ceph-d5ffd027-8289-4a1c-9378-6687d9f950ad/osd-block-eece9fc9-44d6-460b-aced-572c79a98be8

---

ceph-bluestore-tool show-label:

-> osd.0:

Bash:
root@pxm2:~# ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-0
inferring bluefs devices from bluestore path
{
    "/var/lib/ceph/osd/ceph-0/block": {
        "osd_uuid": "39f9b32f-c6e7-4b3f-b7f0-9b11a5832aaa",
        "size": 2000397795328,
        "btime": "2024-07-07T21:32:30.861509-0300",
        "description": "main",
        "bfm_blocks": "488378368",
        "bfm_blocks_per_key": "128",
        "bfm_bytes_per_block": "4096",
        "bfm_size": "2000397795328",
        "bluefs": "1",
        "ceph_fsid": "5514a69a-46ba-4a44-bb56-8d3109c6c9e0",
        "ceph_version_when_created": "ceph version 18.2.2 (e9fe820e7fffd1b7cde143a9f77653b73fcec748) reef (stable)",
        "created_at": "2024-07-08T00:32:32.459102Z",
        "kv_backend": "rocksdb",
        "magic": "ceph osd volume v026",
        "mkfs_done": "yes",
        "osd_key": "AQCdM4tmHR5tLRAA5AikqvQyMqOoH5MnL8Qdtg==",
        "ready": "ready",
        "require_osd_release": "18",
        "whoami": "0"
    }
}

-> osd.1:

Bash:
root@pxm2:~# ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-1
inferring bluefs devices from bluestore path
{
    "/var/lib/ceph/osd/ceph-1/block": {
        "osd_uuid": "bb4df480-2b9b-4604-a44d-6151d5c0cb33",
        "size": 2000397795328,
        "btime": "2024-07-07T21:32:43.729638-0300",
        "description": "main",
        "bfm_blocks": "488378368",
        "bfm_blocks_per_key": "128",
        "bfm_bytes_per_block": "4096",
        "bfm_size": "2000397795328",
        "bluefs": "1",
        "ceph_fsid": "5514a69a-46ba-4a44-bb56-8d3109c6c9e0",
        "ceph_version_when_created": "ceph version 18.2.2 (e9fe820e7fffd1b7cde143a9f77653b73fcec748) reef (stable)",
        "created_at": "2024-07-08T00:32:45.577456Z",
        "kv_backend": "rocksdb",
        "magic": "ceph osd volume v026",
        "mkfs_done": "yes",
        "osd_key": "AQCqM4tmTy87JxAAJVK1NokBDjdKSe+Z8OjwMA==",
        "ready": "ready",
        "require_osd_release": "18",
        "whoami": "1"
    }
}

-> osd.2:

Bash:
root@pxm3:~# ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-2
inferring bluefs devices from bluestore path
{
    "/var/lib/ceph/osd/ceph-2/block": {
        "osd_uuid": "3f6756d6-e64b-4c60-9ac2-305c0e71cc51",
        "size": 2000397795328,
        "btime": "2024-07-07T21:33:07.812888-0300",
        "description": "main",
        "bfm_blocks": "488378368",
        "bfm_blocks_per_key": "128",
        "bfm_bytes_per_block": "4096",
        "bfm_size": "2000397795328",
        "bluefs": "1",
        "ceph_fsid": "5514a69a-46ba-4a44-bb56-8d3109c6c9e0",
        "ceph_version_when_created": "ceph version 18.2.2 (e9fe820e7fffd1b7cde143a9f77653b73fcec748) reef (stable)",
        "created_at": "2024-07-08T00:33:09.404317Z",
        "kv_backend": "rocksdb",
        "magic": "ceph osd volume v026",
        "mkfs_done": "yes",
        "osd_key": "AQDCM4tmu++vLRAAuJOXjTuEHR9VsKz7ShVEPg==",
        "ready": "ready",
        "require_osd_release": "18",
        "whoami": "2"
    }
}

-> osd.3:

Bash:
root@pxm3:~# ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-3
inferring bluefs devices from bluestore path
{
    "/var/lib/ceph/osd/ceph-3/block": {
        "osd_uuid": "eece9fc9-44d6-460b-aced-572c79a98be8",
        "size": 2000397795328,
        "btime": "2024-07-07T21:33:25.725294-0300",
        "description": "main",
        "bfm_blocks": "488378368",
        "bfm_blocks_per_key": "128",
        "bfm_bytes_per_block": "4096",
        "bfm_size": "2000397795328",
        "bluefs": "1",
        "ceph_fsid": "5514a69a-46ba-4a44-bb56-8d3109c6c9e0",
        "ceph_version_when_created": "ceph version 18.2.2 (e9fe820e7fffd1b7cde143a9f77653b73fcec748) reef (stable)",
        "created_at": "2024-07-08T00:33:27.323085Z",
        "kv_backend": "rocksdb",
        "magic": "ceph osd volume v026",
        "mkfs_done": "yes",
        "osd_key": "AQDUM4tmagOEKBAAeAfZXcyU1naRkqIE5iVOfw==",
        "ready": "ready",
        "require_osd_release": "18",
        "whoami": "3"
    }
}
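To confirm the mismatch on all four OSDs at once, a simple loop over the plain-text ceph_fsid files (shown right below) does the job. This is just a sketch and assumes the OSD data directories are mounted under /var/lib/ceph/osd on each host:

Bash:
# Sketch: compare each OSD's recorded ceph_fsid with the running cluster's fsid.
CLUSTER_FSID=$(ceph fsid)

for osd_dir in /var/lib/ceph/osd/ceph-*; do
    osd_fsid=$(cat "$osd_dir/ceph_fsid")
    if [ "$osd_fsid" != "$CLUSTER_FSID" ]; then
        echo "MISMATCH $osd_dir: $osd_fsid (cluster is $CLUSTER_FSID)"
    else
        echo "OK       $osd_dir: $osd_fsid"
    fi
done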
"2000397795328", "bluefs": "1", "ceph_fsid": "5514a69a-46ba-4a44-bb56-8d3109c6c9e0", "ceph_version_when_created": "ceph version 18.2.2 (e9fe820e7fffd1b7cde143a9f77653b73fcec748) reef (stable)", "created_at": "2024-07-08T00:33:27.323085Z", "kv_backend": "rocksdb", "magic": "ceph osd volume v026", "mkfs_done": "yes", "osd_key": "AQDUM4tmagOEKBAAeAfZXcyU1naRkqIE5iVOfw==", "ready": "ready", "require_osd_release": "18", "whoami": "3" } } --- root@pxm2:~# cat /var/lib/ceph/osd/ceph-0/ceph_fsid 5514a69a-46ba-4a44-bb56-8d3109c6c9e0 root@pxm2:~# cat /var/lib/ceph/osd/ceph-1/ceph_fsid 5514a69a-46ba-4a44-bb56-8d3109c6c9e0 root@pxm3:~# cat /var/lib/ceph/osd/ceph-2/ceph_fsid 5514a69a-46ba-4a44-bb56-8d3109c6c9e0 root@pxm3:~# cat /var/lib/ceph/osd/ceph-3/ceph_fsid 5514a69a-46ba-4a44-bb56-8d3109c6c9e0 --- ~# ceph daemon osd.0 status no valid command found; 10 closest matches: 0 1 2 abort assert bluefs debug_inject_read_zeros bluefs files list bluefs stats bluestore allocator dump block bluestore allocator fragmentation block admin_socket: invalid command --- ~# ceph osd info osd.0 osd.0 down in weight 1 up_from 0 up_thru 0 down_at 0 last_clean_interval [0,0) exists 39f9b32f-c6e7-4b3f-b7f0-9b11a5832aaa --- ~# ceph osd status ID HOST USED AVAIL WR OPS WR DATA RD OPS RD DATA STATE 0 0 0 0 0 0 0 exists 1 0 0 0 0 0 0 exists 2 0 0 0 0 0 0 exists 3 0 0 0 0 0 0 exists root@pxm3:~# ceph osd tree ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF -1 7.27759 root default -2 3.63879 host pxm2 0 1.81940 osd.0 down 1.00000 1.00000 1 1.81940 osd.1 down 1.00000 1.00000 -5 3.63879 host pxm3 3 1.81940 osd.3 down 1.00000 1.00000 2 ssd 1.81940 osd.2 down 1.00000 1.00000 --- ~# ceph osd dump epoch 164 fsid f4466e33-b57d-4d68-9909-3468afd9e5c2 created 2024-07-14T13:04:53.354056-0300 modified 2024-07-17T20:24:17.550123-0300 flags sortbitwise,recovery_deletes,purged_snapdirs,pglog_hardlimit crush_version 63 full_ratio 0.95 backfillfull_ratio 0.9 nearfull_ratio 0.85 require_min_compat_client luminous min_compat_client jewel require_osd_release reef stretch_mode_enabled false max_osd 4 osd.0 down in weight 1 up_from 0 up_thru 0 down_at 0 last_clean_interval [0,0) exists 39f9b32f-c6e7-4b3f-b7f0-9b11a5832aaa osd.1 down in weight 1 up_from 0 up_thru 0 down_at 0 last_clean_interval [0,0) exists bb4df480-2b9b-4604-a44d-6151d5c0cb33 osd.2 down in weight 1 up_from 0 up_thru 0 down_at 0 last_clean_interval [0,0) exists 3f6756d6-e64b-4c60-9ac2-305c0e71cc51 osd.3 down in weight 1 up_from 0 up_thru 0 down_at 0 last_clean_interval [0,0) exists eece9fc9-44d6-460b-aced-572c79a98be8 blocklist 192.168.0.1:0/1377959162 expires 2024-07-18T17:16:21.635340-0300 blocklist 192.168.0.1:6800/49511 expires 2024-07-18T17:16:21.635340-0300 blocklist 192.168.0.1:0/3150227222 expires 2024-07-18T17:16:21.635340-0300 blocklist 192.168.0.2:0/1759809784 expires 2024-07-18T00:56:24.333969-0300 blocklist 192.168.0.1:0/3747372149 expires 2024-07-18T17:16:21.635340-0300 blocklist 192.168.0.1:6801/49511 expires 2024-07-18T17:16:21.635340-0300 blocklist 192.168.0.2:0/1472036822 expires 2024-07-18T00:56:24.333969-0300 blocklist 192.168.0.2:0/571633622 expires 2024-07-18T00:56:24.333969-0300 blocklist 192.168.0.2:6801/10543 expires 2024-07-18T00:56:24.333969-0300 blocklist 192.168.0.2:6800/10543 expires 2024-07-18T00:56:24.333969-0300 --- ~# ceph osd metadata [ { "id": 0 }, { "id": 1 }, { "id": 2 }, { "id": 3 } ] --- ~# cat /var/lib/ceph/osd/ceph-0/keyring [osd.0] key = AQCdM4tmHR5tLRAA5AikqvQyMqOoH5MnL8Qdtg== root@pxm2:~# cat /var/lib/ceph/osd/ceph-1/keyring [osd.1] key 
---

~# ceph auth list
osd.0
        key: AQCKQJhmMYWeEhAAosrE8Ff+1kZbKcroi22TvQ==
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.1
        key: AQCTQJhm2uLvAxAAC3uIcRk9d0sxLJgIxcivtw==
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.2
        key: AQDCM4tmu++vLRAAuJOXjTuEHR9VsKz7ShVEPg==
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.3
        key: AQBYMZhmmlaFDRAAtoy4//XweaI94OvrPV1aiQ==
        caps: [mon] allow profile osd
        caps: [osd] allow *
client.admin
        key: AQAk95NmSjMdORAAiAHkTSSMquKkBAGpALjwQA==
        caps: [mds] allow *
        caps: [mgr] allow *
        caps: [mon] allow *
        caps: [osd] allow *
client.bootstrap-mds
        key: AQAl95NmWuofFRAAhV7/f1PykW/KlcB8Rede9w==
        caps: [mon] allow profile bootstrap-mds
client.bootstrap-mgr
        key: AQAl95NmmPQfFRAAG2cQNiWQgdx0Rvr7ZFmuhg==
        caps: [mon] allow profile bootstrap-mgr
client.bootstrap-osd
        key: AQAl95Nmr/0fFRAA4TahG5PCbZIsltFgkRNEgA==
        caps: [mon] allow profile bootstrap-osd
client.bootstrap-rbd
        key: AQAl95Nm0gYgFRAAWBvqfphKk62InqY9x0ijHg==
        caps: [mon] allow profile bootstrap-rbd
client.bootstrap-rbd-mirror
        key: AQAl95Nmxg8gFRAAomFkp299Ca04NwGpfbSZRg==
        caps: [mon] allow profile bootstrap-rbd-mirror
client.bootstrap-rgw
        key: AQAl95Nm3RwgFRAA/QoIemXPFv5Gs1/PWfJuYw==
        caps: [mon] allow profile bootstrap-rgw
client.crash
        key: AQAl95NmlvL0HRAAovpivsfHqqokmO0vqIR5Lg==
        caps: [mgr] profile crash
        caps: [mon] profile crash
mgr.pxm1
        key: AQDm+JNmtJ3CCRAAl/wZaT6z12LCfAghvm6s4w==
        caps: [mds] allow *
        caps: [mon] allow profile mgr
        caps: [osd] allow *
mgr.pxm2
        key: AQAm95Nm6+dmABAA6405D8ROtNpJf6iVhMegQA==
        caps: [mds] allow *
        caps: [mon] allow profile mgr
        caps: [osd] allow *
mgr.pxm3
        key: AQDp95Nm3ZbLDxAAWc1zLaVOhE0wMxLJCL5IGg==
        caps: [mds] allow *
        caps: [mon] allow profile mgr
        caps: [osd] allow *
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx