Hi,
this OSD must have been part of a previous cluster, I assume.
I would remove it from the crush map if it's still there (check first just
to be sure), wipe the disk, remove any leftover traces such as logical
volumes (if it was a ceph-volume lvm OSD), and, if possible, reboot the node.
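The cleanup above could be sketched roughly like this (a dry-run sketch only, assuming the leftover OSD is osd.0 on /dev/sdb — substitute your own OSD id and device; the run() wrapper just prints each command so nothing is executed by accident):

```shell
# Dry-run sketch of cleaning up a leftover OSD.
# Assumptions: the stale OSD is osd.0 and its data device is /dev/sdb.
# run() only prints each command -- remove the echo to execute for real.
run() { echo "+ $*"; }

run ceph osd crush remove osd.0            # drop it from the crush map, if still there
run ceph auth del osd.0                    # remove its auth key
run ceph osd rm osd.0                      # remove it from the osd map
run ceph-volume lvm zap /dev/sdb --destroy # wipe the disk and remove leftover LVs
run reboot                                 # if possible, reboot the node
```

Once the old entry and on-disk traces are gone, the OSD can be redeployed cleanly against the current cluster.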
Regards,
Eugen
Quoting Vincent Pharabot <vincent.pharabot@xxxxxxxxx>:
Hello
I have an OSD which is stuck in the booting state.
I found out that the daemon's cluster_fsid is not the same as the actual
cluster fsid, which would explain why it does not join the cluster:
# ceph daemon osd.0 status
{
"cluster_fsid": "bb55e196-eedd-478d-99b6-1aad00b95f2a",
"osd_fsid": "01dbf73f-3866-47be-b623-b9c539dcd955",
"whoami": 0,
"state": "booting",
"oldest_map": 1,
"newest_map": 24,
"num_pgs": 200
}
# ceph fsid
173b6382-504b-421f-aa4d-52526fa80dfa
I checked the cluster fsid file and it's correct:
# cat /var/lib/ceph/osd/ceph-0/ceph_fsid
173b6382-504b-421f-aa4d-52526fa80dfa
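Put side by side, the mismatch is easy to see (a minimal sketch; the two fsid values are hard-coded from the outputs above — on a live node you would read them from `ceph daemon osd.0 status` and /var/lib/ceph/osd/ceph-0/ceph_fsid instead):

```shell
# Compare the cluster fsid the daemon reports with the one stored on disk.
# Values taken from the command outputs in this mail.
daemon_fsid="bb55e196-eedd-478d-99b6-1aad00b95f2a"  # from 'ceph daemon osd.0 status'
disk_fsid="173b6382-504b-421f-aa4d-52526fa80dfa"    # from /var/lib/ceph/osd/ceph-0/ceph_fsid

if [ "$daemon_fsid" = "$disk_fsid" ]; then
  echo "fsid match"
else
  echo "fsid MISMATCH: daemon=$daemon_fsid disk=$disk_fsid"
fi
```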
The OSD map also shows the correct fsid:
# ceph osd dump
epoch 33
fsid 173b6382-504b-421f-aa4d-52526fa80dfa
created 2019-06-17 16:42:52.632757
modified 2019-06-18 09:28:10.376573
flags noout,sortbitwise,recovery_deletes,purged_snapdirs
crush_version 13
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
require_min_compat_client jewel
min_compat_client jewel
require_osd_release mimic
pool 1 'cephfs_data' replicated size 3 min_size 1 crush_rule 0 object_hash
rjenkins pg_num 100 pgp_num 100 last_change 17 flags hashpspool
stripe_width 0 application cephfs
pool 2 'cephfs_metadata' replicated size 3 min_size 1 crush_rule 0
object_hash rjenkins pg_num 100 pgp_num 100 last_change 17 flags hashpspool
stripe_width 0 application cephfs
max_osd 3
osd.0 down in weight 1 up_from 0 up_thru 0 down_at 0 last_clean_interval
[0,0) - - - - exists,new 01dbf73f-3866-47be-b623-b9c539dcd955
osd.1 down in weight 1 up_from 0 up_thru 0 down_at 0 last_clean_interval
[0,0) - - - - exists,new ef7c0a4f-5118-4d44-a82b-c9a2cf3c0813
osd.2 down in weight 1 up_from 13 up_thru 23 down_at 26 last_clean_interval
[0,0) 10.8.61.24:6800/4442 10.8.61.24:6801/4442 10.8.61.24:6802/4442
10.8.61.24:6803/4442 exists e40ef3ba-8f19-4b41-be9d-f95f679df0eb
So where does the daemon get the wrong cluster id from?
I might be missing something obvious again...
Can someone help?
Thank you !
Vincent
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com