Thanks for answering, Eugen.
Yes, it came from another cluster; I am trying to move all OSDs from one
cluster to another (1 to 1), so I would like to avoid wiping the disks.
It is indeed a ceph-volume OSD. I checked the LVM tags and they are correct:
# lvs --noheadings --readonly --separator=";" -o lv_tags
  ceph.block_device=/dev/ceph-4681dda6-628d-47db-8981-1762effccf77/osd-block-01dbf73f-3866-47be-b623-b9c539dcd955,ceph.block_uuid=uL57Kk-9kcO-DdOY-Glwm-cg9P-atmx-3m033v,ceph.cephx_lockbox_secret=,ceph.cluster_fsid=173b6382-504b-421f-aa4d-52526fa80dfa,ceph.cluster_name=ceph,ceph.crush_device_class=None,ceph.encrypted=0,ceph.osd_fsid=01dbf73f-3866-47be-b623-b9c539dcd955,ceph.osd_id=0,ceph.type=block,ceph.vdo=0
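(As an aside: if one of those LVM tags had still pointed at the old cluster, my understanding is that they can be rewritten with plain LVM tag operations instead of zapping; a rough sketch, where <old_fsid> and <new_fsid> are just placeholders:

# lvchange --deltag ceph.cluster_fsid=<old_fsid> /dev/ceph-4681dda6-628d-47db-8981-1762effccf77/osd-block-01dbf73f-3866-47be-b623-b9c539dcd955
# lvchange --addtag ceph.cluster_fsid=<new_fsid> /dev/ceph-4681dda6-628d-47db-8981-1762effccf77/osd-block-01dbf73f-3866-47be-b623-b9c539dcd955

But the ceph.cluster_fsid tag above already matches the new cluster, so that does not seem to be the problem here.)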
The OSD bluestore labels are also correct:
# ceph-bluestore-tool show-label --dev /dev/ceph-4681dda6-628d-47db-8981-1762effccf77/osd-block-01dbf73f-3866-47be-b623-b9c539dcd955
{
    "/dev/ceph-4681dda6-628d-47db-8981-1762effccf77/osd-block-01dbf73f-3866-47be-b623-b9c539dcd955": {
        "osd_uuid": "01dbf73f-3866-47be-b623-b9c539dcd955",
        "size": 1073737629696,
        "btime": "2019-06-17 15:28:53.126482",
        "description": "main",
        "bluefs": "1",
        "ceph_fsid": "173b6382-504b-421f-aa4d-52526fa80dfa",
        "kv_backend": "rocksdb",
        "magic": "ceph osd volume v026",
        "mkfs_done": "yes",
        "osd_key": "AQBXwwddy5OEAxAAS4AidvOF0kl+kxIBvFhT1A==",
        "ready": "ready",
        "whoami": "0"
    }
}
Is there any way to change the wrong fsid on the OSD without zapping the disk?
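(The only tool I can think of for rewriting a bluestore label by hand is ceph-bluestore-tool set-label-key, sketched below with a placeholder value; I'm not sure it would be the right or safe thing to do here:

# ceph-bluestore-tool set-label-key --dev /dev/ceph-4681dda6-628d-47db-8981-1762effccf77/osd-block-01dbf73f-3866-47be-b623-b9c539dcd955 -k ceph_fsid -v <new_fsid>

But since the ceph_fsid in the bluestore label already matches the new cluster, I still don't see where the daemon gets the bb55e196 fsid from.)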
Thank you
On Tue, 18 Jun 2019 at 12:19, Eugen Block <eblock@xxxxxx> wrote:
Hi,
this OSD must have been part of a previous cluster, I assume.
I would remove it from crush if it's still there (check just to make
sure), wipe the disk, remove any traces like logical volumes (if it
was a ceph-volume lvm OSD) and if possible, reboot the node.
Regards,
Eugen
Quoting Vincent Pharabot <vincent.pharabot@xxxxxxxxx>:
> Hello
>
> I have an OSD which is stuck in booting state.
> I found out that the OSD daemon's cluster_fsid is not the same as the
> actual cluster fsid, which should explain why it does not join the cluster.
>
> # ceph daemon osd.0 status
> {
>     "cluster_fsid": "bb55e196-eedd-478d-99b6-1aad00b95f2a",
>     "osd_fsid": "01dbf73f-3866-47be-b623-b9c539dcd955",
>     "whoami": 0,
>     "state": "booting",
>     "oldest_map": 1,
>     "newest_map": 24,
>     "num_pgs": 200
> }
>
> # ceph fsid
> 173b6382-504b-421f-aa4d-52526fa80dfa
>
> I checked the cluster fsid file on the OSD and it's correct:
> # cat /var/lib/ceph/osd/ceph-0/ceph_fsid
> 173b6382-504b-421f-aa4d-52526fa80dfa
>
> The OSDMap also shows the correct fsid:
>
> # ceph osd dump
> epoch 33
> fsid 173b6382-504b-421f-aa4d-52526fa80dfa
> created 2019-06-17 16:42:52.632757
> modified 2019-06-18 09:28:10.376573
> flags noout,sortbitwise,recovery_deletes,purged_snapdirs
> crush_version 13
> full_ratio 0.95
> backfillfull_ratio 0.9
> nearfull_ratio 0.85
> require_min_compat_client jewel
> min_compat_client jewel
> require_osd_release mimic
> pool 1 'cephfs_data' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 100 pgp_num 100 last_change 17 flags hashpspool stripe_width 0 application cephfs
> pool 2 'cephfs_metadata' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 100 pgp_num 100 last_change 17 flags hashpspool stripe_width 0 application cephfs
> max_osd 3
> osd.0 down in weight 1 up_from 0 up_thru 0 down_at 0 last_clean_interval [0,0) - - - - exists,new 01dbf73f-3866-47be-b623-b9c539dcd955
> osd.1 down in weight 1 up_from 0 up_thru 0 down_at 0 last_clean_interval [0,0) - - - - exists,new ef7c0a4f-5118-4d44-a82b-c9a2cf3c0813
> osd.2 down in weight 1 up_from 13 up_thru 23 down_at 26 last_clean_interval [0,0) 10.8.61.24:6800/4442 10.8.61.24:6801/4442 10.8.61.24:6802/4442 10.8.61.24:6803/4442 exists e40ef3ba-8f19-4b41-be9d-f95f679df0eb
>
> So where does the daemon take the wrong cluster fsid from?
> I might be missing something obvious again...
>
> Is someone able to help?
>
> Thank you !
> Vincent