Re: osd daemon cluster_fsid not reflecting actual cluster_fsid

I think I found where the wrong fsid lives: it is in the osdmap stored on the OSD itself, but I see no way to change that fsid...
I tried injecting the monitor's osdmap (fetched with "ceph osd getmap") using "ceph-objectstore-tool --op set-osdmap", but no luck... the OSD still reports the old fsid (I could not find a way to set an osdmap at the OSD's current epoch).
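Roughly what I attempted (just a sketch, I am not sure it is the right approach; the OSD is stopped first, /tmp/osdmap.24 is an arbitrary file name, and 24 is the newest_map epoch the daemon reports):

# systemctl stop ceph-osd@0
# ceph osd getmap 24 -o /tmp/osdmap.24
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --op set-osdmap --file /tmp/osdmap.24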

Could someone give a hint?

My goal is to be able to duplicate a Ceph cluster (with its data) to run some tests... and I would like to avoid keeping the same fsid.

Thanks !

# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --op get-osdmap --file /tmp/osdmapfromosd3

# osdmaptool /tmp/osdmapfromosd3 --print
osdmaptool: osdmap file '/tmp/osdmapfromosd3'
epoch 24
fsid bb55e196-eedd-478d-99b6-1aad00b95f2a
created 2019-06-17 15:27:44.102409
modified 2019-06-17 15:53:37.279770
flags sortbitwise,recovery_deletes,purged_snapdirs
crush_version 9
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
require_min_compat_client jewel
min_compat_client jewel
require_osd_release mimic

pool 1 'cephfs_data' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 100 pgp_num 100 last_change 17 flags hashpspool stripe_width 0 application cephfs
pool 2 'cephfs_metadata' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 100 pgp_num 100 last_change 17 flags hashpspool stripe_width 0 application cephfs

max_osd 3
osd.0 up in weight 1 up_from 23 up_thru 23 down_at 20 last_clean_interval [5,19) 10.8.12.170:6800/3613 10.8.12.170:6801/3613 10.8.12.170:6802/3613 10.8.12.170:6803/3613 exists,up 01dbf73f-3866-47be-b623-b9c539dcd955
osd.1 up in weight 1 up_from 9 up_thru 23 down_at 0 last_clean_interval [0,0) 10.8.29.71:6800/4364 10.8.29.71:6801/4364 10.8.29.71:6802/4364 10.8.29.71:6803/4364 exists,up ef7c0a4f-5118-4d44-a82b-c9a2cf3c0813
osd.2 up in weight 1 up_from 13 up_thru 23 down_at 0 last_clean_interval [0,0) 10.8.32.182:6800/4361 10.8.32.182:6801/4361 10.8.32.182:6802/4361 10.8.32.182:6803/4361 exists,up 905d17fc-6f37-4404-bd5d-4adc231c49b3
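
Putting the two values side by side makes the mismatch obvious (the map extracted from the OSD vs. what the monitors answer):

# osdmaptool /tmp/osdmapfromosd3 --print | grep ^fsid
fsid bb55e196-eedd-478d-99b6-1aad00b95f2a
# ceph fsid
173b6382-504b-421f-aa4d-52526fa80dfa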


On Tue, 18 Jun 2019 at 12:38, Vincent Pharabot <vincent.pharabot@xxxxxxxxx> wrote:
Thanks Eugen for answering

Yes, it came from another cluster. I am trying to move all OSDs from one cluster to another (1 to 1), so I would like to avoid wiping the disks.
It is indeed a ceph-volume OSD; I checked the LVM tags and they are correct.

# lvs --noheadings --readonly --separator=";" -o lv_tags
ceph.block_device=/dev/ceph-4681dda6-628d-47db-8981-1762effccf77/osd-block-01dbf73f-3866-47be-b623-b9c539dcd955,ceph.block_uuid=uL57Kk-9kcO-DdOY-Glwm-cg9P-atmx-3m033v,ceph.cephx_lockbox_secret=,ceph.cluster_fsid=173b6382-504b-421f-aa4d-52526fa80dfa,ceph.cluster_name=ceph,ceph.crush_device_class=None,ceph.encrypted=0,ceph.osd_fsid=01dbf73f-3866-47be-b623-b9c539dcd955,ceph.osd_id=0,ceph.type=block,ceph.vdo=0
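
(If one of these tags were ever wrong, my understanding is that it could be rewritten in place with LVM rather than zapping, along these lines; but in my case ceph.cluster_fsid already holds the new cluster's fsid:)

# lvchange --deltag ceph.cluster_fsid=<old-fsid> /dev/ceph-4681dda6-628d-47db-8981-1762effccf77/osd-block-01dbf73f-3866-47be-b623-b9c539dcd955
# lvchange --addtag ceph.cluster_fsid=<new-fsid> /dev/ceph-4681dda6-628d-47db-8981-1762effccf77/osd-block-01dbf73f-3866-47be-b623-b9c539dcd955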

OSD bluestore labels are also correct

# ceph-bluestore-tool show-label --dev /dev/ceph-4681dda6-628d-47db-8981-1762effccf77/osd-block-01dbf73f-3866-47be-b623-b9c539dcd955
{
    "/dev/ceph-4681dda6-628d-47db-8981-1762effccf77/osd-block-01dbf73f-3866-47be-b623-b9c539dcd955": {
        "osd_uuid": "01dbf73f-3866-47be-b623-b9c539dcd955",
        "size": 1073737629696,
        "btime": "2019-06-17 15:28:53.126482",
        "description": "main",
        "bluefs": "1",
        "ceph_fsid": "173b6382-504b-421f-aa4d-52526fa80dfa",
        "kv_backend": "rocksdb",
        "magic": "ceph osd volume v026",
        "mkfs_done": "yes",
        "osd_key": "AQBXwwddy5OEAxAAS4AidvOF0kl+kxIBvFhT1A==",
        "ready": "ready",
        "whoami": "0"
    }
}
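
(For what it's worth, if the bluestore label itself had carried the old fsid, I believe ceph-bluestore-tool could rewrite it in place without zapping, roughly like this; but the label above already shows the new cluster's fsid, so the stale value must live in the osdmap/superblock the OSD keeps internally:)

# ceph-bluestore-tool set-label-key --dev /dev/ceph-4681dda6-628d-47db-8981-1762effccf77/osd-block-01dbf73f-3866-47be-b623-b9c539dcd955 -k ceph_fsid -v 173b6382-504b-421f-aa4d-52526fa80dfa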


Is there any way to change the wrong fsid on the OSD without zapping the disk?

Thank you




On Tue, 18 Jun 2019 at 12:19, Eugen Block <eblock@xxxxxx> wrote:
Hi,

this OSD must have been part of a previous cluster, I assume.
I would remove it from crush if it's still there (check just to make 
sure), wipe the disk, remove any traces like logical volumes (if it 
was a ceph-volume lvm OSD) and if possible, reboot the node.

Regards,
Eugen


Zitat von Vincent Pharabot <vincent.pharabot@xxxxxxxxx>:

> Hello
>
> I have an OSD which is stuck in booting state.
> I found out that the OSD daemon's cluster_fsid is not the same as the actual
> cluster fsid, which would explain why it does not join the cluster.
>
> # ceph daemon osd.0 status
> {
>     "cluster_fsid": "bb55e196-eedd-478d-99b6-1aad00b95f2a",
>     "osd_fsid": "01dbf73f-3866-47be-b623-b9c539dcd955",
>     "whoami": 0,
>     "state": "booting",
>     "oldest_map": 1,
>     "newest_map": 24,
>     "num_pgs": 200
> }
>
> # ceph fsid
> 173b6382-504b-421f-aa4d-52526fa80dfa
>
> I checked the cluster fsid file and it's correct
> # cat /var/lib/ceph/osd/ceph-0/ceph_fsid
> 173b6382-504b-421f-aa4d-52526fa80dfa
>
> OSDMap shows correct fsid also
>
> # ceph osd dump
> epoch 33
> fsid 173b6382-504b-421f-aa4d-52526fa80dfa
> created 2019-06-17 16:42:52.632757
> modified 2019-06-18 09:28:10.376573
> flags noout,sortbitwise,recovery_deletes,purged_snapdirs
> crush_version 13
> full_ratio 0.95
> backfillfull_ratio 0.9
> nearfull_ratio 0.85
> require_min_compat_client jewel
> min_compat_client jewel
> require_osd_release mimic
> pool 1 'cephfs_data' replicated size 3 min_size 1 crush_rule 0 object_hash
> rjenkins pg_num 100 pgp_num 100 last_change 17 flags hashpspool
> stripe_width 0 application cephfs
> pool 2 'cephfs_metadata' replicated size 3 min_size 1 crush_rule 0
> object_hash rjenkins pg_num 100 pgp_num 100 last_change 17 flags hashpspool
> stripe_width 0 application cephfs
> max_osd 3
> osd.0 down in weight 1 up_from 0 up_thru 0 down_at 0 last_clean_interval
> [0,0) - - - - exists,new 01dbf73f-3866-47be-b623-b9c539dcd955
> osd.1 down in weight 1 up_from 0 up_thru 0 down_at 0 last_clean_interval
> [0,0) - - - - exists,new ef7c0a4f-5118-4d44-a82b-c9a2cf3c0813
> osd.2 down in weight 1 up_from 13 up_thru 23 down_at 26 last_clean_interval
> [0,0) 10.8.61.24:6800/4442 10.8.61.24:6801/4442 10.8.61.24:6802/4442
> 10.8.61.24:6803/4442 exists e40ef3ba-8f19-4b41-be9d-f95f679df0eb
>
> So where does the daemon get the wrong cluster id from?
> I might be missing something obvious again...
>
> Someone able to help ?
>
> Thank you !
> Vincent



_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
