Hi All :)
I'm trying to debug some interesting behavior I've encountered where deploying OSDs via the 'ceph orch daemon add osd' command fails.
I deploy my Ceph cluster without an OSD specification file, just a spec file with the cluster hosts listed in it. Bootstrap completes successfully, and all hosts appear with the correct labels when I run 'ceph orch host ls':
===================================================================
HOST          ADDR          LABELS                        STATUS
controller-0  172.31.0.170  _admin,mon,mgr,mds,rgw,crash
controller-1  172.31.3.232  mon,mgr,mds,rgw,crash,_admin
controller-2  172.31.1.45   mon,mgr,mds,rgw,crash,_admin
ovscompute-0  172.31.0.28   osd,crash,_admin
ovscompute-1  172.31.0.26   osd,crash,_admin
5 hosts in cluster
===================================================================
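For reference, each host entry in the spec file looks roughly like this (a minimal sketch reconstructed from the listing above; the real file just repeats this block per host):
===================================================================
service_type: host
hostname: ovscompute-0
addr: 172.31.0.28
labels:
  - osd
  - crash
  - _admin
===================================================================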
In my setup, 2 of the nodes are labeled as OSD hosts.
Immediately after bootstrap and 'apply spec' finish successfully and the cluster is up and running, an Ansible playbook runs 'ceph orch daemon add osd' against each OSD host.
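Concretely, the playbook runs the equivalent of the following (reconstructed from the audit log entries below):
===================================================================
ceph orch daemon add osd ovscompute-0:data_devices=/dev/vdb,/dev/vdc
ceph orch daemon add osd ovscompute-1:data_devices=/dev/vdb,/dev/vdc
===================================================================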
I have verified that the OSDs were not deployed on either of the two hosts.
In the ceph-mgr log I see:
===================================================================
2024-08-13T10:06:47.713+0000 7f482b1c5640 0 log_channel(audit) log [DBG] : from='client.14194 -' entity='client.admin' cmd=[{"prefix": "orch daemon add osd", "svc_arg": "ovscompute-1:data_devices=/dev/vdb,/dev/vdc", "target": ["mon-mgr", ""]}]: dispatch
2024-08-13T10:06:47.714+0000 7f482b1c5640 0 log_channel(audit) log [DBG] : from='client.14191 -' entity='client.admin' cmd=[{"prefix": "orch daemon add osd", "svc_arg": "ovscompute-0:data_devices=/dev/vdb,/dev/vdc", "target": ["mon-mgr", ""]}]: dispatch
2024-08-21T07:11:19.167+0000 7f675e4eb640 0 log_channel(cephadm) log [DBG] : Processing DriveGroup DriveGroupSpec.from_json(yaml.safe_load('''service_type: osd
service_name: osd
placement:
host_pattern: ovscompute-1
spec:
data_devices:
paths:
- /dev/vdb
- /dev/vdc
filter_logic: AND
objectstore: bluestore
'''))
2024-08-21T07:11:19.170+0000 7f675e4eb640 0 log_channel(cephadm) log [DBG] : mon_command: 'osd tree' -> 0 in 0.002s
2024-08-21T07:11:19.171+0000 7f6763535640 0 log_channel(cephadm) log [DBG] : Checking matching hosts -> []
2024-08-21T07:11:19.173+0000 7f675e4eb640 0 log_channel(cephadm) log [DBG] : Processing DriveGroup DriveGroupSpec.from_json(yaml.safe_load('''service_type: osd
service_name: osd
placement:
host_pattern: ovscompute-0
spec:
data_devices:
paths:
- /dev/vdb
- /dev/vdc
filter_logic: AND
objectstore: bluestore
'''))
2024-08-21T07:11:19.180+0000 7f675e4eb640 0 log_channel(cephadm) log [DBG] : mon_command: 'osd tree' -> 0 in 0.007s
2024-08-21T07:11:19.182+0000 7f6763535640 0 log_channel(cephadm) log [DBG] : Checking matching hosts -> []
===================================================================
The interesting part here, which might help explain why the command fails, is the 'Checking matching hosts -> []' message (printed from src/pybind/mgr/cephadm/services/osd.py): the list of matching hosts is empty. When this happens, the OSD host never receives the 'ceph-volume lvm batch' command.
I have another, similar setup on which this issue does not occur, and there 'Checking matching hosts' contains the correct host for the OSD.
I am now debugging this part of the code in src/pybind/mgr/cephadm/services/osd.py to try to figure out why matching_hosts is an empty list:
===================================================================
matching_hosts = drive_group.placement.filter_matching_hostspecs(
    self.mgr.cache.get_schedulable_hosts())
===================================================================
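As far as I can tell, there are only two ways this call can return an empty list: either the host_pattern fails to match any hostname, or get_schedulable_hosts() itself returns nothing (for example, because the mgr's host cache hasn't been populated yet, or the hosts were filtered out). A simplified sketch of the matching step, to illustrate both cases (my own reconstruction, not the actual Ceph implementation):
===================================================================
from fnmatch import fnmatch

# Simplified reconstruction for illustration only. 'candidates' stands in
# for whatever self.mgr.cache.get_schedulable_hosts() returns.
def filter_matching_hostspecs(host_pattern, candidates):
    # If 'candidates' is empty, the result is empty no matter what the
    # pattern is -- which is what a not-yet-populated host cache would
    # look like in the 'Checking matching hosts -> []' log line.
    return [h for h in candidates if fnmatch(h.hostname, host_pattern)]
===================================================================
So one thing I'm checking is whether get_schedulable_hosts() is empty at the moment the command is dispatched, rather than the pattern itself being wrong.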
Lastly, I noticed that when I run:
===================================================================
ceph -W cephadm --watch-debug
===================================================================
before the OSD deployment step, the issue does not reproduce. This is probably a red herring, but it's consistent, and I have no idea why it would have any impact.
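That consistency makes me suspect a timing issue: perhaps the playbook issues the command before the mgr has finished populating its host/device cache. If so, a hypothetical workaround (a sketch only; wait_for_device_inventory is my own helper, and I haven't confirmed this fixes anything) would be to gate the playbook on the orchestrator's device inventory:
===================================================================
import json
import subprocess
import time

def wait_for_device_inventory(host, timeout=300):
    # Poll until 'ceph orch device ls <host>' reports a non-empty
    # inventory, as a proxy for the mgr's host cache being populated.
    deadline = time.time() + timeout
    while time.time() < deadline:
        result = subprocess.run(
            ['ceph', 'orch', 'device', 'ls', host, '--format', 'json'],
            capture_output=True, text=True)
        if result.returncode == 0 and json.loads(result.stdout or '[]'):
            return True
        time.sleep(5)
    return False

if __name__ == '__main__':
    # e.g. run this before the 'ceph orch daemon add osd' tasks
    wait_for_device_inventory('ovscompute-0')
===================================================================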
Has anyone faced a similar issue?
Thanks