hello,
during basic experimentation I'm running into a weird situation when
adding OSDs to a test cluster. The test cluster is created as 3x Xen DomU
Debian Bookworm nodes (test1-3), each with 4x CPU, 8GB RAM, xvda root,
xvdb swap and four 20GB data disks xvdj,k,l,m (LVM volumes in Dom0,
propagated via the Xen phy device), cleaned with `wipefs -a`.
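for reference, the per-device cleanup was roughly the following (a minimal
sketch; the device letters match the layout above):
```
# wipe any leftover signatures from the four data disks on each test node
for dev in j k l m; do wipefs -a /dev/xvd$dev; done
```
the cluster itself was then set up with: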
```
apt-get install cephadm ceph-common
cephadm bootstrap --mon-ip 10.0.0.101
ceph orch host add test2
ceph orch host add test3
```
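before adding OSDs, I assume it is worth confirming that the hosts and raw
devices are visible to the orchestrator; something like the following can
be used (a sketch, the exact session is in the pastebin linked below):
```
# check that all three hosts are managed and the xvdj-m devices show up as available
ceph orch host ls
ceph orch device ls
```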
when adding OSDs, the first host gets its OSDs created as expected, but
while creating OSDs on the second host the output gets weird: even when
adding each device separately, the output suggests that `ceph orch`
creates multiple OSDs at once
```
root@test1:~# for xxx in j k l m; do ceph orch daemon add osd test2:/dev/xvd$xxx; done
Created osd(s) 0,1,2,3 on host 'test2'
Created osd(s) 0,1 on host 'test2'
Created osd(s) 2,3 on host 'test2'
Created osd(s) 1 on host 'test2'
```
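the resulting layout can be cross-checked from the mon node with something
like the following (not the exact commands from my session):
```
# compare what the cluster map and the orchestrator think exists
ceph osd tree
ceph orch ps --daemon-type osd
```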
the syslog on the test2 node shows errors
```
2023-04-16T20:57:02.528456+00:00 test2 bash[10426]: cephadm
2023-04-16T20:57:01.389951+0000 mgr.test1.ucudzp (mgr.14206) 1691 :
cephadm [INF] Found duplicate OSDs: osd.0 in status running on test1,
osd.0 in status error on test2
2023-04-16T20:57:02.528748+00:00 test2 bash[10426]: cephadm
2023-04-16T20:57:01.391346+0000 mgr.test1.ucudzp (mgr.14206) 1692 :
cephadm [INF] Removing daemon osd.0 from test2 -- ports []
2023-04-16T20:57:02.528943+00:00 test2 bash[10426]: cluster
2023-04-16T20:57:02.350564+0000 mon.test1 (mon.0) 743 : cluster [WRN]
Health check failed: 2 failed cephadm daemon(s) (CEPHADM_FAILED_DAEMON)
2023-04-16T20:57:17.972962+00:00 test2 bash[20098]: stderr: failed to
read label for
/dev/ceph-48f3646c-7070-4a37-b9a4-ed0a4a983965/osd-block-11a0dc2b-f8e1-4694-813f-2309ab6a5c1d:
(2) No such file or directory
2023-04-16T20:57:17.973064+00:00 test2 bash[20098]: stderr:
2023-04-16T20:57:17.962+0000 7fad2451c540 -1
bluestore(/dev/ceph-48f3646c-7070-4a37-b9a4-ed0a4a983965/osd-block-11a0dc2b-f8e1-4694-813f-2309ab6a5c1d)
_read_bdev_label failed to open
/dev/ceph-48f3646c-7070-4a37-b9a4-ed0a4a983965/osd-block-11a0dc2b-f8e1-4694-813f-2309ab6a5c1d:
(2) No such file or directory
2023-04-16T20:57:17.973181+00:00 test2 bash[20098]: --> Failed to
activate via lvm: command returned non-zero exit status: 1
2023-04-16T20:57:17.973278+00:00 test2 bash[20098]: --> Failed to
activate via simple: 'Namespace' object has no attribute 'json_config'
2023-04-16T20:57:17.973368+00:00 test2 bash[20098]: --> Failed to
activate any OSD(s)
```
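for digging further on test2 itself, I assume the per-host state can be
inspected with something like the following (osd.0 is just an example
daemon name):
```
# list the daemons cephadm knows about on this host
cephadm ls
# fetch the journal logs of one failing OSD daemon
cephadm logs --name osd.0
# show what ceph-volume sees on the local LVs
cephadm shell -- ceph-volume lvm list
```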
the ceph and cephadm binaries are installed from Debian Bookworm
```
ii  ceph-common  16.2.11+ds-2  amd64  common utilities to mount and interact with a ceph storage cluster
ii  cephadm      16.2.11+ds-2  amd64  utility to bootstrap ceph daemons with systemd and containers
```
a transcript of the management session can be found at https://pastebin.com/raw/FiX7DMHS
none of the symptoms I found by googling helped me understand why this
situation is happening, nor how to troubleshoot or debug it. I realize the
nodes are very low on RAM for this experiment, but the behavior does not
really look like an OOM issue.
any ideas would be appreciated
thanks
bodik