unable to deploy ceph -- failed to read label for XXX No such file or directory

hello,

during basic experimentation I'm running into a weird situation when adding OSDs to a test cluster. The test cluster is created as 3x XEN DomU Debian Bookworm (test1-3), each with 4x CPU, 8 GB RAM, xvda root, xvdb swap, and 4x 20 GB data disks xvdj,k,l,m (LVM volumes in Dom0, propagated via the xen phy device), cleaned with `wipefs -a`
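
For reference, the data disks were wiped on each node roughly like this before handing them to the cluster (a sketch; the device names match the DomU layout above):

```
# on each of test1-3: clear any previous signatures from the four data disks
for dev in /dev/xvd{j,k,l,m}; do
    wipefs -a "$dev"
done
```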

```
apt-get install cephadm ceph-common
cephadm bootstrap --mon-ip 10.0.0.101
ceph orch host add test2
ceph orch host add test3
```
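
At this point the orchestrator view of hosts and available devices can be double-checked with something like this (a sketch, run on test1; output omitted):

```
# confirm all three hosts are registered with the orchestrator
ceph orch host ls
# list the devices cephadm considers available for OSD creation
ceph orch device ls
```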

When adding OSDs, the first host gets its OSDs created as expected, but while creating the OSDs on the second host the output gets weird: even when adding each device separately, the output shows that `ceph orch` apparently tries to create multiple OSDs at once

```
root@test1:~# for xxx in j k l m; do ceph orch daemon add osd test2:/dev/xvd$xxx; done
Created osd(s) 0,1,2,3 on host 'test2'
Created osd(s) 0,1 on host 'test2'
Created osd(s) 2,3 on host 'test2'
Created osd(s) 1 on host 'test2'
```
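
Which OSD daemons and LVM volumes actually ended up where can be inspected with something like this (a sketch; the first two commands run on test1, the last one on the OSD node):

```
# cluster-wide OSD placement as the monitors see it
ceph osd tree
# OSD daemons as the orchestrator sees them, per host
ceph orch ps --daemon-type osd
# on test2: LVM volumes that ceph-volume has prepared locally
cephadm shell -- ceph-volume lvm list
```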

The syslog on the test2 node shows errors:

```
2023-04-16T20:57:02.528456+00:00 test2 bash[10426]: cephadm 2023-04-16T20:57:01.389951+0000 mgr.test1.ucudzp (mgr.14206) 1691 : cephadm [INF] Found duplicate OSDs: osd.0 in status running on test1, osd.0 in status error on test2
2023-04-16T20:57:02.528748+00:00 test2 bash[10426]: cephadm 2023-04-16T20:57:01.391346+0000 mgr.test1.ucudzp (mgr.14206) 1692 : cephadm [INF] Removing daemon osd.0 from test2 -- ports []
2023-04-16T20:57:02.528943+00:00 test2 bash[10426]: cluster 2023-04-16T20:57:02.350564+0000 mon.test1 (mon.0) 743 : cluster [WRN] Health check failed: 2 failed cephadm daemon(s) (CEPHADM_FAILED_DAEMON)
2023-04-16T20:57:17.972962+00:00 test2 bash[20098]: stderr: failed to read label for /dev/ceph-48f3646c-7070-4a37-b9a4-ed0a4a983965/osd-block-11a0dc2b-f8e1-4694-813f-2309ab6a5c1d: (2) No such file or directory
2023-04-16T20:57:17.973064+00:00 test2 bash[20098]: stderr: 2023-04-16T20:57:17.962+0000 7fad2451c540 -1 bluestore(/dev/ceph-48f3646c-7070-4a37-b9a4-ed0a4a983965/osd-block-11a0dc2b-f8e1-4694-813f-2309ab6a5c1d) _read_bdev_label failed to open /dev/ceph-48f3646c-7070-4a37-b9a4-ed0a4a983965/osd-block-11a0dc2b-f8e1-4694-813f-2309ab6a5c1d: (2) No such file or directory
2023-04-16T20:57:17.973181+00:00 test2 bash[20098]: --> Failed to activate via lvm: command returned non-zero exit status: 1
2023-04-16T20:57:17.973278+00:00 test2 bash[20098]: --> Failed to activate via simple: 'Namespace' object has no attribute 'json_config'
2023-04-16T20:57:17.973368+00:00 test2 bash[20098]: --> Failed to activate any OSD(s)
```
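
The path in the error is the LVM logical volume that ceph-volume should have prepared for the OSD; whether it actually exists on test2 can be checked with something like this (a sketch using standard LVM tooling):

```
# on test2: list ceph-created logical volumes and their backing devices
lvs -o lv_name,vg_name,devices
# the device node bluestore complains about should show up here if the LV is active
ls -l /dev/ceph-*/
```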

The ceph and cephadm packages are installed from Debian Bookworm:

```
ii ceph-common 16.2.11+ds-2 amd64 common utilities to mount and interact with a ceph storage cluster
ii cephadm     16.2.11+ds-2 amd64 utility to bootstrap ceph daemons with systemd and containers
```

The management session script can be found at https://pastebin.com/raw/FiX7DMHS


None of the symptoms I could google helped me understand why this situation is happening, nor how to troubleshoot or debug it. I could understand if the nodes were too low on RAM to run this experiment, but the behavior does not really look like an OOM issue.
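
To rule out the OOM killer, the nodes can be checked with something like this (a sketch):

```
# on each node: current memory headroom
free -h
# any traces of the kernel OOM killer since boot
dmesg -T | grep -iE "out of memory|oom-killer|killed process"
```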

any idea would be appreciated

thanks
bodik


