Re: [EXTERNAL] [Pacific] ceph orch device ls does not return any HDD




The server has enough available storage:

   [root@mostha1 log]# df -h
   Filesystem                  Size    Used Avail Use% Mounted on
   devtmpfs                     24G       0   24G   0% /dev
   tmpfs                        24G     84K   24G   1% /dev/shm
   tmpfs                        24G    195M   24G   1% /run
   tmpfs                        24G       0   24G   0% /sys/fs/cgroup
   /dev/mapper/al8vg-rootvol    49G    6,5G   43G  14% /
   /dev/sda1                   3,8G    412M  3,2G  12% /boot
   /dev/mapper/al8vg-varvol     20G    9,7G   11G  49% /var
   /dev/mapper/al8vg-tmpvol    9,8G    103M  9,7G   2% /tmp
   /dev/mapper/al8vg-homevol   9,8G    103M  9,7G   2% /home
   tmpfs                       4,7G       0  4,7G   0% /run/user/0
   overlay                      20G    9,7G   11G  49% /var/lib/containers/storage/overlay/b8769720357497ebdbf68768753da154b3d63cfbef254036441af60a91649127/merged
   overlay                      20G    9,7G   11G  49% /var/lib/containers/storage/overlay/2eed15daec130da50530621740025655ecd961e1b1855f35922f03561960d999/merged
   overlay                      20G    9,7G   11G  49% /var/lib/containers/storage/overlay/4d0b4f0b4063cce3f983beda844440bac78dd3b5f30379d2eb96daefef8ddfaf/merged
   overlay                      20G    9,7G   11G  49% /var/lib/containers/storage/overlay/129c5d3e070f80f17a79c1f172b60c2fc0f30a84b51b07ea207dc5868cd1d7f0/merged
   overlay                      20G    9,7G   11G  49% /var/lib/containers/storage/overlay/c41d6bdaf941d16fd80326ef5dae6a02524d3f41bcb64cb29bda2bd5816fee9a/merged
   overlay                      20G    9,7G   11G  49% /var/lib/containers/storage/overlay/1b6c1c893e7ed2c128378bdf2af408f3a834f3453a0505ac042099d6f484dc9b/merged
   overlay                      20G    9,7G   11G  49% /var/lib/containers/storage/overlay/962e5c1380a60e9a54ac29eccb71667f13a5f9047b2ee98e6303a5fea613162f/merged
   overlay                      20G    9,7G   11G  49% /var/lib/containers/storage/overlay/3578d0f5a70afce839017dec888908dead82fb50f90834e5b040e9fd2ada9fba/merged
   overlay                      20G    9,7G   11G  49% /var/lib/containers/storage/overlay/7d9c35751388325c3da54f03981770aa49599a657c2dfe3ba9527884864f177d/merged


When I was testing the different versions, I removed each tested image afterwards with "podman rmi":

   for i in v16.2.10-20220920 v16.2.11-20230125 v16.2.11-20230209 v16.2.11-20230316; do
       echo "=================== Ceph $i ======================="
       cephadm --image quay.io/ceph/ceph:$i ceph-volume inventory
       id=$(podman images | grep " $i " | cut -c 46-59)
       podman rmi $id
   done | tee trace.ceph16.2.txt
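
As an aside, the "cut -c 46-59" relies on podman's column alignment; a minimal sketch of a less fragile variant (same tags, removing each image by its name:tag instead of parsing the image ID):

   for i in v16.2.10-20220920 v16.2.11-20230125 v16.2.11-20230209 v16.2.11-20230316; do
       echo "=================== Ceph $i ======================="
       cephadm --image quay.io/ceph/ceph:$i ceph-volume inventory
       # remove the image by name:tag rather than by parsed column offsets
       podman rmi quay.io/ceph/ceph:$i
   done | tee trace.ceph16.2.txt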

I do not know how to investigate; maybe a "git bisect" between the two releases could catch the faulty commit in a podman container context. I'm not very familiar with containers and Ceph.
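
A lighter-weight alternative to a full "git bisect" might be to diff the ceph-volume code shipped in the last working and first broken images. A sketch only, where the site-packages path inside the el8-based Pacific images is an assumption:

   for tag in v16.2.10-20220920 v16.2.11-20230125; do
       ctr=$(podman create quay.io/ceph/ceph:$tag)
       # copy the ceph_volume python package out of the (stopped) container; adjust the path if needed
       podman cp "$ctr":/usr/lib/python3.6/site-packages/ceph_volume "ceph_volume-$tag"
       podman rm "$ctr" >/dev/null
   done
   diff -ru ceph_volume-v16.2.10-20220920 ceph_volume-v16.2.11-20230125 | less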

Patrick

On 13/10/2023 at 09:18, Eugen Block wrote:
Trying to resend with the attachment.
I can't really find anything suspicious, ceph-volume (16.2.11) does recognize /dev/sdc though:

[2023-10-12 08:58:14,135][ceph_volume.process][INFO  ] stdout NAME="sdc" KNAME="sdc" PKNAME="" MAJ:MIN="8:32" FSTYPE="" MOUNTPOINT="" LABEL="" UUID="" RO="0" RM="1" MODEL="SAMSUNG HE253GJ " SIZE="232.9G" STATE="running" OWNER="root" GROUP="disk" MODE="brw-rw----" ALIGNMENT="0" PHY-SEC="512" LOG-SEC="512" ROTA="1" SCHED="mq-deadline" TYPE="disk" DISC-ALN="0" DISC-GRAN="0B" DISC-MAX="0B" DISC-ZERO="0" PKNAME="" PARTLABEL=""
[2023-10-12 08:58:14,139][ceph_volume.util.system][INFO  ] Executable pvs found on the host, will use /sbin/pvs
[2023-10-12 08:58:14,140][ceph_volume.process][INFO  ] Running command: nsenter --mount=/rootfs/proc/1/ns/mnt --ipc=/rootfs/proc/1/ns/ipc --net=/rootfs/proc/1/ns/net --uts=/rootfs/proc/1/ns/uts /sbin/pvs --noheadings --readonly --units=b --nosuffix --separator=";" -o pv_name,vg_name,pv_count,lv_count,vg_attr,vg_extent_count,vg_free_count,vg_extent_size

But apparently it just stops after that. I already tried to find a debug log-level for ceph-volume, but it's not applicable to all subcommands. The cephadm.log also just stops without even finishing the "copying blob" step, which makes me wonder whether it actually pulls the entire image. I assume you have enough free disk space (otherwise I would expect a message like "failed to pull target image"). Do you see any other warnings in syslog or elsewhere? Or are the logs incomplete?
Maybe someone else will find some clues in the logs...
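
One way that might coax more detail out of ceph-volume here (a sketch, assuming cephadm's --verbose flag and ceph-volume's CEPH_VOLUME_DEBUG variable behave the usual way in this release):

   # more verbose output from the cephadm wrapper itself:
   cephadm --verbose --image quay.io/ceph/ceph:v16.2.11-20230125 ceph-volume inventory
   # or, inside a shell container, full ceph-volume tracebacks:
   cephadm shell --image quay.io/ceph/ceph:v16.2.11-20230125
   CEPH_VOLUME_DEBUG=1 ceph-volume inventory --format json-pretty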

Regards,
Eugen

Quoting Patrick Begou <Patrick.Begou@xxxxxxxxxxxxxxxxxxxxxx>:

Hi Eugen,

You will find attached cephadm.log and ceph-volume.log. Each contains the outputs for the 2 versions. Either v16.2.10-20220920 is much more verbose, or v16.2.11-20230125 does not execute the whole detection process.

Patrick


On 12/10/2023 at 09:34, Eugen Block wrote:
Good catch, and I found the thread I had in mind, it was this exact one. :-D Anyway, can you share the ceph-volume.log from the working and the non-working attempts? I tried to find something significant in the Pacific release notes for 16.2.11, and there were some changes to ceph-volume, but I'm not sure what it could be.
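
For reference, in a cephadm deployment the relevant logs usually end up in these locations on the host (paths can differ per setup):

   less /var/log/ceph/cephadm.log
   less /var/log/ceph/{fsid}/ceph-volume.log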

Quoting Patrick Begou <Patrick.Begou@xxxxxxxxxxxxxxxxxxxxxx>:

I've run additional tests with Pacific releases; with "ceph-volume inventory" things went wrong starting with the first v16.2.11 release (v16.2.11-20230125):

=================== Ceph v16.2.10-20220920 =======================

Device Path               Size         rotates available Model name
/dev/sdc                  232.83 GB    True    True      SAMSUNG HE253GJ
/dev/sda                  232.83 GB    True    False     SAMSUNG HE253GJ
/dev/sdb                  465.76 GB    True    False     WDC WD5003ABYX-1

=================== Ceph v16.2.11-20230125 =======================

Device Path               Size         Device nodes rotates available Model name


Maybe this could help to see what has changed?

Patrick

On 11/10/2023 at 17:38, Eugen Block wrote:
That's really strange. Just out of curiosity, have you tried Quincy (and/or Reef) as well? I don't recall exactly what inventory does in the background; I believe Adam King mentioned that in some thread, maybe that can help here. I'll search for that thread tomorrow.

Quoting Patrick Begou <Patrick.Begou@xxxxxxxxxxxxxxxxxxxxxx>:

Hi Eugen,

[root@mostha1 ~]# rpm -q cephadm
cephadm-16.2.14-0.el8.noarch

Log associated with the inventory run:

2023-10-11 16:16:02,167 7f820515fb80 DEBUG --------------------------------------------------------------------------------
cephadm ['gather-facts']
2023-10-11 16:16:02,208 7f820515fb80 DEBUG /bin/podman: 4.4.1
2023-10-11 16:16:02,313 7f820515fb80 DEBUG sestatus: SELinux status:                 disabled
2023-10-11 16:16:02,317 7f820515fb80 DEBUG sestatus: SELinux status:                 disabled
2023-10-11 16:16:02,322 7f820515fb80 DEBUG sestatus: SELinux status:                 disabled
2023-10-11 16:16:02,326 7f820515fb80 DEBUG sestatus: SELinux status:                 disabled
2023-10-11 16:16:02,329 7f820515fb80 DEBUG sestatus: SELinux status:                 disabled
2023-10-11 16:16:02,333 7f820515fb80 DEBUG sestatus: SELinux status:                 disabled
2023-10-11 16:16:04,474 7ff2a5c08b80 DEBUG --------------------------------------------------------------------------------
cephadm ['ceph-volume', 'inventory']
2023-10-11 16:16:04,516 7ff2a5c08b80 DEBUG /usr/bin/podman: 4.4.1
2023-10-11 16:16:04,520 7ff2a5c08b80 DEBUG Using default config: /etc/ceph/ceph.conf
2023-10-11 16:16:04,573 7ff2a5c08b80 DEBUG /usr/bin/podman: 0d28d71358d7,445.8MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: 2084faaf4d54,13.27MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: 61073c53805d,512.7MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: 6b9f0b72d668,361.1MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: 7493a28808ad,163.7MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: a89672a3accf,59.22MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: b45271cc9726,54.24MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: e00ec13ab138,707.3MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: fcb1e1a6b08d,35.55MB / 50.32GB
2023-10-11 16:16:04,630 7ff2a5c08b80 DEBUG /usr/bin/podman: 0d28d71358d7,1.28%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman: 2084faaf4d54,0.00%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman: 61073c53805d,1.19%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman: 6b9f0b72d668,1.03%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman: 7493a28808ad,0.78%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman: a89672a3accf,0.11%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman: b45271cc9726,1.35%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman: e00ec13ab138,0.43%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman: fcb1e1a6b08d,0.02%
2023-10-11 16:16:04,634 7ff2a5c08b80 INFO Inferring fsid 250f9864-0142-11ee-8e5f-00266cf8869c
2023-10-11 16:16:04,691 7ff2a5c08b80 DEBUG /usr/bin/podman: quay.io/ceph/ceph@sha256:f30bf50755d7087f47c6223e6a921caf5b12e86401b3d49220230c84a8302a1e
2023-10-11 16:16:04,692 7ff2a5c08b80 DEBUG /usr/bin/podman: quay.io/ceph/ceph@sha256:c08064dde4bba4e72a1f55d90ca32df9ef5aafab82efe2e0a0722444a5aaacca
2023-10-11 16:16:04,692 7ff2a5c08b80 DEBUG /usr/bin/podman: docker.io/ceph/ceph@sha256:056637972a107df4096f10951e4216b21fcd8ae0b9fb4552e628d35df3f61139
2023-10-11 16:16:04,694 7ff2a5c08b80 INFO Using recent ceph image quay.io/ceph/ceph@sha256:f30bf50755d7087f47c6223e6a921caf5b12e86401b3d49220230c84a8302a1e
2023-10-11 16:16:05,094 7ff2a5c08b80 DEBUG stat: 167 167
2023-10-11 16:16:05,903 7ff2a5c08b80 DEBUG Acquiring lock 140679815723776 on /run/cephadm/250f9864-0142-11ee-8e5f-00266cf8869c.lock
2023-10-11 16:16:05,903 7ff2a5c08b80 DEBUG Lock 140679815723776 acquired on /run/cephadm/250f9864-0142-11ee-8e5f-00266cf8869c.lock
2023-10-11 16:16:05,929 7ff2a5c08b80 DEBUG sestatus: SELinux status:                 disabled
2023-10-11 16:16:05,933 7ff2a5c08b80 DEBUG sestatus: SELinux status:                 disabled
2023-10-11 16:16:06,700 7ff2a5c08b80 DEBUG /usr/bin/podman:
2023-10-11 16:16:06,701 7ff2a5c08b80 DEBUG /usr/bin/podman: Device Path               Size Device nodes rotates available Model name


I have only one version of cephadm in /var/lib/ceph/{fsid} :
[root@mostha1 ~]# ls -lrt /var/lib/ceph/250f9864-0142-11ee-8e5f-00266cf8869c/cephadm*
-rw-r--r-- 1 root root 350889 28 sept. 16:39 /var/lib/ceph/250f9864-0142-11ee-8e5f-00266cf8869c/cephadm.f6868821c084cd9740b59c7c5eb59f0dd47f6e3b1e6fecb542cb44134ace8d78


Running " python3 /var/lib/ceph/250f9864-0142-11ee-8e5f-00266cf8869c/cephadm.f6868821c084cd9740b59c7c5eb59f0dd47f6e3b1e6fecb542cb44134ace8d78 ceph-volume inventory" give the same output and the same log (execpt the valu of the lock):

2023-10-11 16:21:35,965 7f467cf31b80 DEBUG --------------------------------------------------------------------------------
cephadm ['ceph-volume', 'inventory']
2023-10-11 16:21:36,009 7f467cf31b80 DEBUG /usr/bin/podman: 4.4.1
2023-10-11 16:21:36,012 7f467cf31b80 DEBUG Using default config: /etc/ceph/ceph.conf
2023-10-11 16:21:36,067 7f467cf31b80 DEBUG /usr/bin/podman: 0d28d71358d7,452.1MB / 50.32GB
2023-10-11 16:21:36,067 7f467cf31b80 DEBUG /usr/bin/podman: 2084faaf4d54,13.27MB / 50.32GB
2023-10-11 16:21:36,067 7f467cf31b80 DEBUG /usr/bin/podman: 61073c53805d,513.6MB / 50.32GB
2023-10-11 16:21:36,067 7f467cf31b80 DEBUG /usr/bin/podman: 6b9f0b72d668,322.4MB / 50.32GB
2023-10-11 16:21:36,067 7f467cf31b80 DEBUG /usr/bin/podman: 7493a28808ad,164MB / 50.32GB
2023-10-11 16:21:36,067 7f467cf31b80 DEBUG /usr/bin/podman: a89672a3accf,58.5MB / 50.32GB
2023-10-11 16:21:36,067 7f467cf31b80 DEBUG /usr/bin/podman: b45271cc9726,54.69MB / 50.32GB
2023-10-11 16:21:36,067 7f467cf31b80 DEBUG /usr/bin/podman: e00ec13ab138,707.1MB / 50.32GB
2023-10-11 16:21:36,068 7f467cf31b80 DEBUG /usr/bin/podman: fcb1e1a6b08d,36.28MB / 50.32GB
2023-10-11 16:21:36,125 7f467cf31b80 DEBUG /usr/bin/podman: 0d28d71358d7,1.27%
2023-10-11 16:21:36,125 7f467cf31b80 DEBUG /usr/bin/podman: 2084faaf4d54,0.00%
2023-10-11 16:21:36,125 7f467cf31b80 DEBUG /usr/bin/podman: 61073c53805d,1.16%
2023-10-11 16:21:36,125 7f467cf31b80 DEBUG /usr/bin/podman: 6b9f0b72d668,1.02%
2023-10-11 16:21:36,125 7f467cf31b80 DEBUG /usr/bin/podman: 7493a28808ad,0.78%
2023-10-11 16:21:36,125 7f467cf31b80 DEBUG /usr/bin/podman: a89672a3accf,0.11%
2023-10-11 16:21:36,125 7f467cf31b80 DEBUG /usr/bin/podman: b45271cc9726,1.35%
2023-10-11 16:21:36,125 7f467cf31b80 DEBUG /usr/bin/podman: e00ec13ab138,0.41%
2023-10-11 16:21:36,125 7f467cf31b80 DEBUG /usr/bin/podman: fcb1e1a6b08d,0.02%
2023-10-11 16:21:36,128 7f467cf31b80 INFO Inferring fsid 250f9864-0142-11ee-8e5f-00266cf8869c
2023-10-11 16:21:36,186 7f467cf31b80 DEBUG /usr/bin/podman: quay.io/ceph/ceph@sha256:f30bf50755d7087f47c6223e6a921caf5b12e86401b3d49220230c84a8302a1e
2023-10-11 16:21:36,187 7f467cf31b80 DEBUG /usr/bin/podman: quay.io/ceph/ceph@sha256:c08064dde4bba4e72a1f55d90ca32df9ef5aafab82efe2e0a0722444a5aaacca
2023-10-11 16:21:36,187 7f467cf31b80 DEBUG /usr/bin/podman: docker.io/ceph/ceph@sha256:056637972a107df4096f10951e4216b21fcd8ae0b9fb4552e628d35df3f61139
2023-10-11 16:21:36,189 7f467cf31b80 INFO Using recent ceph image quay.io/ceph/ceph@sha256:f30bf50755d7087f47c6223e6a921caf5b12e86401b3d49220230c84a8302a1e
2023-10-11 16:21:36,549 7f467cf31b80 DEBUG stat: 167 167
2023-10-11 16:21:36,942 7f467cf31b80 DEBUG Acquiring lock 139940396923424 on /run/cephadm/250f9864-0142-11ee-8e5f-00266cf8869c.lock
2023-10-11 16:21:36,942 7f467cf31b80 DEBUG Lock 139940396923424 acquired on /run/cephadm/250f9864-0142-11ee-8e5f-00266cf8869c.lock
2023-10-11 16:21:36,969 7f467cf31b80 DEBUG sestatus: SELinux status:                 disabled
2023-10-11 16:21:36,972 7f467cf31b80 DEBUG sestatus: SELinux status:                 disabled
2023-10-11 16:21:37,749 7f467cf31b80 DEBUG /usr/bin/podman:
2023-10-11 16:21:37,750 7f467cf31b80 DEBUG /usr/bin/podman: Device Path               Size Device nodes rotates available Model name

Patrick

On 11/10/2023 at 15:59, Eugen Block wrote:
Can you check which cephadm version is installed on the host? And then please add (only the relevant) output from the cephadm.log when you run the inventory (without the --image <octopus>). Sometimes a version mismatch between the cephadm on the host and the one the orchestrator uses can cause some disruption. You could try the same with the latest cephadm you have in /var/lib/ceph/${fsid}/ (ls -lrt /var/lib/ceph/${fsid}/cephadm.*). I mentioned that in this thread [1]. So you could try the following:

$ chmod +x /var/lib/ceph/{fsid}/cephadm.{latest}

$ python3 /var/lib/ceph/{fsid}/cephadm.{latest} ceph-volume inventory

Does the output differ? Paste the relevant cephadm.log from that attempt as well.

[1] https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/LASBJCSPFGDYAWPVE2YLV2ZLF3HC5SLS/
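
A quick way to compare what the host has installed with what the cluster itself is running (a small sketch):

   cephadm version     # version of the cephadm binary installed on the host
   ceph versions       # versions reported by the running daemons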

Quoting Patrick Begou <Patrick.Begou@xxxxxxxxxxxxxxxxxxxxxx>:

Hi Eugen,

first of all, many thanks for the time spent on this problem.

"ceph osd purge 2 --force --yes-i-really-mean-it" works and clean all the bas status.

[root@mostha1 ~]# cephadm shell
Inferring fsid 250f9864-0142-11ee-8e5f-00266cf8869c
Using recent ceph image quay.io/ceph/ceph@sha256:f30bf50755d7087f47c6223e6a921caf5b12e86401b3d49220230c84a8302a1e

[ceph: root@mostha1 /]# ceph osd purge 2 --force --yes-i-really-mean-it
purged osd.2

[ceph: root@mostha1 /]# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME         STATUS REWEIGHT PRI-AFF
-1         1.72823  root default
-5         0.45477      host dean
 0    hdd  0.22739          osd.0         up 1.00000 1.00000
 4    hdd  0.22739          osd.4         up 1.00000 1.00000
-9         0.22739      host ekman
 6    hdd  0.22739          osd.6         up 1.00000 1.00000
-7         0.45479      host mostha1
 5    hdd  0.45479          osd.5         up 1.00000 1.00000
-3         0.59128      host mostha2
 1    hdd  0.22739          osd.1         up 1.00000 1.00000
 3    hdd  0.36389          osd.3         up 1.00000 1.00000

[ceph: root@mostha1 /]# lsblk
NAME MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda 8:0    1 232.9G  0 disk
|-sda1 8:1    1   3.9G  0 part /rootfs/boot
|-sda2 8:2    1   3.9G  0 part [SWAP]
`-sda3 8:3    1   225G  0 part
|-al8vg-rootvol 253:0    0  48.8G  0 lvm  /rootfs
|-al8vg-homevol 253:2    0   9.8G  0 lvm /rootfs/home
|-al8vg-tmpvol 253:3    0   9.8G  0 lvm  /rootfs/tmp
`-al8vg-varvol 253:4    0  19.8G  0 lvm  /rootfs/var
sdb 8:16   1 465.8G  0 disk
`-ceph--08827fdc--136e--4070--97e9--e5e8b3970766-osd--block--7dec1808--d6f4--4f90--ac74--75a4346e1df5 253:1    0 465.8G  0 lvm
sdc 8:32   1 232.9G  0 disk

"cephadm ceph-volume inventory" returns nothing:

[root@mostha1 ~]# cephadm ceph-volume inventory
Inferring fsid 250f9864-0142-11ee-8e5f-00266cf8869c
Using recent ceph image quay.io/ceph/ceph@sha256:f30bf50755d7087f47c6223e6a921caf5b12e86401b3d49220230c84a8302a1e

Device Path               Size         Device nodes rotates available Model name

[root@mostha1 ~]#

But running the same command with the 15.2.17 (Octopus) image works:

[root@mostha1 ~]# cephadm --image 93146564743f ceph-volume inventory
Inferring fsid 250f9864-0142-11ee-8e5f-00266cf8869c

Device Path               Size         rotates available Model name
/dev/sdc                  232.83 GB    True    True      SAMSUNG HE253GJ
/dev/sda                  232.83 GB    True    False     SAMSUNG HE253GJ
/dev/sdb                  465.76 GB    True    False     WDC WD5003ABYX-1

[root@mostha1 ~]#

[root@mostha1 ~]# podman images -a
REPOSITORY                        TAG         IMAGE ID      CREATED        SIZE
quay.io/ceph/ceph                 v16.2.14    f13d80acdbb5  2 weeks ago    1.21 GB
quay.io/ceph/ceph                 v15.2.17    93146564743f  14 months ago  1.24 GB
....


Patrick

On 11/10/2023 at 15:14, Eugen Block wrote:
Your response is a bit confusing since it seems to be mixed up with the previous answer. You still need to remove the OSD properly, i.e. purge it from the crush tree:

ceph osd purge 2 --force --yes-i-really-mean-it (only in a test cluster!)

If everything is clean (OSD has been removed, disk has been zapped, lsblk shows no LVs for that disk) you can check the inventory:

cephadm ceph-volume inventory

Please also add the output of 'ceph orch ls osd --export'.

Quoting Patrick Begou <Patrick.Begou@xxxxxxxxxxxxxxxxxxxxxx>:

Hi Eugen,

- the OS is Alma Linux 8 with the latest updates.

- this morning I worked with ceph-volume but it ended in a strange final state. I was connected to host mostha1, where /dev/sdc was not recognized. These are the steps I followed, based on the ceph-volume documentation I've read:
[root@mostha1 ~]# cephadm shell
[ceph: root@mostha1 /]# ceph auth get client.bootstrap-osd > /var/lib/ceph/bootstrap-osd/ceph.keyring
[ceph: root@mostha1 /]# ceph-volume lvm prepare --bluestore --data /dev/sdc

Now the lsblk command shows sdc as an OSD:
....
sdb 8:16   1 465.8G  0 disk
`-ceph--08827fdc--136e--4070--97e9--e5e8b3970766-osd--block--7dec1808--d6f4--4f90--ac74--75a4346e1df5 253:1    0 465.8G  0 lvm
sdc 8:32   1 232.9G  0 disk
`-ceph--b27d7a07--278d--4ee2--b84e--53256ef8de4c-osd--block--45c8e92c--caf9--4fe7--9a42--7b45a0794632 253:5    0 232.8G  0 lvm

Then I tried to activate this OSD, but it fails because inside podman I have no access to systemctl:

[ceph: root@mostha1 /]# ceph-volume lvm activate 2 45c8e92c-caf9-4fe7-9a42-7b45a0794632
.....
Running command: /usr/bin/systemctl start ceph-osd@2
 stderr: Failed to connect to bus: No such file or directory
-->  RuntimeError: command returned non-zero exit status: 1
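
In a cephadm (containerized) deployment the systemd part is normally handled by the orchestrator on the host rather than from inside the shell container. A hedged sketch of that path, reusing the host and device names from this thread and assuming these orchestrator commands are available in this Pacific release:

   # let the orchestrator create and start the OSD itself:
   ceph orch daemon add osd mostha1.legi.grenoble-inp.fr:/dev/sdc
   # or, if the LVs were already prepared manually, ask cephadm to activate them:
   ceph cephadm osd activate mostha1.legi.grenoble-inp.fr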

And now I have a strange status for this osd.2:

[ceph: root@mostha1 /]# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME         STATUS REWEIGHT PRI-AFF
-1         1.72823  root default
-5         0.45477      host dean
 0    hdd  0.22739          osd.0         up 1.00000 1.00000
 4    hdd  0.22739          osd.4         up 1.00000 1.00000
-9         0.22739      host ekman
 6    hdd  0.22739          osd.6         up 1.00000 1.00000
-7         0.45479      host mostha1
 5    hdd  0.45479          osd.5         up 1.00000 1.00000
-3         0.59128      host mostha2
 1    hdd  0.22739          osd.1         up 1.00000 1.00000
 3    hdd  0.36389          osd.3         up 1.00000 1.00000
 2               0  osd.2               down 0 1.00000

I've tried to destroy the OSD as you suggested, but even though the command returns no error I still have this OSD, even if "lsblk" no longer shows /dev/sdc as a Ceph OSD device.

[ceph: root@mostha1 /]# ceph-volume lvm zap --destroy /dev/sdc
--> Zapping: /dev/sdc
--> Zapping lvm member /dev/sdc. lv_path is /dev/ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c/osd-block-45c8e92c-caf9-4fe7-9a42-7b45a0794632
--> Unmounting /var/lib/ceph/osd/ceph-2
Running command: /usr/bin/umount -v /var/lib/ceph/osd/ceph-2
 stderr: umount: /var/lib/ceph/osd/ceph-2 unmounted
Running command: /usr/bin/dd if=/dev/zero of=/dev/ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c/osd-block-45c8e92c-caf9-4fe7-9a42-7b45a0794632 bs=1M count=10 conv=fsync
 stderr: 10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.575633 s, 18.2 MB/s
--> Only 1 LV left in VG, will proceed to destroy volume group ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c
Running command: nsenter --mount=/rootfs/proc/1/ns/mnt --ipc=/rootfs/proc/1/ns/ipc --net=/rootfs/proc/1/ns/net --uts=/rootfs/proc/1/ns/uts /sbin/vgremove -v -f ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c
 stderr: Removing ceph--b27d7a07--278d--4ee2--b84e--53256ef8de4c-osd--block--45c8e92c--caf9--4fe7--9a42--7b45a0794632 (253:1)
 stderr: Releasing logical volume "osd-block-45c8e92c-caf9-4fe7-9a42-7b45a0794632"
 stderr: Archiving volume group "ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c" metadata (seqno 5).
 stdout: Logical volume "osd-block-45c8e92c-caf9-4fe7-9a42-7b45a0794632" successfully removed.
 stderr: Removing physical volume "/dev/sdc" from volume group "ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c"
 stdout: Volume group "ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c" successfully removed
 stderr: Creating volume group backup "/etc/lvm/backup/ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c" (seqno 6).
Running command: nsenter --mount=/rootfs/proc/1/ns/mnt --ipc=/rootfs/proc/1/ns/ipc --net=/rootfs/proc/1/ns/net --uts=/rootfs/proc/1/ns/uts /sbin/pvremove -v -f -f /dev/sdc
 stdout: Labels on physical volume "/dev/sdc" successfully wiped.
Running command: /usr/bin/dd if=/dev/zero of=/dev/sdc bs=1M count=10 conv=fsync
 stderr: 10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.590652 s, 17.8 MB/s
--> Zapping successful for: <Raw Device: /dev/sdc>

[ceph: root@mostha1 /]# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME         STATUS REWEIGHT PRI-AFF
-1         1.72823  root default
-5         0.45477      host dean
 0    hdd  0.22739          osd.0         up 1.00000 1.00000
 4    hdd  0.22739          osd.4         up 1.00000 1.00000
-9         0.22739      host ekman
 6    hdd  0.22739          osd.6         up 1.00000 1.00000
-7         0.45479      host mostha1
 5    hdd  0.45479          osd.5         up 1.00000 1.00000
-3         0.59128      host mostha2
 1    hdd  0.22739          osd.1         up 1.00000 1.00000
 3    hdd  0.36389          osd.3         up 1.00000 1.00000
 2               0  osd.2               down 0 1.00000

[ceph: root@mostha1 /]# lsblk
NAME MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda 8:0    1 232.9G  0 disk
|-sda1 8:1    1   3.9G  0 part /rootfs/boot
|-sda2 8:2    1   3.9G  0 part [SWAP]
`-sda3 8:3    1   225G  0 part
|-al8vg-rootvol 253:0    0  48.8G  0 lvm /rootfs
|-al8vg-homevol 253:3    0   9.8G  0 lvm /rootfs/home
|-al8vg-tmpvol 253:4    0   9.8G  0 lvm /rootfs/tmp
`-al8vg-varvol 253:5    0  19.8G  0 lvm /rootfs/var
sdb 8:16   1 465.8G  0 disk
`-ceph--08827fdc--136e--4070--97e9--e5e8b3970766-osd--block--7dec1808--d6f4--4f90--ac74--75a4346e1df5 253:2    0 465.8G  0 lvm
sdc

Patrick
On 11/10/2023 at 11:00, Eugen Block wrote:
Hi,

just wondering if 'ceph-volume lvm zap --destroy /dev/sdc' would help here. From your previous output it seems you didn't specify the --destroy flag. Which cephadm version is installed on the host? Did you also upgrade the OS when moving to Pacific? (Sorry if I missed that.)


Quoting Patrick Begou <Patrick.Begou@xxxxxxxxxxxxxxxxxxxxxx>:

On 02/10/2023 at 18:22, Patrick Bégou wrote:
Hi all,

still stuck with this problem.

I've deployed Octopus and all my HDDs were set up as OSDs. Fine. I've upgraded to Pacific and 2 OSDs failed. They were automatically removed and the upgrade finished. Cluster health is finally OK, no data loss.

But now I cannot re-add these OSDs with Pacific (I had previous trouble with these old HDDs: I lost one OSD in Octopus and was able to reset and re-add it).

I've tried to manually add the first OSD on the node where it is located, following https://docs.ceph.com/en/pacific/rados/operations/bluestore-migration/ (not sure it's the best idea...), but it fails too. This node was the one used for deploying the cluster.

[ceph: root@mostha1 /]# ceph-volume lvm zap /dev/sdc
--> Zapping: /dev/sdc
--> --destroy was not specified, but zapping a whole device will remove the partition table
Running command: /usr/bin/dd if=/dev/zero of=/dev/sdc bs=1M count=10 conv=fsync
 stderr: 10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.663425 s, 15.8 MB/s
--> Zapping successful for: <Raw Device: /dev/sdc>


[ceph: root@mostha1 /]# ceph-volume lvm create --bluestore --data /dev/sdc
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 9f1eb8ee-41e6-4350-ad73-1be21234ec7c
 stderr: 2023-10-02T16:09:29.855+0000 7fb4eb8c0700 -1 auth: unable to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory
 stderr: 2023-10-02T16:09:29.855+0000 7fb4eb8c0700 -1 AuthRegistry(0x7fb4e405c4d8) no keyring found at /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
 stderr: 2023-10-02T16:09:29.856+0000 7fb4eb8c0700 -1 auth: unable to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory
 stderr: 2023-10-02T16:09:29.856+0000 7fb4eb8c0700 -1 AuthRegistry(0x7fb4e40601d0) no keyring found at /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
 stderr: 2023-10-02T16:09:29.857+0000 7fb4eb8c0700 -1 auth: unable to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory
 stderr: 2023-10-02T16:09:29.857+0000 7fb4eb8c0700 -1 AuthRegistry(0x7fb4eb8bee90) no keyring found at /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
 stderr: 2023-10-02T16:09:29.858+0000 7fb4e965c700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
 stderr: 2023-10-02T16:09:29.858+0000 7fb4e9e5d700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
 stderr: 2023-10-02T16:09:29.858+0000 7fb4e8e5b700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
 stderr: 2023-10-02T16:09:29.858+0000 7fb4eb8c0700 -1 monclient: authenticate NOTE: no keyring found; disabled cephx authentication
 stderr: [errno 13] RADOS permission denied (error connecting to the cluster)
-->  RuntimeError: Unable to create a new OSD id
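
The repeated "no keyring found at /var/lib/ceph/bootstrap-osd/ceph.keyring" lines point at the likely cause; a minimal sketch of a workaround inside the cephadm shell (essentially what is done in a later message of this thread):

   # fetch the bootstrap-osd keyring into the path ceph-volume expects, then retry
   ceph auth get client.bootstrap-osd -o /var/lib/ceph/bootstrap-osd/ceph.keyring
   ceph-volume lvm create --bluestore --data /dev/sdc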

Any idea what is wrong?

Thanks

Patrick


I'm still trying to understand what can be wrong or how to debug this situation where Ceph cannot see the devices.

The device /dev/sdc exists:

   [root@mostha1 ~]# cephadm shell lsmcli ldl
   Inferring fsid 250f9864-0142-11ee-8e5f-00266cf8869c
   Using recent ceph image quay.io/ceph/ceph@sha256:f30bf50755d7087f47c6223e6a921caf5b12e86401b3d49220230c84a8302a1e
   Path     | SCSI VPD 0x83    | Link Type | Serial Number   | Health Status
   -------------------------------------------------------------------------
   /dev/sda | 50024e92039e4f1c | PATA/SATA | S2B5J90ZA10142  | Good
   /dev/sdb | 50014ee0ad5953c9 | PATA/SATA | WD-WMAYP0982329 | Good
   /dev/sdc | 50024e920387fa2c | PATA/SATA | S2B5J90ZA02494  | Good

But I cannot do anything with it:

   [root@mostha1 ~]# cephadm shell ceph orch device zap mostha1.legi.grenoble-inp.fr /dev/sdc --force
   Inferring fsid 250f9864-0142-11ee-8e5f-00266cf8869c
   Using recent ceph image quay.io/ceph/ceph@sha256:f30bf50755d7087f47c6223e6a921caf5b12e86401b3d49220230c84a8302a1e
   Error EINVAL: Device path '/dev/sdc' not found on host 'mostha1.legi.grenoble-inp.fr'

This has been the case since I moved from Octopus to Pacific.
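
One thing that might be worth ruling out here (a hedged suggestion, not a guaranteed fix) is a stale device cache in the orchestrator:

   ceph orch device ls mostha1.legi.grenoble-inp.fr --refresh --wide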

Patrick












_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



