Hi all,
The git bisect run I have just finished shows:
4fc6bc394dffaf3ad375ff29cbb0a3eb9e4dbefc is the first bad commit
commit 4fc6bc394dffaf3ad375ff29cbb0a3eb9e4dbefc
Author: Zack Cerza <zack@xxxxxxxxx>
Date: Tue May 17 11:29:02 2022 -0600
ceph-volume: Optionally consume loop devices
A similar proposal was rejected in #24765; I understand the logic
behind the rejection, but this will allow us to run Ceph clusters on
machines that lack disk resources for testing purposes. We just need to
make it impossible to accidentally enable, and make it clear it is
unsupported.
Signed-off-by: Zack Cerza <zack@xxxxxxxxxx>
(cherry picked from commit c7f017b21ade3762ba5b7b9688bed72c6b60dc0e)
.../ceph_volume/tests/util/test_device.py | 17 +++++++
src/ceph-volume/ceph_volume/util/device.py | 14 +++--
src/ceph-volume/ceph_volume/util/disk.py | 59 ++++++++++++++++++----
3 files changed, 78 insertions(+), 12 deletions(-)
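For reference, the bisect was run roughly like this (a sketch; the good/bad starting points are assumptions based on the release builds that work and fail in this thread):

  # in a clone of the Ceph git repository
  git bisect start
  git bisect bad v16.2.11     # first release where "ceph-volume inventory" shows nothing
  git bisect good v16.2.10    # last release where the disks are listed
  # at each step: build ceph-volume, run the inventory in the test container,
  # then mark the revision and repeat until git names the first bad commit
  git bisect good    # or: git bisect bad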
I will try to investigate next week, but it would be great if some Ceph
expert developers could have a look at this commit ;-)
Have a nice weekend
Patrick
On 18/10/2023 at 13:48, Patrick Begou wrote:
Hi all,
I'm trying to track down the faulty commit. I'm able to build Ceph from
the git repo in a fresh podman container, but at the moment the lsblk
command returns nothing in my container.
In the Ceph containers lsblk works.
So something is wrong with how I launch my podman container (or it
differs from how the Ceph containers are launched) and I cannot find
what. Any help with this step would be appreciated.
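For context, a minimal sketch of the kind of podman invocation that does expose the host's block devices inside the container (the exact flags cephadm uses may differ; the image name is just a placeholder for my build container):

  podman run --rm -it --privileged \
      -v /dev:/dev -v /sys:/sys -v /run/udev:/run/udev \
      <my-ceph-build-image> lsblk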
Thanks
Patrick
On 13/10/2023 at 09:18, Eugen Block wrote:
Trying to resend with the attachment.
I can't really find anything suspicious, ceph-volume (16.2.11) does
recognize /dev/sdc though:
[2023-10-12 08:58:14,135][ceph_volume.process][INFO ] stdout
NAME="sdc" KNAME="sdc" PKNAME="" MAJ:MIN="8:32" FSTYPE=""
MOUNTPOINT="" LABEL="" UUID="" RO="0" RM="1" MODEL="SAMSUNG HE253GJ "
SIZE="232.9G" STATE="running" OWNER="root" GROUP="disk"
MODE="brw-rw----" ALIGNMENT="0" PHY-SEC="512" LOG-SEC="512" ROTA="1"
SCHED="mq-deadline" TYPE="disk" DISC-ALN="0" DISC-GRAN="0B"
DISC-MAX="0B" DISC-ZERO="0" PKNAME="" PARTLABEL=""
[2023-10-12 08:58:14,139][ceph_volume.util.system][INFO ] Executable
pvs found on the host, will use /sbin/pvs
[2023-10-12 08:58:14,140][ceph_volume.process][INFO ] Running
command: nsenter --mount=/rootfs/proc/1/ns/mnt
--ipc=/rootfs/proc/1/ns/ipc --net=/rootfs/proc/1/ns/net
--uts=/rootfs/proc/1/ns/uts /sbin/pvs --noheadings --readonly
--units=b --nosuffix --separator=";" -o
pv_name,vg_name,pv_count,lv_count,vg_attr,vg_extent_count,vg_free_count,vg_extent_size
But apparently it just stops after that. I already tried to find a
debug log-level for ceph-volume but it's not applicable to all
subcommands.
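One way to narrow this down might be to re-run the same LVM query by hand and compare it with what the two container versions log; a sketch, reusing the exact options from the log above, just without the nsenter wrapper (run on the host as root):

  pvs --noheadings --readonly --units=b --nosuffix --separator=";" \
      -o pv_name,vg_name,pv_count,lv_count,vg_attr,vg_extent_count,vg_free_count,vg_extent_size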
The cephadm.log also just stops without even finishing the "copying
blob" step, which makes me wonder whether it actually pulls the entire
image. I assume you have enough free disk space (otherwise I would
expect a "failed to pull target image" message); do you see any other
warnings in syslog or elsewhere? Or are the logs incomplete?
Maybe someone else finds any clues in the logs...
Regards,
Eugen
Quoting Patrick Begou <Patrick.Begou@xxxxxxxxxxxxxxxxxxxxxx>:
Hi Eugen,
You will find attached cephadm.log and ceph-volume.log. Each contains
the outputs for the 2 versions. Either v16.2.10-20220920 is really more
verbose, or v16.2.11-20230125 does not execute the whole detection
process.
Patrick
On 12/10/2023 at 09:34, Eugen Block wrote:
Good catch, and I found the thread I had in mind, it was this exact
one. :-D Anyway, can you share the ceph-volume.log from the working and
the non-working attempts?
I tried to look for something significant in the pacific release
notes for 16.2.11, and there were some changes to ceph-volume, but
I'm not sure what it could be.
Quoting Patrick Begou <Patrick.Begou@xxxxxxxxxxxxxxxxxxxxxx>:
I've run additional tests with the Pacific releases, and with
"ceph-volume inventory" things go wrong starting with the first v16.2.11
release (v16.2.11-20230125).
=================== Ceph v16.2.10-20220920 =======================
Device Path Size rotates available Model name
/dev/sdc 232.83 GB True True SAMSUNG HE253GJ
/dev/sda 232.83 GB True False SAMSUNG HE253GJ
/dev/sdb 465.76 GB True False WDC WD5003ABYX-1
=================== Ceph v16.2.11-20230125 =======================
Device Path    Size    Device nodes    rotates    available    Model name
Maybe this could help to see what has changed?
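For comparison, the same inventory can be taken with both container versions on the same host; a sketch along these lines (the exact image tags may differ from the dated builds used above):

  cephadm --image quay.io/ceph/ceph:v16.2.10 ceph-volume inventory
  cephadm --image quay.io/ceph/ceph:v16.2.11 ceph-volume inventory

which is essentially what produced the two outputs above.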
Patrick
On 11/10/2023 at 17:38, Eugen Block wrote:
That's really strange. Just out of curiosity, have you tried
Quincy (and/or Reef) as well? I don't recall what inventory does
in the background exactly, I believe Adam King mentioned that in
some thread, maybe that can help here. I'll search for that
thread tomorrow.
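If it helps, whatever ceph-volume executes during the inventory run should also show up on the host in the per-cluster log; a sketch, assuming the default cephadm log location and the fsid from this thread:

  tail -f /var/log/ceph/250f9864-0142-11ee-8e5f-00266cf8869c/ceph-volume.log

while the inventory command runs.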
Quoting Patrick Begou <Patrick.Begou@xxxxxxxxxxxxxxxxxxxxxx>:
Hi Eugen,
[root@mostha1 ~]# rpm -q cephadm
cephadm-16.2.14-0.el8.noarch
Here is the associated cephadm.log:
2023-10-11 16:16:02,167 7f820515fb80 DEBUG
--------------------------------------------------------------------------------
cephadm ['gather-facts']
2023-10-11 16:16:02,208 7f820515fb80 DEBUG /bin/podman: 4.4.1
2023-10-11 16:16:02,313 7f820515fb80 DEBUG sestatus: SELinux
status: disabled
2023-10-11 16:16:02,317 7f820515fb80 DEBUG sestatus: SELinux
status: disabled
2023-10-11 16:16:02,322 7f820515fb80 DEBUG sestatus: SELinux
status: disabled
2023-10-11 16:16:02,326 7f820515fb80 DEBUG sestatus: SELinux
status: disabled
2023-10-11 16:16:02,329 7f820515fb80 DEBUG sestatus: SELinux
status: disabled
2023-10-11 16:16:02,333 7f820515fb80 DEBUG sestatus: SELinux
status: disabled
2023-10-11 16:16:04,474 7ff2a5c08b80 DEBUG
--------------------------------------------------------------------------------
cephadm ['ceph-volume', 'inventory']
2023-10-11 16:16:04,516 7ff2a5c08b80 DEBUG /usr/bin/podman: 4.4.1
2023-10-11 16:16:04,520 7ff2a5c08b80 DEBUG Using default config:
/etc/ceph/ceph.conf
2023-10-11 16:16:04,573 7ff2a5c08b80 DEBUG /usr/bin/podman:
0d28d71358d7,445.8MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman:
2084faaf4d54,13.27MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman:
61073c53805d,512.7MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman:
6b9f0b72d668,361.1MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman:
7493a28808ad,163.7MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman:
a89672a3accf,59.22MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman:
b45271cc9726,54.24MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman:
e00ec13ab138,707.3MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman:
fcb1e1a6b08d,35.55MB / 50.32GB
2023-10-11 16:16:04,630 7ff2a5c08b80 DEBUG /usr/bin/podman:
0d28d71358d7,1.28%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman:
2084faaf4d54,0.00%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman:
61073c53805d,1.19%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman:
6b9f0b72d668,1.03%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman:
7493a28808ad,0.78%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman:
a89672a3accf,0.11%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman:
b45271cc9726,1.35%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman:
e00ec13ab138,0.43%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman:
fcb1e1a6b08d,0.02%
2023-10-11 16:16:04,634 7ff2a5c08b80 INFO Inferring fsid
250f9864-0142-11ee-8e5f-00266cf8869c
2023-10-11 16:16:04,691 7ff2a5c08b80 DEBUG /usr/bin/podman:
quay.io/ceph/ceph@sha256:f30bf50755d7087f47c6223e6a921caf5b12e86401b3d49220230c84a8302a1e
2023-10-11 16:16:04,692 7ff2a5c08b80 DEBUG /usr/bin/podman:
quay.io/ceph/ceph@sha256:c08064dde4bba4e72a1f55d90ca32df9ef5aafab82efe2e0a0722444a5aaacca
2023-10-11 16:16:04,692 7ff2a5c08b80 DEBUG /usr/bin/podman:
docker.io/ceph/ceph@sha256:056637972a107df4096f10951e4216b21fcd8ae0b9fb4552e628d35df3f61139
2023-10-11 16:16:04,694 7ff2a5c08b80 INFO Using recent ceph
image
quay.io/ceph/ceph@sha256:f30bf50755d7087f47c6223e6a921caf5b12e86401b3d49220230c84a8302a1e
2023-10-11 16:16:05,094 7ff2a5c08b80 DEBUG stat: 167 167
2023-10-11 16:16:05,903 7ff2a5c08b80 DEBUG Acquiring lock
140679815723776 on
/run/cephadm/250f9864-0142-11ee-8e5f-00266cf8869c.lock
2023-10-11 16:16:05,903 7ff2a5c08b80 DEBUG Lock 140679815723776
acquired on /run/cephadm/250f9864-0142-11ee-8e5f-00266cf8869c.lock
2023-10-11 16:16:05,929 7ff2a5c08b80 DEBUG sestatus: SELinux
status: disabled
2023-10-11 16:16:05,933 7ff2a5c08b80 DEBUG sestatus: SELinux
status: disabled
2023-10-11 16:16:06,700 7ff2a5c08b80 DEBUG /usr/bin/podman:
2023-10-11 16:16:06,701 7ff2a5c08b80 DEBUG /usr/bin/podman:
Device Path Size Device nodes rotates available
Model name
I have only one version of cephadm in /var/lib/ceph/{fsid} :
[root@mostha1 ~]# ls -lrt
/var/lib/ceph/250f9864-0142-11ee-8e5f-00266cf8869c/cephadm*
-rw-r--r-- 1 root root 350889 28 sept. 16:39
/var/lib/ceph/250f9864-0142-11ee-8e5f-00266cf8869c/cephadm.f6868821c084cd9740b59c7c5eb59f0dd47f6e3b1e6fecb542cb44134ace8d78
Running "python3
/var/lib/ceph/250f9864-0142-11ee-8e5f-00266cf8869c/cephadm.f6868821c084cd9740b59c7c5eb59f0dd47f6e3b1e6fecb542cb44134ace8d78
ceph-volume inventory" gives the same output and the same log (except
the value of the lock):
2023-10-11 16:21:35,965 7f467cf31b80 DEBUG
--------------------------------------------------------------------------------
cephadm ['ceph-volume', 'inventory']
2023-10-11 16:21:36,009 7f467cf31b80 DEBUG /usr/bin/podman: 4.4.1
2023-10-11 16:21:36,012 7f467cf31b80 DEBUG Using default config:
/etc/ceph/ceph.conf
2023-10-11 16:21:36,067 7f467cf31b80 DEBUG /usr/bin/podman:
0d28d71358d7,452.1MB / 50.32GB
2023-10-11 16:21:36,067 7f467cf31b80 DEBUG /usr/bin/podman:
2084faaf4d54,13.27MB / 50.32GB
2023-10-11 16:21:36,067 7f467cf31b80 DEBUG /usr/bin/podman:
61073c53805d,513.6MB / 50.32GB
2023-10-11 16:21:36,067 7f467cf31b80 DEBUG /usr/bin/podman:
6b9f0b72d668,322.4MB / 50.32GB
2023-10-11 16:21:36,067 7f467cf31b80 DEBUG /usr/bin/podman:
7493a28808ad,164MB / 50.32GB
2023-10-11 16:21:36,067 7f467cf31b80 DEBUG /usr/bin/podman:
a89672a3accf,58.5MB / 50.32GB
2023-10-11 16:21:36,067 7f467cf31b80 DEBUG /usr/bin/podman:
b45271cc9726,54.69MB / 50.32GB
2023-10-11 16:21:36,067 7f467cf31b80 DEBUG /usr/bin/podman:
e00ec13ab138,707.1MB / 50.32GB
2023-10-11 16:21:36,068 7f467cf31b80 DEBUG /usr/bin/podman:
fcb1e1a6b08d,36.28MB / 50.32GB
2023-10-11 16:21:36,125 7f467cf31b80 DEBUG /usr/bin/podman:
0d28d71358d7,1.27%
2023-10-11 16:21:36,125 7f467cf31b80 DEBUG /usr/bin/podman:
2084faaf4d54,0.00%
2023-10-11 16:21:36,125 7f467cf31b80 DEBUG /usr/bin/podman:
61073c53805d,1.16%
2023-10-11 16:21:36,125 7f467cf31b80 DEBUG /usr/bin/podman:
6b9f0b72d668,1.02%
2023-10-11 16:21:36,125 7f467cf31b80 DEBUG /usr/bin/podman:
7493a28808ad,0.78%
2023-10-11 16:21:36,125 7f467cf31b80 DEBUG /usr/bin/podman:
a89672a3accf,0.11%
2023-10-11 16:21:36,125 7f467cf31b80 DEBUG /usr/bin/podman:
b45271cc9726,1.35%
2023-10-11 16:21:36,125 7f467cf31b80 DEBUG /usr/bin/podman:
e00ec13ab138,0.41%
2023-10-11 16:21:36,125 7f467cf31b80 DEBUG /usr/bin/podman:
fcb1e1a6b08d,0.02%
2023-10-11 16:21:36,128 7f467cf31b80 INFO Inferring fsid
250f9864-0142-11ee-8e5f-00266cf8869c
2023-10-11 16:21:36,186 7f467cf31b80 DEBUG /usr/bin/podman:
quay.io/ceph/ceph@sha256:f30bf50755d7087f47c6223e6a921caf5b12e86401b3d49220230c84a8302a1e
2023-10-11 16:21:36,187 7f467cf31b80 DEBUG /usr/bin/podman:
quay.io/ceph/ceph@sha256:c08064dde4bba4e72a1f55d90ca32df9ef5aafab82efe2e0a0722444a5aaacca
2023-10-11 16:21:36,187 7f467cf31b80 DEBUG /usr/bin/podman:
docker.io/ceph/ceph@sha256:056637972a107df4096f10951e4216b21fcd8ae0b9fb4552e628d35df3f61139
2023-10-11 16:21:36,189 7f467cf31b80 INFO Using recent ceph
image
quay.io/ceph/ceph@sha256:f30bf50755d7087f47c6223e6a921caf5b12e86401b3d49220230c84a8302a1e
2023-10-11 16:21:36,549 7f467cf31b80 DEBUG stat: 167 167
2023-10-11 16:21:36,942 7f467cf31b80 DEBUG Acquiring lock
139940396923424 on
/run/cephadm/250f9864-0142-11ee-8e5f-00266cf8869c.lock
2023-10-11 16:21:36,942 7f467cf31b80 DEBUG Lock 139940396923424
acquired on /run/cephadm/250f9864-0142-11ee-8e5f-00266cf8869c.lock
2023-10-11 16:21:36,969 7f467cf31b80 DEBUG sestatus: SELinux
status: disabled
2023-10-11 16:21:36,972 7f467cf31b80 DEBUG sestatus: SELinux
status: disabled
2023-10-11 16:21:37,749 7f467cf31b80 DEBUG /usr/bin/podman:
2023-10-11 16:21:37,750 7f467cf31b80 DEBUG /usr/bin/podman:
Device Path Size Device nodes rotates available
Model name
Patrick
On 11/10/2023 at 15:59, Eugen Block wrote:
Can you check which cephadm version is installed on the host?
And then please add (only the relevant) output from the
cephadm.log when you run the inventory (without the --image
<octopus>). Sometimes a version mismatch between the cephadm on the host
and the one the orchestrator uses can cause some disruptions. You could
try the same with the latest cephadm you have in
/var/lib/ceph/${fsid}/ (ls -lrt
/var/lib/ceph/${fsid}/cephadm.*). I mentioned that in this
thread [1]. So you could try the following:
$ chmod +x /var/lib/ceph/{fsid}/cephadm.{latest}
$ python3 /var/lib/ceph/{fsid}/cephadm.{latest} ceph-volume inventory
Does the output differ? Paste the relevant cephadm.log from
that attempt as well.
[1]
https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/LASBJCSPFGDYAWPVE2YLV2ZLF3HC5SLS/
Quoting Patrick Begou <Patrick.Begou@xxxxxxxxxxxxxxxxxxxxxx>:
Hi Eugen,
First, many thanks for the time spent on this problem.
"ceph osd purge 2 --force --yes-i-really-mean-it" works and cleans up
all the bad status.
[root@mostha1 ~]# cephadm shell
Inferring fsid 250f9864-0142-11ee-8e5f-00266cf8869c
Using recent ceph image
quay.io/ceph/ceph@sha256:f30bf50755d7087f47c6223e6a921caf5b12e86401b3d49220230c84a8302a1e

[ceph: root@mostha1 /]# ceph osd purge 2 --force --yes-i-really-mean-it
purged osd.2

[ceph: root@mostha1 /]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 1.72823 root default
-5 0.45477 host dean
0 hdd 0.22739 osd.0 up 1.00000 1.00000
4 hdd 0.22739 osd.4 up 1.00000 1.00000
-9 0.22739 host ekman
6 hdd 0.22739 osd.6 up 1.00000 1.00000
-7 0.45479 host mostha1
5 hdd 0.45479 osd.5 up 1.00000 1.00000
-3 0.59128 host mostha2
1 hdd 0.22739 osd.1 up 1.00000 1.00000
3 hdd 0.36389 osd.3 up 1.00000 1.00000

[ceph: root@mostha1 /]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 1 232.9G 0 disk
|-sda1 8:1 1 3.9G 0 part /rootfs/boot
|-sda2 8:2 1 3.9G 0 part [SWAP]
`-sda3 8:3 1 225G 0 part
|-al8vg-rootvol 253:0 0 48.8G 0 lvm /rootfs
|-al8vg-homevol 253:2 0 9.8G 0 lvm /rootfs/home
|-al8vg-tmpvol 253:3 0 9.8G 0 lvm /rootfs/tmp
`-al8vg-varvol 253:4 0 19.8G 0 lvm /rootfs/var
sdb 8:16 1 465.8G 0 disk
`-ceph--08827fdc--136e--4070--97e9--e5e8b3970766-osd--block--7dec1808--d6f4--4f90--ac74--75a4346e1df5
253:1 0 465.8G 0 lvm
sdc 8:32 1 232.9G 0 disk
"cephadm ceph-volume inventory" returns nothing:
[root@mostha1 ~]# cephadm ceph-volume inventory
Inferring fsid 250f9864-0142-11ee-8e5f-00266cf8869c
Using recent ceph image
quay.io/ceph/ceph@sha256:f30bf50755d7087f47c6223e6a921caf5b12e86401b3d49220230c84a8302a1e
Device Path    Size    Device nodes    rotates    available    Model name
[root@mostha1 ~]#
But running the same command within cephadm 15.2.17 works:
[root@mostha1 ~]# cephadm --image 93146564743f ceph-volume inventory
Inferring fsid 250f9864-0142-11ee-8e5f-00266cf8869c
Device Path    Size        rotates    available    Model name
/dev/sdc       232.83 GB   True       True         SAMSUNG HE253GJ
/dev/sda       232.83 GB   True       False        SAMSUNG HE253GJ
/dev/sdb       465.76 GB   True       False        WDC WD5003ABYX-1
[root@mostha1 ~]#
[root@mostha1 ~]# podman images -a
REPOSITORY           TAG        IMAGE ID       CREATED         SIZE
quay.io/ceph/ceph    v16.2.14   f13d80acdbb5   2 weeks ago     1.21 GB
quay.io/ceph/ceph    v15.2.17   93146564743f   14 months ago   1.24 GB
....
Patrick
On 11/10/2023 at 15:14, Eugen Block wrote:
Your response is a bit confusing since it seems to be mixed up with the
previous answer. You still need to remove the OSD properly, i.e. purge
it from the crush tree:
ceph osd purge 2 --force --yes-i-really-mean-it (only in a
test cluster!)
If everything is clean (OSD has been removed, disk has been
zapped, lsblk shows no LVs for that disk) you can check the
inventory:
cephadm ceph-volume inventory
Please also add the output of 'ceph orch ls osd --export'.
Quoting Patrick Begou <Patrick.Begou@xxxxxxxxxxxxxxxxxxxxxx>:
Hi Eugen,
- the OS is Alma Linux 8 with the latest updates.
- this morning I worked with ceph-volume, but it ended in a strange
final state. I was connected on host mostha1, where /dev/sdc was not
recognized. These are the steps I followed, based on the ceph-volume
documentation I've read:
[root@mostha1 ~]# cephadm shell
[ceph: root@mostha1 /]# ceph auth get client.bootstrap-osd >
/var/lib/ceph/bootstrap-osd/ceph.keyring
[ceph: root@mostha1 /]# ceph-volume lvm prepare --bluestore
--data /dev/sdc
Now the lsblk command shows sdc as an OSD:
....
sdb 8:16 1 465.8G 0 disk
`-ceph--08827fdc--136e--4070--97e9--e5e8b3970766-osd--block--7dec1808--d6f4--4f90--ac74--75a4346e1df5
253:1 0 465.8G 0 lvm
sdc 8:32 1 232.9G 0 disk
`-ceph--b27d7a07--278d--4ee2--b84e--53256ef8de4c-osd--block--45c8e92c--caf9--4fe7--9a42--7b45a0794632
253:5 0 232.8G 0 lvm
Then I tried to activate this OSD, but it fails because inside podman I
have no access to systemctl:
[ceph: root@mostha1 /]# ceph-volume lvm activate 2
45c8e92c-caf9-4fe7-9a42-7b45a0794632
.....
Running command: /usr/bin/systemctl start ceph-osd@2
stderr: Failed to connect to bus: No such file or directory
--> RuntimeError: command returned non-zero exit status: 1
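Side note: in a cephadm-managed cluster the usual way to turn a raw disk into an OSD is to let the orchestrator handle both the prepare and the activate steps instead of calling ceph-volume directly; a sketch, using the host and device names from this thread:

  # from "cephadm shell" on an admin host
  ceph orch daemon add osd mostha1.legi.grenoble-inp.fr:/dev/sdc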
And now I have a strange status for this osd.2:
[ceph: root@mostha1 /]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 1.72823 root default
-5 0.45477 host dean
0 hdd 0.22739 osd.0 up 1.00000 1.00000
4 hdd 0.22739 osd.4 up 1.00000 1.00000
-9 0.22739 host ekman
6 hdd 0.22739 osd.6 up 1.00000 1.00000
-7 0.45479 host mostha1
5 hdd 0.45479 osd.5 up 1.00000 1.00000
-3 0.59128 host mostha2
1 hdd 0.22739 osd.1 up 1.00000 1.00000
3 hdd 0.36389 osd.3 up 1.00000 1.00000
2 0 osd.2 down 0 1.00000
I've tried to destroy the OSD as you suggested, but even though the
command returns no error I still have this OSD, even if "lsblk" no
longer shows /dev/sdc as a Ceph OSD device.
[ceph: root@mostha1 /]# ceph-volume lvm zap --destroy /dev/sdc
--> Zapping: /dev/sdc
--> Zapping lvm member /dev/sdc. lv_path is
/dev/ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c/osd-block-45c8e92c-caf9-4fe7-9a42-7b45a0794632
--> Unmounting /var/lib/ceph/osd/ceph-2
Running command: /usr/bin/umount -v /var/lib/ceph/osd/ceph-2
stderr: umount: /var/lib/ceph/osd/ceph-2 unmounted
Running command: /usr/bin/dd if=/dev/zero
of=/dev/ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c/osd-block-45c8e92c-caf9-4fe7-9a42-7b45a0794632
bs=1M count=10 conv=fsync
stderr: 10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.575633 s, 18.2 MB/s
--> Only 1 LV left in VG, will proceed to destroy volume
group ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c
Running command: nsenter --mount=/rootfs/proc/1/ns/mnt
--ipc=/rootfs/proc/1/ns/ipc --net=/rootfs/proc/1/ns/net
--uts=/rootfs/proc/1/ns/uts /sbin/vgremove -v -f
ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c
stderr: Removing
ceph--b27d7a07--278d--4ee2--b84e--53256ef8de4c-osd--block--45c8e92c--caf9--4fe7--9a42--7b45a0794632
(253:1)
stderr: Releasing logical volume
"osd-block-45c8e92c-caf9-4fe7-9a42-7b45a0794632"
stderr: Archiving volume group
"ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c" metadata (seqno 5).
stdout: Logical volume
"osd-block-45c8e92c-caf9-4fe7-9a42-7b45a0794632"
successfully removed.
stderr: Removing physical volume "/dev/sdc" from volume
group "ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c"
stdout: Volume group
"ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c" successfully
removed
stderr: Creating volume group backup
"/etc/lvm/backup/ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c"
(seqno 6).
Running command: nsenter --mount=/rootfs/proc/1/ns/mnt
--ipc=/rootfs/proc/1/ns/ipc --net=/rootfs/proc/1/ns/net
--uts=/rootfs/proc/1/ns/uts /sbin/pvremove -v -f -f /dev/sdc
stdout: Labels on physical volume "/dev/sdc" successfully
wiped.
Running command: /usr/bin/dd if=/dev/zero of=/dev/sdc bs=1M
count=10 conv=fsync
stderr: 10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.590652 s, 17.8 MB/s
--> Zapping successful for: <Raw Device: /dev/sdc>

[ceph: root@mostha1 /]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 1.72823 root default
-5 0.45477 host dean
0 hdd 0.22739 osd.0 up 1.00000 1.00000
4 hdd 0.22739 osd.4 up 1.00000 1.00000
-9 0.22739 host ekman
6 hdd 0.22739 osd.6 up 1.00000 1.00000
-7 0.45479 host mostha1
5 hdd 0.45479 osd.5 up 1.00000 1.00000
-3 0.59128 host mostha2
1 hdd 0.22739 osd.1 up 1.00000 1.00000
3 hdd 0.36389 osd.3 up 1.00000 1.00000
2 0 osd.2 down 0 1.00000

[ceph: root@mostha1 /]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 1 232.9G 0 disk
|-sda1 8:1 1 3.9G 0 part /rootfs/boot
|-sda2 8:2 1 3.9G 0 part [SWAP]
`-sda3 8:3 1 225G 0 part
|-al8vg-rootvol 253:0 0 48.8G 0 lvm /rootfs
|-al8vg-homevol 253:3 0 9.8G 0 lvm /rootfs/home
|-al8vg-tmpvol 253:4 0 9.8G 0 lvm /rootfs/tmp
`-al8vg-varvol 253:5 0 19.8G 0 lvm /rootfs/var
sdb 8:16 1 465.8G 0 disk
`-ceph--08827fdc--136e--4070--97e9--e5e8b3970766-osd--block--7dec1808--d6f4--4f90--ac74--75a4346e1df5
253:2 0 465.8G 0 lvm
sdc
Patrick
On 11/10/2023 at 11:00, Eugen Block wrote:
Hi,
just wondering if 'ceph-volume lvm zap --destroy /dev/sdc'
would help here. From your previous output you didn't
specify the --destroy flag.
Which cephadm version is installed on the host? Did you also upgrade
the OS when moving to Pacific? (Sorry if I missed that.)
Quoting Patrick Begou <Patrick.Begou@xxxxxxxxxxxxxxxxxxxxxx>:
On 02/10/2023 at 18:22, Patrick Bégou wrote:
Hi all,
Still stuck with this problem.
I've deployed Octopus and all my HDDs have been set up as OSDs. Fine.
I've upgraded to Pacific and 2 OSDs have failed. They have been
automatically removed and the upgrade finished. The cluster health is
finally OK, no data loss.
But now I cannot re-add these OSDs with Pacific (I had previous trouble
with these old HDDs; I lost one OSD in Octopus and was able to reset and
re-add it).
I've tried to manually add the first OSD on the node where it is
located, following
https://docs.ceph.com/en/pacific/rados/operations/bluestore-migration/
(not sure it's the best idea...), but it fails too. This node was the
one used to deploy the cluster.
[ceph: root@mostha1 /]# ceph-volume lvm zap /dev/sdc
--> Zapping: /dev/sdc
--> --destroy was not specified, but zapping a whole
device will remove the partition table
Running command: /usr/bin/dd if=/dev/zero of=/dev/sdc
bs=1M count=10 conv=fsync
stderr: 10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.663425 s, 15.8 MB/s
--> Zapping successful for: <Raw Device: /dev/sdc>
[ceph: root@mostha1 /]# ceph-volume lvm create --bluestore --data /dev/sdc
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name
client.bootstrap-osd --keyring
/var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new
9f1eb8ee-41e6-4350-ad73-1be21234ec7c
stderr: 2023-10-02T16:09:29.855+0000 7fb4eb8c0700 -1
auth: unable to find a keyring on
/var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such
file or directory
stderr: 2023-10-02T16:09:29.855+0000 7fb4eb8c0700 -1
AuthRegistry(0x7fb4e405c4d8) no keyring found at
/var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
stderr: 2023-10-02T16:09:29.856+0000 7fb4eb8c0700 -1
auth: unable to find a keyring on
/var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such
file or directory
stderr: 2023-10-02T16:09:29.856+0000 7fb4eb8c0700 -1
AuthRegistry(0x7fb4e40601d0) no keyring found at
/var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
stderr: 2023-10-02T16:09:29.857+0000 7fb4eb8c0700 -1
auth: unable to find a keyring on
/var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such
file or directory
stderr: 2023-10-02T16:09:29.857+0000 7fb4eb8c0700 -1
AuthRegistry(0x7fb4eb8bee90) no keyring found at
/var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
stderr: 2023-10-02T16:09:29.858+0000 7fb4e965c700 -1
monclient(hunting): handle_auth_bad_method server
allowed_methods [2] but i only support [1]
stderr: 2023-10-02T16:09:29.858+0000 7fb4e9e5d700 -1
monclient(hunting): handle_auth_bad_method server
allowed_methods [2] but i only support [1]
stderr: 2023-10-02T16:09:29.858+0000 7fb4e8e5b700 -1
monclient(hunting): handle_auth_bad_method server
allowed_methods [2] but i only support [1]
stderr: 2023-10-02T16:09:29.858+0000 7fb4eb8c0700 -1
monclient: authenticate NOTE: no keyring found; disabled
cephx authentication
stderr: [errno 13] RADOS permission denied (error
connecting to the cluster)
--> RuntimeError: Unable to create a new OSD id
Any idea what is wrong?
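Side note: the errors above suggest there is simply no bootstrap-osd keyring at /var/lib/ceph/bootstrap-osd/ceph.keyring inside the container. A sketch of what exporting it first, from inside "cephadm shell", might look like before retrying the ceph-volume command:

  [ceph: root@mostha1 /]# ceph auth get client.bootstrap-osd > /var/lib/ceph/bootstrap-osd/ceph.keyring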
Thanks
Patrick
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
I'm still trying to understand what can be wrong or how to debug this
situation where Ceph cannot see the devices.
The device /dev/sdc exists:
[root@mostha1 ~]# cephadm shell lsmcli ldl
Inferring fsid 250f9864-0142-11ee-8e5f-00266cf8869c
Using recent ceph image
quay.io/ceph/ceph@sha256:f30bf50755d7087f47c6223e6a921caf5b12e86401b3d49220230c84a8302a1e
Path     | SCSI VPD 0x83    | Link Type | Serial Number   | Health Status
-------------------------------------------------------------------------
/dev/sda | 50024e92039e4f1c | PATA/SATA | S2B5J90ZA10142  | Good
/dev/sdb | 50014ee0ad5953c9 | PATA/SATA | WD-WMAYP0982329 | Good
/dev/sdc | 50024e920387fa2c | PATA/SATA | S2B5J90ZA02494  | Good
But I cannot do anything with it:
[root@mostha1 ~]# cephadm shell ceph orch device zap
mostha1.legi.grenoble-inp.fr /dev/sdc --force
Inferring fsid 250f9864-0142-11ee-8e5f-00266cf8869c
Using recent ceph image
quay.io/ceph/ceph@sha256:f30bf50755d7087f47c6223e6a921caf5b12e86401b3d49220230c84a8302a1e
Error EINVAL: Device path '/dev/sdc' not found on host
'mostha1.legi.grenoble-inp.fr'
This has been the case since I moved from Octopus to Pacific.
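One thing that might be worth checking is whether the orchestrator's device cache is simply stale; a sketch, using the host name from above (the exact option spelling may vary with the cephadm version):

  ceph orch device ls mostha1.legi.grenoble-inp.fr --refresh

which should force a re-scan and show what the orchestrator currently sees on that host.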
Patrick
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx