Hello,
I run a Ceph Nautilus cluster with 9 hosts and a total of 144 OSDs.
Recently osd.49 on host ceph4 died because of disk errors. Ceph tried
to restart the OSD four times and then gave up. The cluster then
rebalanced successfully. ceph -s now says:
  cluster:
    id:     1234567
    health: HEALTH_WARN
            4 daemons have recently crashed

  services:
    mon: 3 daemons, quorum ceph2,ceph5,ceph8 (age 9h)
    mgr: ceph8(active, since 9w), standbys: ceph2, ceph5, ceph-admin
    mds: cephfsrz:1 {0=ceph1=up:active} 2 up:standby
    osd: 144 osds: 143 up (since 3d), 143 in (since 3d) ...
  ...
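
(As an aside, I assume the "4 daemons have recently crashed" warning is
unrelated and can be cleared later, once the OSD is back, with something
like

# ceph crash ls
# ceph crash archive-all

so that part does not worry me.)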
So I got a brand new disk and swapped it in for the one that died. The
old disk was /dev/sdb; after inserting it, the new disk shows up as
/dev/sds. Next I ran ceph-volume, which has always worked without
problems after previous disk failures. This time it does not work:
# ceph-volume lvm create --bluestore --osd-id 49 --data /dev/sds
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name
client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring
osd tree -f json
--> RuntimeError: The osd ID 49 is already in use or does not exist.
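
From what I have read, ceph-volume can only reuse an existing OSD ID if
that OSD is marked as "destroyed" in the osd tree, so I assume I would
first have to run something like

# ceph osd destroy 49 --yes-i-really-mean-it   # (my assumption, not done yet)

before retrying the create, but I did not want to do that before
understanding the LVM situation described below.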
The ceph-osd.49 log shows:
2022-04-25 17:23:52.698 7f27d3211c00 -1 ** ERROR: unable to
open OSD superblock on /var/lib/ceph/osd/ceph-49: (2) No such file or
directory
There is no osd.49 process running. Running ls shows the following:
# ls -l /var/lib/ceph/osd/ceph-49/
lrwxrwxrwx 1 ceph ceph 93 Feb 21 08:52 block ->
/dev/ceph-5d1acce2-ba98-4b4c-81bd-f52a3309161f/osd-block-f9443ebd-d004-45d0-ade0-d8ef0df2c0d3
-rw------- 1 ceph ceph 37 Feb 21 08:52 ceph_fsid
-rw------- 1 ceph ceph 37 Feb 21 08:52 fsid
-rw------- 1 ceph ceph 56 Feb 21 08:52 keyring
-rw------- 1 ceph ceph 6 Feb 21 08:52 ready
-rw------- 1 ceph ceph 10 Feb 21 08:52 type
-rw------- 1 ceph ceph 3 Feb 21 08:52 whoami
The block symlink seems to be stale, probably still pointing at the old
disk, since I get an I/O error when I run:
# dd if=/var/lib/ceph/osd/ceph-49/block of=/dev/null bs=1024k count=1
dd: error reading '/var/lib/ceph/osd/ceph-49/block': Input/output error
However a dd from /dev/sds works just fine.
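
(I assume the /var/lib/ceph/osd/ceph-49 directory is just the stale
tmpfs mount left over from the old OSD's activation and could simply be
unmounted before recreating the OSD, e.g.

# umount /var/lib/ceph/osd/ceph-49   # (not done yet, just my guess)

but I have left it alone for now.)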
Running pvs yields:
root@ceph4:~# pvs
/dev/ceph-5d1acce2-ba98-4b4c-81bd-f52a3309161f/osd-block-f9443ebd-d004-45d0-ade0-d8ef0df2c0d3:
read failed after 0 of 4096 at 0: Input/output error
/dev/ceph-5d1acce2-ba98-4b4c-81bd-f52a3309161f/osd-block-f9443ebd-d004-45d0-ade0-d8ef0df2c0d3:
read failed after 0 of 4096 at 4000761970688: Input/output error
/dev/ceph-5d1acce2-ba98-4b4c-81bd-f52a3309161f/osd-block-f9443ebd-d004-45d0-ade0-d8ef0df2c0d3:
read failed after 0 of 4096 at 4000762028032: Input/output error
/dev/ceph-5d1acce2-ba98-4b4c-81bd-f52a3309161f/osd-block-f9443ebd-d004-45d0-ade0-d8ef0df2c0d3:
read failed after 0 of 4096 at 4096: Input/output error
PV        VG                                        Fmt  Attr PSize    PFree
/dev/md1  system                                    lvm2 a--  <110.32g <72.38g
/dev/sda  ceph-8dfa3747-9e1b-4eb7-adfe-1ed6ee76dfb5 lvm2 a--  <3.64t   0
/dev/sdc  ceph-f2c81a29-1968-4c7c-9354-2d1ac71b361f lvm2 a--  <3.64t   0
...
/dev/sdq  ceph-ac6dc03b-422d-49cd-978e-71d2585cdd24 lvm2 a--  <3.64t   0
/dev/sdr  ceph-7b28ddc2-8500-487b-a693-51d711d26d40 lvm2 a--  <3.64t   0
So how should I proceed? It looks as if LVM still has a dangling PV/VG
belonging to the old disk. How can I clean this up?
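
Would the right way be to remove the stale device-mapper mapping and the
VG metadata of the old disk by hand, roughly along these lines (the VG/LV
names are the ones from my pvs output above), then zap the new disk and
re-run ceph-volume? I am not sure whether this is safe:

# dmsetup remove ceph--5d1acce2--ba98--4b4c--81bd--f52a3309161f-osd--block--f9443ebd--d004--45d0--ade0--d8ef0df2c0d3
# vgreduce --removemissing ceph-5d1acce2-ba98-4b4c-81bd-f52a3309161f
# vgremove ceph-5d1acce2-ba98-4b4c-81bd-f52a3309161f
# ceph-volume lvm zap /dev/sds
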
Thanks
Rainer
--
Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse 1
56070 Koblenz, Web: http://www.uni-koblenz.de/~krienke, Tel: +49261287 1312
PGP: http://www.uni-koblenz.de/~krienke/mypgp.html, Fax: +49261287 1001312