Problem recreating an OSD after its disk died

Hello,

I run a Ceph Nautilus cluster with 9 hosts and a total of 144 OSDs. Recently osd.49 on host ceph4 died because of disk errors. Ceph tried to restart the OSD 4 times and then gave up. The cluster then rebalanced successfully. ceph -s now says:

  cluster:
    id:     1234567
    health: HEALTH_WARN
            4 daemons have recently crashed

  services:
    mon: 3 daemons, quorum ceph2,ceph5,ceph8 (age 9h)
    mgr: ceph8(active, since 9w), standbys: ceph2, ceph5, ceph-admin
    mds: cephfsrz:1 {0=ceph1=up:active} 2 up:standby
    osd: 144 osds: 143 up (since 3d), 143 in (since 3d)....
...

So I got a brand new disk and swapped it in for the one that died. The old disk was /dev/sdb; after inserting, the new one shows up as /dev/sds. Next I ran ceph-volume, which had always worked without problems after previous disk failures. This time it does not work:

# ceph-volume lvm create --bluestore --osd-id 49 --data /dev/sds
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd tree -f json
-->  RuntimeError: The osd ID 49 is already in use or does not exist.
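
If I read the Nautilus docs on replacing an OSD correctly, ceph-volume only accepts --osd-id for an ID that is already marked "destroyed" in the osd map, and osd.49 probably is not, since it merely crashed. So my assumption is that I would first have to check its state and mark it destroyed, roughly like this (not tried yet, I would like to be sure it is safe in this situation):

# ceph osd tree | grep osd.49
# ceph osd destroy 49 --yes-i-really-mean-it

As far as I understand, "osd destroy" removes the cephx key but keeps the ID and CRUSH position, so the ID can be reused by the new disk.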

The ceph-osd.49 log shows:
2022-04-25 17:23:52.698 7f27d3211c00 -1  ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-49: (2) No such file or directory

There is no osd.49 process running. Running ls shows the following:

# ls -l  /var/lib/ceph/osd/ceph-49/
lrwxrwxrwx 1 ceph ceph 93 Feb 21 08:52 block -> /dev/ceph-5d1acce2-ba98-4b4c-81bd-f52a3309161f/osd-block-f9443ebd-d004-45d0-ade0-d8ef0df2c0d3
-rw------- 1 ceph ceph 37 Feb 21 08:52 ceph_fsid
-rw------- 1 ceph ceph 37 Feb 21 08:52 fsid
-rw------- 1 ceph ceph 56 Feb 21 08:52 keyring
-rw------- 1 ceph ceph  6 Feb 21 08:52 ready
-rw------- 1 ceph ceph 10 Feb 21 08:52 type
-rw------- 1 ceph ceph  3 Feb 21 08:52 whoami

The block link seems to be stale, still pointing at the old disk, since I get an I/O error if I run:

# dd if=/var/lib/ceph/osd/ceph-49/block of=/dev/null bs=1024k count=1
dd: error reading '/var/lib/ceph/osd/ceph-49/block': Input/output error

However, a dd from /dev/sds works just fine.
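
To cross-check where the stale link ends up on the LVM/device-mapper side, I suppose one could run read-only checks like these (I would expect the old osd-block LV to show up on a device that no longer exists):

# readlink -f /var/lib/ceph/osd/ceph-49/block
# dmsetup ls | grep f9443ebd
# lvs -o lv_name,vg_name,devices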

Running pvs yields:

root@ceph4:~# pvs
  /dev/ceph-5d1acce2-ba98-4b4c-81bd-f52a3309161f/osd-block-f9443ebd-d004-45d0-ade0-d8ef0df2c0d3: read failed after 0 of 4096 at 0: Input/output error
  /dev/ceph-5d1acce2-ba98-4b4c-81bd-f52a3309161f/osd-block-f9443ebd-d004-45d0-ade0-d8ef0df2c0d3: read failed after 0 of 4096 at 4000761970688: Input/output error
  /dev/ceph-5d1acce2-ba98-4b4c-81bd-f52a3309161f/osd-block-f9443ebd-d004-45d0-ade0-d8ef0df2c0d3: read failed after 0 of 4096 at 4000762028032: Input/output error
  /dev/ceph-5d1acce2-ba98-4b4c-81bd-f52a3309161f/osd-block-f9443ebd-d004-45d0-ade0-d8ef0df2c0d3: read failed after 0 of 4096 at 4096: Input/output error
  PV         VG                                        Fmt  Attr PSize    PFree
  /dev/md1   system                                    lvm2 a--  <110.32g <72.38g
  /dev/sda   ceph-8dfa3747-9e1b-4eb7-adfe-1ed6ee76dfb5 lvm2 a--  <3.64t   0
  /dev/sdc   ceph-f2c81a29-1968-4c7c-9354-2d1ac71b361f lvm2 a--  <3.64t   0
  ...
  /dev/sdq   ceph-ac6dc03b-422d-49cd-978e-71d2585cdd24 lvm2 a--  <3.64t   0
  /dev/sdr   ceph-7b28ddc2-8500-487b-a693-51d711d26d40 lvm2 a--  <3.64t   0


So how should I proceed? It seems LVM still has a dangling PV, namely the old disk that no longer exists.
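
My current guess, pieced together from the docs and older threads, would be roughly the sequence below, but I am hesitant to remove device-mapper entries on a production cluster without confirmation. The mapping name in the dmsetup line is just a placeholder for whatever "dmsetup ls" reports for the old osd-block LV:

# umount /var/lib/ceph/osd/ceph-49                  # if the old tmpfs mount is still lingering
# dmsetup remove <stale osd-block mapping>          # drop the dangling mapping; its LVM metadata only lived on the dead disk
# ceph osd destroy 49 --yes-i-really-mean-it        # if the ID is not already marked destroyed
# ceph-volume lvm zap /dev/sds                      # probably unnecessary on a factory-new disk
# ceph-volume lvm create --bluestore --osd-id 49 --data /dev/sds

Is that the right way to go, or is there a cleaner way to get rid of the dangling PV?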

Thanks
Rainer
--
Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse  1
56070 Koblenz, Web: http://www.uni-koblenz.de/~krienke, Tel: +49261287 1312
PGP: http://www.uni-koblenz.de/~krienke/mypgp.html, Fax: +49261287 1001312
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


