The disk got corrupted and might be dead. Check the kernel log for I/O errors, and check the SMART data for a non-zero reallocated sector count or other errors.
If the disk is still good: simply re-create the OSD.
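A rough sketch of those checks and the re-create step (the device /dev/sdX and OSD id 35 are placeholders, not taken from the thread; the destructive commands are commented out and should only be run deliberately):

```shell
#!/bin/sh
# Placeholders -- adjust for your system.
OSD_ID=35
DEV=/dev/sdX

# 1) Kernel log: look for I/O or medium errors on the backing device.
dmesg 2>/dev/null | grep -iE 'i/o error|medium error' || true

# 2) SMART health (requires smartmontools): reallocated or pending
#    sectors indicate a failing disk.
# smartctl -a "$DEV" | grep -iE 'reallocated|pending|uncorrect'

# 3) If the disk is healthy, destroy and re-create the OSD
#    (Luminous-era ceph-volume workflow). Destructive -- run by hand:
# ceph osd destroy "$OSD_ID" --yes-i-really-mean-it
# ceph-volume lvm zap "$DEV"
# ceph-volume lvm create --bluestore --data "$DEV" --osd-id "$OSD_ID"

echo "checks complete for osd.$OSD_ID on $DEV"
```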
Paul
--
Paul Emmerich
Looking for help with your Ceph cluster? Contact us at https://croit.io
croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
On Fri, May 24, 2019 at 3:51 PM Guillaume Chenuet <guillaume.chenuet@xxxxxxxxxxxxx> wrote:
Hi,

We are running a Ceph cluster with 36 OSDs split across 3 servers (12 OSDs per server) on Ceph version 12.2.11 (26dc3775efc7bb286a1d6d66faee0ba30ea23eee) luminous (stable). This cluster is used by an OpenStack private cloud and deployed with OpenStack Kolla. Every OSD runs in a Docker container on its server, and the MON, MGR, MDS, and RGW daemons run on 3 other servers.

This week, one OSD crashed and failed to restart with this stack trace:

Running command: '/usr/bin/ceph-osd -f --public-addr 10.106.142.30 --cluster-addr 10.106.142.30 -i 35'
+ exec /usr/bin/ceph-osd -f --public-addr 10.106.142.30 --cluster-addr 10.106.142.30 -i 35
starting osd.35 at - osd_data /var/lib/ceph/osd/ceph-35 /var/lib/ceph/osd/ceph-35/journal
/builddir/build/BUILD/ceph-12.2.11/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, uint64_t, size_t, ceph::bufferlist*, char*)' thread 7efd088d6d80 time 2019-05-24 05:40:47.799918
/builddir/build/BUILD/ceph-12.2.11/src/os/bluestore/BlueFS.cc: 1000: FAILED assert(r == 0)
ceph version 12.2.11 (26dc3775efc7bb286a1d6d66faee0ba30ea23eee) luminous (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x556f7833f8f0]
2: (BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, unsigned long, unsigned long, ceph::buffer::list*, char*)+0xca4) [0x556f782b5574]
3: (BlueFS::_replay(bool)+0x2ef) [0x556f782c82af]
4: (BlueFS::mount()+0x1d4) [0x556f782cc014]
5: (BlueStore::_open_db(bool)+0x1847) [0x556f781e0ce7]
6: (BlueStore::_mount(bool)+0x40e) [0x556f782126ae]
7: (OSD::init()+0x3bd) [0x556f77dbbaed]
8: (main()+0x2d07) [0x556f77cbe667]
9: (__libc_start_main()+0xf5) [0x7efd04fa63d5]
10: (()+0x4c1f73) [0x556f77d5ef73]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
*** Caught signal (Aborted) **
in thread 7efd088d6d80 thread_name:ceph-osd
ceph version 12.2.11 (26dc3775efc7bb286a1d6d66faee0ba30ea23eee) luminous (stable)
1: (()+0xa63931) [0x556f78300931]
2: (()+0xf5d0) [0x7efd05f995d0]
3: (gsignal()+0x37) [0x7efd04fba207]
4: (abort()+0x148) [0x7efd04fbb8f8]
5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x284) [0x556f7833fa64]
6: (BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, unsigned long, unsigned long, ceph::buffer::list*, char*)+0xca4) [0x556f782b5574]
7: (BlueFS::_replay(bool)+0x2ef) [0x556f782c82af]
8: (BlueFS::mount()+0x1d4) [0x556f782cc014]
9: (BlueStore::_open_db(bool)+0x1847) [0x556f781e0ce7]
10: (BlueStore::_mount(bool)+0x40e) [0x556f782126ae]
11: (OSD::init()+0x3bd) [0x556f77dbbaed]
12: (main()+0x2d07) [0x556f77cbe667]
13: (__libc_start_main()+0xf5) [0x7efd04fa63d5]
14: (()+0x4c1f73) [0x556f77d5ef73]

The cluster health is OK and Ceph sees this OSD as down. I tried to find more information about this error on the internet, without luck. Do you have any idea or input about this error, please?

Thanks,
Guillaume
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com