I also have a similar problem. In my case the OSDs start and then stop again after a few minutes, and there is not much in the log. I have filed a bug report and am waiting for a reply to confirm whether or not it is a bug.

On Fri, Sep 3, 2021 at 5:21 PM mahnoosh shahidi <mahnooosh.shd@xxxxxxxxx> wrote:

> We still have this problem. Does anybody have any ideas about this?
>
> On Mon, Aug 23, 2021 at 9:53 AM mahnoosh shahidi <mahnooosh.shd@xxxxxxxxx> wrote:
>
> > Hi everyone,
> >
> > We have a problem with Octopus 15.2.12: OSDs randomly crash and restart
> > with the following traceback in the log.
> >
> >    -8> 2021-08-20T15:01:03.165+0430 7f2d10fd7700 10 monclient: handle_auth_request added challenge on 0x55a3fc654400
> >    -7> 2021-08-20T15:01:03.201+0430 7f2d02960700  2 osd.202 1145364 ms_handle_reset con 0x55a548087000 session 0x55a4be8a4940
> >    -6> 2021-08-20T15:01:03.209+0430 7f2d02960700  2 osd.202 1145364 ms_handle_reset con 0x55a52aab2800 session 0x55a4497dd0c0
> >    -5> 2021-08-20T15:01:03.213+0430 7f2d02960700  2 osd.202 1145364 ms_handle_reset con 0x55a548084800 session 0x55a3fca0f860
> >    -4> 2021-08-20T15:01:03.217+0430 7f2d02960700  2 osd.202 1145364 ms_handle_reset con 0x55a3c5e50800 session 0x55a51c1b7680
> >    -3> 2021-08-20T15:01:03.217+0430 7f2d02960700  2 osd.202 1145364 ms_handle_reset con 0x55a3c5e52000 session 0x55a4055932a0
> >    -2> 2021-08-20T15:01:03.225+0430 7f2d02960700  2 osd.202 1145364 ms_handle_reset con 0x55a4b835f800 session 0x55a51c1b90c0
> >    -1> 2021-08-20T15:01:03.225+0430 7f2d107d6700 10 monclient: handle_auth_request added challenge on 0x55a3c5e52000
> >     0> 2021-08-20T15:01:03.233+0430 7f2d0ffd5700 -1 *** Caught signal (Segmentation fault) **
> >  in thread 7f2d0ffd5700 thread_name:msgr-worker-2
> >
> >  ceph version 15.2.12 (ce065eabfa5ce81323b009786bdf5bb03127cbe1) octopus (stable)
> >  1: (()+0x12980) [0x7f2d144b0980]
> >  2: (AsyncConnection::_stop()+0x9c) [0x55a37bf56cdc]
> >  3: (ProtocolV2::stop()+0x8b) [0x55a37bf8016b]
> >  4: (ProtocolV2::_fault()+0x6b) [0x55a37bf8030b]
> >  5: (ProtocolV2::handle_read_frame_preamble_main(std::unique_ptr<ceph::buffer::v15_2_0::ptr_node, ceph::buffer::v15_2_0::ptr_node::disposer>&&, int)+0x1d1) [0x55a37bf97d51]
> >  6: (ProtocolV2::run_continuation(Ct<ProtocolV2>&)+0x34) [0x55a37bf80e64]
> >  7: (AsyncConnection::process()+0x5fc) [0x55a37bf59e0c]
> >  8: (EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)+0x7dd) [0x55a37bda9a2d]
> >  9: (()+0x11d45a8) [0x55a37bdaf5a8]
> >  10: (()+0xbd6df) [0x7f2d13b886df]
> >  11: (()+0x76db) [0x7f2d144a56db]
> >  12: (clone()+0x3f) [0x7f2d1324571f]
> >  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
> >
> > Our cluster has 220 HDD disks and 200 SSDs. The HDD OSDs use separate NVMe devices for their DB, and the bucket indexes are also on separate SSD disks.
> > Does anybody have any idea what the problem could be?
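
For anyone chasing the same crash, here is a rough sketch of commands that may help gather more detail. This assumes the mgr crash module is enabled and that debug symbols for your build are available; osd.202 is simply the OSD from the log above, <crash-id> is a placeholder, and the debug_ms level of 5 is only an example value:

    # List crashes recorded by the mgr crash module, then dump the full
    # metadata (including the backtrace) for one of them.
    ceph crash ls
    ceph crash info <crash-id>

    # Temporarily raise messenger debug logging on the affected OSD to get
    # more context around the AsyncConnection/ProtocolV2 fault, then reset it.
    ceph config set osd.202 debug_ms 5
    ceph config set osd.202 debug_ms 0

    # With the matching binary and debug symbols installed, the raw offsets
    # in the backtrace can be interpreted as the NOTE in the log suggests, e.g.:
    objdump -rdS /usr/bin/ceph-osd > ceph-osd.asm

The extra debug_ms output around the time of the segfault is what usually makes it possible to tell whether the connection was already being torn down when the fault handler ran.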