On Tue, 7 Aug 2018, Willem Jan Withagen wrote:
> Hi,
>
> On my test-cluster I had some problems, probably due to heartbeat
> timeout problems that threw OSD out.
>
> But now I have this other problem, first crash was probably also a
> suicide timeout, and the OSD does not want to restart:
>
> First lots of journal_replay, and then an assert.
> What is smart to do, try to fix this (if possible) or trash the OSD (or
> even the cluster) and go on with life?
>
> The first option would perhaps be rather educational.
>
> --WjW
>
>     -8> 2018-08-07 12:20:15.919393 a1e6480  3 journal journal_replay: r = 0, op_seq now 772232
>     -7> 2018-08-07 12:20:15.919417 a1e6480  2 journal read_entry 5191901184 : seq 772233 1182 bytes
>     -6> 2018-08-07 12:20:15.919423 a1e6480  3 journal journal_replay: applying op seq 772233
>     -5> 2018-08-07 12:20:15.919463 a1e6480  3 journal journal_replay: r = 0, op_seq now 772233
>     -4> 2018-08-07 12:20:15.919486 a1e6480  2 journal read_entry 5191905280 : seq 772234 1131 bytes
>     -3> 2018-08-07 12:20:15.919491 a1e6480  3 journal journal_replay: applying op seq 772234
>     -2> 2018-08-07 12:20:15.919526 a1e6480  3 journal journal_replay: r = 0, op_seq now 772234
>     -1> 2018-08-07 12:20:19.807879 a1e6480 -1 journal FileJournal::wrap_read_bl: safe_read_exact 5191913458~4196316 returned -5

I think if the journal read gets EIO that's a real EIO.  Anything in your
kernel log?

I would blow away this OSD and move on...

sage

>      0> 2018-08-07 12:20:19.808465 a1e6480 -1 *** Caught signal (Abort trap) **
>  in thread a1e6480 thread_name:
>
>  ceph version 12.2.4 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous (stable)
>  1: <install_standard_sighandlers(void)+0x417> at /usr/local/bin/ceph-osd
>  2: <pthread_sigmask()+0x536> at /lib/libthr.so.3
>  3: <pthread_getspecific()+0xe12> at /lib/libthr.so.3
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>  needed to interpret this.
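
For anyone reading the log above: the "returned -5" is a negated errno, and 5 is EIO on both Linux and FreeBSD. A minimal sketch in Python of how one might decode such a line follows. The helper names are hypothetical and not part of Ceph, and reading "5191913458~4196316" as offset~length in bytes is an assumption about the log format.

```python
import errno
import os
import re


def describe_return_code(rc: int) -> str:
    """Hypothetical helper: turn a negated errno into a readable string."""
    if rc >= 0:
        return f"{rc} (success)"
    name = errno.errorcode.get(-rc, "unknown")
    return f"{rc} ({name}: {os.strerror(-rc)})"


def parse_extent(text: str) -> tuple[int, int]:
    """Hypothetical helper: split 'A~B' into (offset, length).

    Assumption: the log prints the read as offset~length in bytes.
    """
    match = re.fullmatch(r"(\d+)~(\d+)", text)
    if match is None:
        raise ValueError(f"not an extent: {text!r}")
    return int(match.group(1)), int(match.group(2))


log_line = ("FileJournal::wrap_read_bl: safe_read_exact "
            "5191913458~4196316 returned -5")

# Pull the extent and the return code out of the log line.
found = re.search(r"safe_read_exact (\S+) returned (-?\d+)", log_line)
offset, length = parse_extent(found.group(1))
return_code = int(found.group(2))

print(f"read of {length} bytes at offset {offset} failed:",
      describe_return_code(return_code))
```

Under that reading, the journal read of about 4 MB failed at the device or filesystem level, which is why the kernel log is the next place to look.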
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html