Re: Consistent crashes while trying to recover OSD

On Tue, 7 Aug 2018, Willem Jan Withagen wrote:
> Hi,
> 
> On my test-cluster I had some problems, probably due to heartbeat
> timeout problems that threw OSD out.
> 
> But now I have this other problem, first crash was probably also a
> suicide timeout, and the OSD does not want to restart:
> 
> First lots of journal_replay, and then an assert.
> What is the smart thing to do: try to fix this (if possible), or trash
> the OSD (or even the cluster) and go on with life?
> 
> The first option would perhaps be rather educational.
> 
> --WjW
> 
>     -8> 2018-08-07 12:20:15.919393 a1e6480  3 journal journal_replay: r = 0, op_seq now 772232
>     -7> 2018-08-07 12:20:15.919417 a1e6480  2 journal read_entry 5191901184 : seq 772233 1182 bytes
>     -6> 2018-08-07 12:20:15.919423 a1e6480  3 journal journal_replay: applying op seq 772233
>     -5> 2018-08-07 12:20:15.919463 a1e6480  3 journal journal_replay: r = 0, op_seq now 772233
>     -4> 2018-08-07 12:20:15.919486 a1e6480  2 journal read_entry 5191905280 : seq 772234 1131 bytes
>     -3> 2018-08-07 12:20:15.919491 a1e6480  3 journal journal_replay: applying op seq 772234
>     -2> 2018-08-07 12:20:15.919526 a1e6480  3 journal journal_replay: r = 0, op_seq now 772234
>     -1> 2018-08-07 12:20:19.807879 a1e6480 -1 journal FileJournal::wrap_read_bl: safe_read_exact 5191913458~4196316 returned -5

I think if the journal read gets EIO (the -5 above), that's a real EIO.
Anything in your kernel log?
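
For anyone wondering where EIO comes from: the "returned -5" in the
wrap_read_bl line is a negated errno, and errno 5 is EIO. A minimal Python
sketch of that decoding follows. The helper name decode_read_failure and the
regular expression are made up for illustration, and reading the
"5191913458~4196316" pair as offset~length is an assumption based on how the
message is printed, not something stated in this thread.

```python
import errno
import os
import re

# Hypothetical pattern for the FileJournal message quoted above:
#   "safe_read_exact <offset>~<length> returned <rc>"
# Treating the two numbers as offset~length is an assumption.
READ_FAIL = re.compile(r"safe_read_exact (\d+)~(\d+) returned (-?\d+)")


def decode_read_failure(line):
    """Return (offset, length, errno_name) for a matching line, else None."""
    m = READ_FAIL.search(line)
    if m is None:
        return None
    offset, length, rc = (int(g) for g in m.groups())
    # A negative return code is taken to be -errno; look up its symbolic name.
    name = errno.errorcode.get(-rc, "unknown") if rc < 0 else "ok"
    return offset, length, name


line = ("FileJournal::wrap_read_bl: "
        "safe_read_exact 5191913458~4196316 returned -5")
print(decode_read_failure(line))   # (5191913458, 4196316, 'EIO')
print(os.strerror(errno.EIO))      # platform's text for EIO
```

This only names the error; whether the disk is really at fault still has to
come from the kernel log.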

I would blow away this OSD and move on...

sage

>      0> 2018-08-07 12:20:19.808465 a1e6480 -1 *** Caught signal (Abort trap) **
>  in thread a1e6480 thread_name:
> 
>  ceph version 12.2.4 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous (stable)
>  1: <install_standard_sighandlers(void)+0x417> at /usr/local/bin/ceph-osd
>  2: <pthread_sigmask()+0x536> at /lib/libthr.so.3
>  3: <pthread_getspecific()+0xe12> at /lib/libthr.so.3
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
> 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 