Re: unable to start OSD

Sage Weil <sage@xxxxxxxxxxx> · Wed, 12 Feb 2014 07:30:05 -0800 (PST)

Hi Dietmar,

This sounds like a bug that introduced an entry into the pg log that is not 
ordered properly.  I don't think I've seen this before... Sam, have you?
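
To make the failed check concrete: judging by the assert text, read_log 
appears to require that the version counter (the number after the 
apostrophe in epoch'version) strictly increase from one log entry to the 
next, regardless of epoch. Below is a minimal Python sketch of that 
invariant. EVersion and first_out_of_order are hypothetical names, not 
Ceph's API. The sample entries are the last four read_log lines from the 
debug output quoted below.

```python
from typing import List, NamedTuple, Optional


class EVersion(NamedTuple):
    """Hypothetical stand-in for Ceph's eversion_t: an (epoch, version)
    pair, printed as epoch'version in OSD logs."""
    epoch: int
    version: int

    def __str__(self) -> str:
        return f"{self.epoch}'{self.version}"


def first_out_of_order(entries: List[EVersion]) -> Optional[int]:
    """Return the index of the first entry whose version counter is not
    strictly greater than its predecessor's, or None if all are ordered.
    Only the counter is compared, mirroring the failed assert:
        last_e.version.version < e.version.version
    """
    for i in range(1, len(entries)):
        last_e, e = entries[i - 1], entries[i]
        if not last_e.version < e.version:
            return i
    return None


# (epoch, version) of the last four entries read before the crash.
log = [EVersion(6100, 1677), EVersion(6100, 1678),
       EVersion(6137, 1679), EVersion(10683, 1)]

bad = first_out_of_order(log)
if bad is not None:
    print(f"ordering violated: {log[bad - 1]} followed by {log[bad]}")
```

Run on those entries, the sketch flags the step from 6137'1679 to 10683'1. 
The epoch went up, but the version counter restarted at 1, and that is 
exactly the case the assert rejects.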

How many OSDs do you have?

Can you set 'debug osd = 20' in your ceph.conf, restart, reproduce the 
crash, and post the log somewhere?  (You can use 'ceph-post-file <filename>' 
to send it to us.)  I'm surprised that it is now happening on all daemons.
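
Here is a minimal ceph.conf fragment for that setting. It assumes the line 
goes in the [osd] section so that it applies to every OSD daemon on the 
node. Level 20 is very verbose, so it is worth removing again once the log 
has been captured.

```
[osd]
    debug osd = 20
```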

sage

On Wed, 12 Feb 2014, Dietmar Maurer wrote:

> After enabling debugging, I get:
> 
> ...
>     -4> 2014-02-12 09:43:44.739648 7f7f8b848780 20 read_log 6100'1677 (6100'1676) modify   85949a17/rbd_data.dd6592ae8944a.00000000000001bd/head//25 by client.890681.0:76884 2014-01-26 16:44:08.412457
>     -3> 2014-02-12 09:43:44.739670 7f7f8b848780 20 read_log 6100'1678 (6100'1677) modify   85949a17/rbd_data.dd6592ae8944a.00000000000001bd/head//25 by client.890681.0:76886 2014-01-26 16:44:08.433901
>     -2> 2014-02-12 09:43:44.739685 7f7f8b848780 20 read_log 6137'1679 (6100'1678) delete   85949a17/rbd_data.dd6592ae8944a.00000000000001bd/head//25 by client.1006689.0:453 2014-01-28 13:06:15.757466
>     -1> 2014-02-12 09:43:44.739700 7f7f8b848780 20 read_log 10683'1 (0'0) modify   b32e0417/rbd_data.1bf7912ae8944a.000000000000010e/head//25 by client.1832855.0:764 2014-02-11 19:02:15.914885
>      0> 2014-02-12 09:43:44.742164 7f7f8b848780 -1 osd/PGLog.cc: In function 'static bool PGLog::read_log(ObjectStore*, coll_t, hobject_t, const pg_info_t&, std::map<eversion_t, hobject_t>&, PGLog::IndexedLog&, pg_missing_t&, std::ostringstream&, std::set<std::basic_string<char> >*)' thread 7f7f8b848780 time 2014-02-12 09:43:44.739712
> osd/PGLog.cc: 672: FAILED assert(last_e.version.version < e.version.version)
> 
> 
> 
> > > I am unable to start my OSDs on one node:
> > >
> > > > osd/PGLog.cc: 672: FAILED assert(last_e.version.version < e.version.version)
> > >
> > > Does that mean there is something wrong with my journal disk? Or why
> > > would such a thing happen?
> > 
> > After rebooting other nodes, all my OSDs are offline, showing exactly the same
> > error.
> > 
> > Any ideas?
> > 
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@xxxxxxxxxxxxxx
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 