Hi Sage,
On 02/20/2012 06:41 PM, Sage Weil wrote:
> On Mon, 20 Feb 2012, Oliver Francke wrote:
> > Hi,
> > we just got into trouble after some mess while trying to include a new
> > OSD node into our cluster.
> > We get some weird "libceph: corrupt inc osdmap epoch 880 off 102
> > (ffffc9001db8990a of ffffc9001db898a4-ffffc9001db89dae)"
> > on the console.
> > The whole system is in a state like:
> > 2012-02-20 17:56:27.585295 pg v942504: 2046 pgs: 1348 active+clean, 43
> > active+recovering+degraded+remapped+backfill, 218 active+recovering, 437
> > active+recovering+remapped+backfill; 1950 GB data, 3734 GB used, 26059 GB /
> > 29794 GB avail; 272914/1349073 degraded (20.230%)
> > and sometimes the ceph-osd on node0 crashes. At the moment of writing,
> > the degradation continues to shrink, now below 20%.
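The status line above reports 272914/1349073 degraded (20.230%). As a quick sanity check of that arithmetic, here is a minimal Python sketch (the helper name is ours, not part of Ceph):

```python
# Sanity-check the degraded percentage from the "ceph -s" style status
# line: 272914 degraded object replicas out of 1349073 total replicas.
def degraded_pct(degraded, total):
    """Return the degraded ratio as a percentage, as ceph prints it."""
    return degraded / total * 100

pct = degraded_pct(272914, 1349073)
print(f"{pct:.3f}%")  # matches the 20.230% in the status output above
```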
> How did ceph-osd crash? Is there a dump in the log?
'course I will provide all logs, uhm, a bit later; we are busy starting
all the VMs and handling the first customer tickets right now ;-)
To make the collection as complete as possible, would you be so kind as to
give a list of all the necessary logs (kern.log, osdX.log, etc.)?
Thnx for the fast reaction,
Oliver.
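On the question of which logs to collect: a minimal sketch of assembling the usual candidates for a bug report. The directory and file names here are assumptions (OSD log naming varies across Ceph releases and distros), not confirmed defaults.

```python
# Hypothetical helper listing log files worth attaching to a Ceph bug
# report. Paths are assumptions typical of a 2012-era install; adjust to
# your actual log locations.
from pathlib import PurePosixPath

def candidate_logs(osd_ids, log_dir="/var/log/ceph"):
    """Build a list of log file paths to gather for the given OSD ids."""
    # kern.log first: the "libceph: corrupt inc osdmap" lines come from
    # the kernel client, so they land in the kernel log.
    logs = [PurePosixPath("/var/log/kern.log")]
    for i in osd_ids:
        # Naming differs across releases (osd.N.log vs. ceph-osd.N.log);
        # we assume the former here.
        logs.append(PurePosixPath(log_dir) / f"osd.{i}.log")
    return [str(p) for p in logs]

print(candidate_logs([0, 1]))
```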
> sage
> > Any clues?
> > Thnx in advance,
> > Oliver.
--
Oliver Francke
filoo GmbH
Moltkestraße 25a
33330 Gütersloh
HRB4355 AG Gütersloh
Managing directors: S.Grewing | J.Rehpöhler | C.Kunz
Follow us on Twitter: http://twitter.com/filoogmbh
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Oliver Francke
filoo GmbH
Moltkestraße 25a
33330 Gütersloh
HRB4355 AG Gütersloh
Geschäftsführer: S.Grewing | J.Rehpöhler | C.Kunz
Folgen Sie uns auf Twitter: http://twitter.com/filoogmbh
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html