Re: Empty osd and crushmap after mon restart?

Wido den Hollander <wido@xxxxxxxx> · Wed, 26 Jun 2013 09:24:28 +0200

On 06/26/2013 01:18 AM, Gregory Farnum wrote:
Some guesses are inline.

On Tue, Jun 25, 2013 at 4:06 PM, Wido den Hollander <wido@xxxxxxxx> wrote:
Hi,

I'm not sure what happened, but on a Ceph cluster I noticed that the
monitors (running 0.61) had started filling up their disks, so they were
restarted with:

mon compact on start = true

After the restart the osdmap was empty. It showed:

    osdmap e2: 0 osds: 0 up, 0 in
     pgmap v624077: 15296 pgs: 15296 stale+active+clean; 78104 MB data, 243 GB used, 66789 GB / 67032 GB avail
    mdsmap e1: 0/0/1 up
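For anyone who wants to catch this state automatically, here is a minimal sketch that parses status output in the format shown above and flags an OSD map that lists zero OSDs while placement groups still exist. The regexes assume the 0.61-era layout of these lines (other releases print them differently), and the function names are hypothetical, not part of any Ceph tool.

```python
import re
from dataclasses import dataclass

# Patterns for the "osdmap" and "pgmap" lines as printed in this thread
# (Ceph 0.61). Other releases format these lines differently.
OSDMAP_RE = re.compile(r"osdmap e(\d+): (\d+) osds: (\d+) up, (\d+) in")
PGMAP_RE = re.compile(r"pgmap v(\d+): (\d+) pgs:")


@dataclass
class ClusterSummary:
    osdmap_epoch: int
    num_osds: int
    num_up: int
    num_in: int
    num_pgs: int


def parse_status(text: str) -> ClusterSummary:
    """Extract OSD and PG counts from status output (hypothetical helper)."""
    # Collapse all whitespace so a line wrapped by a mail client still matches.
    flat = " ".join(text.split())
    osd = OSDMAP_RE.search(flat)
    pg = PGMAP_RE.search(flat)
    if osd is None or pg is None:
        raise ValueError("osdmap or pgmap line not found")
    epoch, total, up, in_ = (int(g) for g in osd.groups())
    return ClusterSummary(epoch, total, up, in_, int(pg.group(2)))


def osdmap_looks_wiped(s: ClusterSummary) -> bool:
    """Heuristic: PGs still exist, but the OSD map lists no OSDs at all."""
    return s.num_osds == 0 and s.num_pgs > 0


status = """\
    osdmap e2: 0 osds: 0 up, 0 in
     pgmap v624077: 15296 pgs: 15296 stale+active+clean; 78104 MB data, 243
GB used, 66789 GB / 67032 GB avail
    mdsmap e1: 0/0/1 up
"""
summary = parse_status(status)
print(summary.osdmap_epoch, summary.num_osds, summary.num_pgs)  # 2 0 15296
print(osdmap_looks_wiped(summary))  # True
```

This only detects the symptom; it says nothing about why the map came back empty.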

This cluster has 36 OSDs over 9 hosts, but suddenly that was all gone.

I also checked the crushmap: all 36 OSDs had been removed, with no trace of them left.

As you guessed, this is probably because the disks filled up. It
shouldn't be able to happen, but we found an edge case where leveldb
falls apart; there's a fix for it in the repository now (asserting
that we get back what we just wrote) that Sage can talk more about.
Both maps probably disappeared because the monitor got nothing back when
it read in the newest OSD Map, so it ended up with an empty one.

Sounds reasonable and logical.

"ceph auth list" still showed their keys though.

Restarting the OSDs didn't help, since create-or-move complained that the
OSDs didn't exist and did nothing. I ran "ceph osd create" to create the
36 OSDs again, but when the OSDs boot they never start working.

The only thing they log is:

2013-06-26 01:00:08.852410 7f17f3f16700  0 -- 0.0.0.0:6801/4767 >> 10.23.24.53:6801/1758 pipe(0x1025fc80 sd=116 :40516 s=1 pgs=0 cs=0 l=0).fault with nothing to send, going to standby

Are they going up and just sitting idle? This is probably because none
of their peers are telling them to be responsible for any placement
groups on startup.

No, they never come up. In the monitor logs I only see the
create-or-move command changing their CRUSH position, but they never
get marked as "up", so all the OSDs stay down.

netstat does show a connection between the OSD and the monitor, but
there is nothing special in the logs at lower debug levels.

I'm behind a 3G connection, so I can't really skim through the logs
with debugging at very high levels, but I'm wondering what this could be.

It seems clear that the monitors filling up triggered the problem,
but now I'm looking for a way to get the OSDs back up again.

In the meantime I upgraded all the nodes to 0.61.4, but that didn't change
anything.

Any ideas on what this might be and how to resolve it?

At a guess, you can go in and grab the last good version of the OSD
Map and inject that back into the cluster, then restart the OSDs? If
that doesn't work then we'll need to figure out the right way to kick
them into being responsible for their stuff.
(First, make sure that when you turn them on they are actually
connecting to the monitors.)
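Greg's suggestion comes down to finding the newest OSD map that still lists the OSDs. A minimal sketch of only that selection step, assuming the candidate maps have already been dumped from wherever they can still be read and their OSD counts read off (for example with "ceph osd getmap <epoch> -o <file>" followed by "osdmaptool --print <file>"). The function name and the epochs in the example are hypothetical; this does not inject anything into the cluster.

```python
from typing import Mapping, Optional


def last_good_epoch(osd_counts: Mapping[int, int]) -> Optional[int]:
    """Return the newest epoch whose map still lists at least one OSD.

    osd_counts maps OSD map epoch -> number of OSDs in that map, gathered
    beforehand from whatever copies of the map are still available.
    Returns None if no candidate map contains any OSDs.
    """
    good = [epoch for epoch, count in osd_counts.items() if count > 0]
    return max(good) if good else None


# Hypothetical epochs: 1040-1042 still hold all 36 OSDs, epoch 2 is the wiped map.
print(last_good_epoch({1040: 36, 1041: 36, 1042: 36, 2: 0}))  # 1042
```

Picking by epoch rather than by "most recently written" matters here, since the wiped map (e2) restarted the epoch counter from scratch.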

You mean grabbing the old OSDMap from an OSD or from the Monitor store?
Both use leveldb for their storage now, right? So I'd have to extract
the OSD Map with some leveldb tooling?

Wido

-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on