Re: Empty osd and crushmap after mon restart?

On 06/26/2013 10:37 PM, Wido den Hollander wrote:
On 06/26/2013 06:54 PM, Gregory Farnum wrote:
On Wed, Jun 26, 2013 at 12:24 AM, Wido den Hollander <wido@xxxxxxxx> wrote:
On 06/26/2013 01:18 AM, Gregory Farnum wrote:

Some guesses are inline.

On Tue, Jun 25, 2013 at 4:06 PM, Wido den Hollander <wido@xxxxxxxx> wrote:

Hi,

I'm not sure what happened, but on a Ceph cluster I noticed that the
monitors (running 0.61) started filling up the disks, so they were
restarted with:

mon compact on start = true
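
One way to set that is in the [mon] section of ceph.conf before
restarting the daemons; a minimal sketch:

    [mon]
        mon compact on start = true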

After a restart the osdmap was empty, it showed:

     osdmap e2: 0 osds: 0 up, 0 in
      pgmap v624077: 15296 pgs: 15296 stale+active+clean; 78104 MB data, 243 GB used, 66789 GB / 67032 GB avail
     mdsmap e1: 0/0/1 up

This cluster has 36 OSDs over 9 hosts, but suddenly that was all gone.

I also checked the crushmap: all 36 OSDs were removed, no trace of them.


As you guess, this is probably because the disks filled up. It
shouldn't be able to happen, but we found an edge case where leveldb
falls apart; there's a fix for it in the repository now (asserting
that we get back what we just wrote) that Sage can talk more about.
The OSD map and crush map probably both disappeared because the
monitor got nothing back when reading in the newest OSD map, and so
it's all empty.


Sounds reasonable and logical.


"ceph auth list" still showed their keys though.

Restarting the OSDs didn't help, since create-or-move complained that
the OSDs didn't exist and didn't do anything. I ran "ceph osd create"
to get the 36 OSDs created again, but when the OSDs boot they never
start working.
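
Roughly what I ran to get the IDs back (a sketch; "ceph osd create"
just allocates the next free ID, so 36 calls recreate IDs 0-35):

    for i in $(seq 1 36); do ceph osd create; done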

The only thing they log is:

2013-06-26 01:00:08.852410 7f17f3f16700 0 -- 0.0.0.0:6801/4767 >> 10.23.24.53:6801/1758 pipe(0x1025fc80 sd=116 :40516 s=1 pgs=0 cs=0 l=0).fault with nothing to send, going to standby


Are they going up and just sitting idle? This is probably because none
of their peers are telling them to be responsible for any placement
groups on startup.


No, they never come up. Checking the monitor logs, I only see the
create-or-move command changing their crush position, but they never
mark themselves as "up", so all the OSDs stay down.

netstat, however, shows a connection between the OSD and the mon, but
nothing special in the logs at lower debug levels.

So the process is still running? Can you generate full logs with debug
ms = 5, debug osd = 20, debug monc = 20?
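
Either set those in the [osd] section of ceph.conf and restart, or
inject them over the admin socket if you'd rather not restart; the
ceph.conf form would be something like:

    [osd]
        debug ms = 5
        debug osd = 20
        debug monc = 20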


I've done so with 4 OSDs and I uploaded the logs of one OSD:

root@data1:~# sftp cephdrop@xxxxxxxx
cephdrop@xxxxxxxx's password:
Connected to ceph.com.
sftp> put ceph-osd-0-widodh-empty-osdmap.log.gz
Uploading ceph-osd-0-widodh-empty-osdmap.log.gz to
/home/cephdrop/ceph-osd-0-widodh-empty-osdmap.log.gz
ceph-osd-0-widodh-empty-osdmap.log.gz
100%   14MB   3.5MB/s   00:04
sftp>

My internet here is too slow to go through the logs and I haven't
checked them yet.


Couldn't resist going through them and I found this:

2013-06-26 22:21:11.721185 7ffe4b7c9780 7 osd.0 1137 consume_map version 1137
2013-06-26 22:21:11.746395 7ffe4b7c9780 10 osd.0 1137 done with init, starting boot process
2013-06-26 22:21:11.746400 7ffe4b7c9780 10 osd.0 1137 start_boot - have maps 503..1137
2013-06-26 22:21:11.746402 7ffe4b7c9780 10 monclient: get_version osdmap
2013-06-26 22:21:11.746404 7ffe4b7c9780 10 monclient: _send_mon_message to mon.mon1 at 10.23.24.8:6789/0
2013-06-26 22:21:11.746409 7ffe4b7c9780 1 -- 10.23.24.51:6800/27568 --> 10.23.24.8:6789/0 -- mon_get_version(what=osdmap handle=1) v1 -- ?+0 0x2bc5c40 con 0x4a28b00
2013-06-26 22:21:11.767132 7ffe3e43e700 1 -- 10.23.24.51:6800/27568 <== mon.0 10.23.24.8:6789/0 10 ==== mon_check_map_ack(handle=1 version=59) v2 ==== 24+0+0 (1392806332 0 0) 0x6b9f8c0 con 0x4a28b00
2013-06-26 22:21:11.771242 7ffe3a436700 10 osd.0 1137 _maybe_boot mon has osdmaps 1..59
2013-06-26 22:21:11.771259 7ffe3a436700 10 osd.0 1137 _send_boot
2013-06-26 22:21:11.771261 7ffe3a436700 10 osd.0 1137 assuming cluster_addr ip matches client_addr
2013-06-26 22:21:11.771262 7ffe3a436700 10 osd.0 1137 assuming hb_addr ip matches cluster_addr
2013-06-26 22:21:11.771265 7ffe3a436700 10 osd.0 1137 client_addr 10.23.24.51:6800/27568, cluster_addr 10.23.24.51:6801/27568, hb addr 10.23.24.51:6802/27568
2013-06-26 22:21:11.771274 7ffe3a436700 10 monclient: _send_mon_message to mon.mon1 at 10.23.24.8:6789/0
2013-06-26 22:21:11.771276 7ffe3a436700 1 -- 10.23.24.51:6800/27568 --> 10.23.24.8:6789/0 -- osd_boot(osd.0 booted 0 v1137) v3 -- ?+0 0x43d3000 con 0x4a28b00
2013-06-26 22:21:14.712300 7ffe3ac37700 10 monclient: tick
2013-06-26 22:21:14.712320 7ffe3ac37700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2013-06-26 22:20:44.712319)
2013-06-26 22:21:14.712327 7ffe3ac37700 10 monclient: renew subs? (now: 2013-06-26 22:21:14.712327; renew after: 2013-06-26 22:23:41.712450) -- no
2013-06-26 22:21:16.413529 7ffe3442a700 20 osd.0 1137 update_osd_stat osd_stat(6673 MB used, 1855 GB avail, 1862 GB total, peers []/[])
2013-06-26 22:21:16.413542 7ffe3442a700 5 osd.0 1137 heartbeat: osd_stat(6673 MB used, 1855 GB avail, 1862 GB total, peers []/[])


So the OSD has osdmaps 503 through 1137, while the monitor only has 1 through 59.

Is there a way to fetch the osdmaps out of the OSD and inject them into the monitor's datastore? The OSD uses leveldb for these maps as well, doesn't it?
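
If I read the on-disk layout right, the OSD keeps the full maps as
objects under its meta collection rather than in leveldb, so something
like this should list which epochs an OSD still has (path from my
setup):

    find /var/lib/ceph/osd/ceph-0/current/meta -name 'osdmap*' | sort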

The internet connection I'm behind is a 3G connection, so I can't go
skimming through the logs with debugging at very high levels, but I'm
just wondering what this could be?

The monitors filling up most likely triggered the problem, but I'm
now looking for a way to get the OSDs back up again.

In the meantime I upgraded all the nodes to 0.61.4, but that didn't
change anything.

Any ideas on what this might be and how to resolve it?


At a guess, you can go in and grab the last good version of the OSD
Map and inject that back into the cluster, then restart the OSDs? If
that doesn't work then we'll need to figure out the right way to kick
them into being responsible for their stuff.
(First, make sure that when you turn them on they are actually
connecting to the monitors.)


You mean grabbing the old OSDMap from an OSD or from the monitor
store? Both are using leveldb for their storage now, right? So I'd
have to grab the OSDMap using some leveldb tooling?

There's a ceph-monstore-tool or similar that provides this
functionality, although it's pretty new, so you might need to grab an
autobuilt package somewhere instead of the cuttlefish one (not sure).
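
From memory, pulling a map out of the mon store looks something like
this, though check --help on whatever build you end up with, since
the syntax may well differ:

    ceph-monstore-tool /var/lib/ceph/mon/ceph-mon1 get osdmap -- --version <epoch> --out /tmp/osdmap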

Ah, cool! I'll give that a try.

-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com





--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



