On Wed, Jun 26, 2013 at 12:24 AM, Wido den Hollander <wido@xxxxxxxx> wrote:
> On 06/26/2013 01:18 AM, Gregory Farnum wrote:
>>
>> Some guesses are inline.
>>
>> On Tue, Jun 25, 2013 at 4:06 PM, Wido den Hollander <wido@xxxxxxxx> wrote:
>>>
>>> Hi,
>>>
>>> I'm not sure what happened, but on a Ceph cluster I noticed that the
>>> monitors (running 0.61) started filling up the disks, so they were
>>> restarted with:
>>>
>>>     mon compact on start = true
>>>
>>> After a restart the osdmap was empty; it showed:
>>>
>>>     osdmap e2: 0 osds: 0 up, 0 in
>>>     pgmap v624077: 15296 pgs: 15296 stale+active+clean; 78104 MB data,
>>>     243 GB used, 66789 GB / 67032 GB avail
>>>     mdsmap e1: 0/0/1 up
>>>
>>> This cluster has 36 OSDs over 9 hosts, but suddenly that was all gone.
>>>
>>> I also checked the crushmap: all 36 OSDs were removed, no trace of them.
>>
>> As you guess, this is probably because the disks filled up. It
>> shouldn't be able to happen, but we found an edge case where leveldb
>> falls apart; there's a fix for it in the repository now (asserting
>> that we get back what we just wrote) that Sage can talk more about.
>> Probably both disappeared because the monitor got nothing back when
>> reading in the newest OSDMap, and so it's all empty.
>
> Sounds reasonable and logical.
>
>>> "ceph auth list" still showed their keys, though.
>>>
>>> Restarting the OSDs didn't help, since create-or-move complained that
>>> the OSDs didn't exist and didn't do anything. I ran "ceph osd create"
>>> to get the 36 OSDs created again, but when the OSDs boot they never
>>> start working.
>>>
>>> The only thing they log is:
>>>
>>> 2013-06-26 01:00:08.852410 7f17f3f16700  0 -- 0.0.0.0:6801/4767 >>
>>> 10.23.24.53:6801/1758 pipe(0x1025fc80 sd=116 :40516 s=1 pgs=0 cs=0
>>> l=0).fault with nothing to send, going to standby
>>
>> Are they going up and just sitting idle?
>> This is probably because none
>> of their peers are telling them to be responsible for any placement
>> groups on startup.
>
> No, they never come up. Checking the monitor logs, I only see the
> create-or-move command changing their crush position, but they never
> mark themselves as "up", so all the OSDs stay down.
>
> netstat, however, shows a connection between the OSD and the Mon, but
> nothing special in the logs at lower debugging.

So the process is still running? Can you generate full logs with
debug ms = 5, debug osd = 20, debug monc = 20?

>>> The internet connection I'm behind is a 3G connection, so I can't go
>>> skimming through the logs with debugging at very high levels, but I'm
>>> just wondering what this could be?
>>>
>>> It's obvious that the monitors filling up probably triggered the
>>> problem, but I'm now looking at a way to get the OSDs back up again.
>>>
>>> In the meantime I upgraded all the nodes to 0.61.4, but that didn't
>>> change anything.
>>>
>>> Any ideas on what this might be and how to resolve it?
>>
>> At a guess, you can go in and grab the last good version of the OSD
>> Map and inject that back into the cluster, then restart the OSDs? If
>> that doesn't work then we'll need to figure out the right way to kick
>> them into being responsible for their stuff.
>> (First, make sure that when you turn them on they are actually
>> connecting to the monitors.)
>
> You mean grabbing the old OSDMap from an OSD or the Monitor store? Both
> are using leveldb for their storage now, right? So I'd have to grab the
> OSDMap using some leveldb tooling?
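[Archive note: the debug levels requested above are ordinary ceph.conf options and can be set in the [osd] section before restarting the daemons. A sketch of the fragment, using exactly the option names named in the request:]

```
[osd]
    debug ms = 5
    debug osd = 20
    debug monc = 20
```

[After restarting the OSD, the verbose output lands in the usual per-daemon log under /var/log/ceph/. Remove or lower the settings afterwards, as debug osd = 20 is very chatty.]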
There's a ceph-monstore-tool (or similar) that provides this
functionality, although it's pretty new, so you might need to grab an
autobuilt package somewhere instead of the cuttlefish one (not sure).
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
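[Archive note: for readers landing here later, a rough sketch of dumping an old OSDMap epoch from a monitor's store with ceph-monstore-tool. The invocation below uses the syntax of later releases and is an assumption for the cuttlefish-era build Greg mentions; the store path and version number are illustrative:]

```
# Stop the monitor first -- the tool opens its leveldb store directly.
# (Syntax assumed from later releases; the cuttlefish-era tool may differ.)
ceph-monstore-tool /var/lib/ceph/mon/ceph-a get osdmap -- --version 1 --out /tmp/osdmap.1
```

[The thread does not cover the injection step itself; how to feed the recovered map back to the monitors would need to be worked out separately.]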