Some OSD and MDS crash

joao.luis@xxxxxxxxxxx (Joao Eduardo Luis) · Thu, 03 Jul 2014 12:49:48 +0100

On 07/03/2014 12:15 AM, Pierre BLONDEAU wrote:
> On 03/07/2014 00:55, Samuel Just wrote:
>> Ah,
>>
>> ~/logs $ for i in 20 23; do
>>     ../ceph/src/osdmaptool --export-crush /tmp/crush$i osd-$i*
>>     ../ceph/src/crushtool -d /tmp/crush$i > /tmp/crush$i.d
>> done; diff /tmp/crush20.d /tmp/crush23.d
>> ../ceph/src/osdmaptool: osdmap file 'osd-20_osdmap.13258__0_4E62BB79__none'
>> ../ceph/src/osdmaptool: exported crush map to /tmp/crush20
>> ../ceph/src/osdmaptool: osdmap file 'osd-23_osdmap.13258__0_4E62BB79__none'
>> ../ceph/src/osdmaptool: exported crush map to /tmp/crush23
>> 6d5
>> < tunable chooseleaf_vary_r 1
>>
>>  Looks like the chooseleaf_vary_r tunable somehow ended up divergent?
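The loop above exports the CRUSH map embedded in each OSD's copy of osdmap epoch 13258, decompiles it with crushtool -d, and diffs the text. As a rough illustration of the same comparison done programmatically, here is a minimal Python sketch that only looks at "tunable" lines. The helper names and the two sample map excerpts are made up; in practice you would read the text of /tmp/crush20.d and /tmp/crush23.d instead.

```python
from typing import Dict, Optional, Tuple


def parse_tunables(crush_text: str) -> Dict[str, str]:
    """Collect 'tunable <name> <value>' lines from a decompiled CRUSH map."""
    tunables = {}
    for line in crush_text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "tunable":
            tunables[parts[1]] = parts[2]
    return tunables


def diff_tunables(a: str, b: str) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {name: (value_in_a, value_in_b)} for every tunable that differs.

    None means the tunable line is absent from that map.
    """
    ta, tb = parse_tunables(a), parse_tunables(b)
    return {
        name: (ta.get(name), tb.get(name))
        for name in sorted(ta.keys() | tb.keys())
        if ta.get(name) != tb.get(name)
    }


# Hypothetical excerpts standing in for /tmp/crush20.d and /tmp/crush23.d.
crush20 = "tunable choose_total_tries 50\ntunable chooseleaf_vary_r 1\n"
crush23 = "tunable choose_total_tries 50\n"

print(diff_tunables(crush20, crush23))  # {'chooseleaf_vary_r': ('1', None)}
```

A missing tunable is reported as None rather than skipped, which is exactly the situation in the diff above, where one map has a chooseleaf_vary_r line and the other has none.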

The only thing that comes to mind that could cause this is if we changed
the leader's in-memory map and proposed it, the proposal failed, and yet
somehow only the leader wrote the map to disk. This has happened once
before, on a totally different issue (although I can't pinpoint which
right now).

In such a scenario, the leader would serve the incorrect osdmap to
whoever asked it for osdmaps, while the rest of the quorum would serve
the correct osdmaps to everyone else. That could cause this divergence.
Or it could be something else.

Are there logs for the monitors for the timeframe this may have happened in?

   -Joao

>>
>> Pierre: do you recall how and when that got set?
>
> I am not sure I understand, but if I remember correctly, after the
> upgrade to firefly the cluster was in state HEALTH_WARN "crush map has
> legacy tunables", and I saw "feature set mismatch" in the log.
>
> So, if I remember correctly, I ran "ceph osd crush tunables optimal" to
> fix the "crush map" warning, and I upgraded my client and server kernels
> to 3.16rc.
>
> Could that be it?
>
> Pierre
>
>> -Sam
>>
>> On Wed, Jul 2, 2014 at 3:43 PM, Samuel Just <sam.just at inktank.com> wrote:
>>> Yeah, divergent osdmaps:
>>> 555ed048e73024687fc8b106a570db4f  osd-20_osdmap.13258__0_4E62BB79__none
>>> 6037911f31dc3c18b05499d24dcdbe5c  osd-23_osdmap.13258__0_4E62BB79__none
>>>
>>> Joao: thoughts?
>>> -Sam
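Sam spotted the divergence by comparing MD5 checksums of the same osdmap epoch copied from two OSDs. Here is a minimal Python sketch of that check. It assumes you have already copied the files into one local directory, and the file names and contents in the demo are placeholders. The idea is to group the files by digest: more than one group means the copies differ.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List


def md5_hex(path: Path) -> str:
    """Hex MD5 of a file, read in 1 MiB chunks so large files are fine too."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def group_by_digest(paths: Iterable[Path]) -> Dict[str, List[str]]:
    """Map each MD5 digest to the names of the files that have it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for p in paths:
        groups[md5_hex(p)].append(p.name)
    return dict(groups)


if __name__ == "__main__":
    # Demo with placeholder contents; real osdmap files are binary blobs.
    with tempfile.TemporaryDirectory() as d:
        Path(d, "osd-20_osdmap").write_bytes(b"map A")
        Path(d, "osd-23_osdmap").write_bytes(b"map B")
        groups = group_by_digest(sorted(Path(d).iterdir()))
        if len(groups) > 1:
            print("divergent copies:", groups)
```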
>>>
>>> On Wed, Jul 2, 2014 at 3:39 PM, Pierre BLONDEAU
>>> <pierre.blondeau at unicaen.fr> wrote:
>>>> Here are the files.
>>>>
>>>> When I upgraded, I ran:
>>>>   ceph-deploy install --stable firefly servers...
>>>>   on each server: service ceph restart mon
>>>>   on each server: service ceph restart osd
>>>>   on each server: service ceph restart mds
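The upgrade steps above restart one daemon type at a time across every server: all monitors first, then all OSDs, then all MDSs. Here is a minimal Python sketch that only builds that ordered plan. It does not run anything, the server names are hypothetical, and how you would execute each command (for example over ssh) is left out.

```python
from typing import List, Tuple

# Daemon types in the order they were restarted after installing firefly.
RESTART_ORDER = ("mon", "osd", "mds")


def restart_plan(servers: List[str]) -> List[Tuple[str, str]]:
    """Return (server, command) pairs: every mon first, then every osd, then every mds."""
    return [
        (server, f"service ceph restart {daemon}")
        for daemon in RESTART_ORDER
        for server in servers
    ]


for server, command in restart_plan(["server1", "server2", "server3"]):
    print(f"{server}: {command}")
```

The outer loop runs over daemon types and the inner loop over servers. This ensures that every monitor runs the new version before any OSD is restarted, matching the mon, osd, mds order used above.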
>>>>
>>>> I upgraded from emperor to firefly. After repairing, remapping,
>>>> replacing, etc., I had some PGs that went into the peering state.
>>>>
>>>> I thought, why not try version 0.82, it might solve my problem (that
>>>> was my mistake). So I upgraded from firefly to 0.82 with:
>>>>   ceph-deploy install --testing servers...
>>>>   ..
>>>>
>>>> Now everything is running version 0.82.
>>>> I have 3 mons, 36 OSDs and 3 MDSs.
>>>>
>>>> Pierre
>>>>
>>>> PS: I also found "inc\uosdmap.13258__0_469271DE__none" in each meta
>>>> directory.
>>>>
>>>> On 03/07/2014 00:10, Samuel Just wrote:
>>>>
>>>>> Also, what version did you upgrade from, and how did you upgrade?
>>>>> -Sam
>>>>>
>>>>> On Wed, Jul 2, 2014 at 3:09 PM, Samuel Just <sam.just at inktank.com>
>>>>> wrote:
>>>>>>
>>>>>> Ok, in current/meta on osd 20 and osd 23, please attach all files
>>>>>> matching
>>>>>>
>>>>>> ^osdmap.13258.*
>>>>>>
>>>>>> There should be one such file on each osd (it should look something
>>>>>> like osdmap.6__0_FD6E4C01__none, probably hashed into a
>>>>>> subdirectory, so you'll want to use find).
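For the find step, a rough Python equivalent is to walk the OSD's current/meta directory recursively and match file names against the pattern given above. This is a minimal sketch. The path in the commented usage line assumes the default OSD data location /var/lib/ceph/osd/ceph-20 and may differ on your system.

```python
import os
import re
from typing import List


def find_osdmap_files(meta_dir: str, epoch: int) -> List[str]:
    r"""Return paths under meta_dir whose file name starts with osdmap.<epoch>.

    Mirrors the ^osdmap.13258.* pattern, but escapes the dot and rejects a
    following digit so that epoch 13258 does not also match 132580.
    Incremental maps (inc\uosdmap.*) are skipped because the match is anchored.
    """
    pattern = re.compile(rf"^osdmap\.{epoch}(?!\d)")
    matches = []
    for root, _dirs, files in os.walk(meta_dir):  # descends into hashed subdirectories
        for name in files:
            if pattern.match(name):
                matches.append(os.path.join(root, name))
    return sorted(matches)


# Example (path is an assumption; adjust to your OSD's data directory):
# print(find_osdmap_files("/var/lib/ceph/osd/ceph-20/current/meta", 13258))
```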
>>>>>>
>>>>>> What version of ceph is running on your mons?  How many mons do
>>>>>> you have?
>>>>>> -Sam
>>>>>>
>>>>>> On Wed, Jul 2, 2014 at 2:21 PM, Pierre BLONDEAU
>>>>>> <pierre.blondeau at unicaen.fr> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Done; the log files are available here:
>>>>>>> https://blondeau.users.greyc.fr/cephlog/debug20/
>>>>>>>
>>>>>>> The OSD log files are really big (roughly 80M).
>>>>>>>
>>>>>>> After starting osd.20, some other OSDs crashed: the number of up OSDs
>>>>>>> dropped from 31 to 16.
>>>>>>> I noticed that after this, the number of down+peering PGs decreased
>>>>>>> from 367 to 248. Is that "normal"? Maybe it's temporary, while the
>>>>>>> cluster verifies all the PGs?
>>>>>>>
>>>>>>> Regards
>>>>>>> Pierre
>>>>>>>
>>>>>>> On 02/07/2014 19:16, Samuel Just wrote:
>>>>>>>
>>>>>>>> You should add
>>>>>>>>
>>>>>>>> debug osd = 20
>>>>>>>> debug filestore = 20
>>>>>>>> debug ms = 1
>>>>>>>>
>>>>>>>> to the [osd] section of the ceph.conf and restart the osds. I'd like
>>>>>>>> all three logs if possible.
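If you prefer to script the ceph.conf change described above, Python's configparser can add the three debug options to an [osd] section. This minimal sketch rests on two assumptions. First, the file must parse as plain INI, because configparser treats indented lines as continuation values and may reject or mangle them. Second, you must accept that comments are not preserved when the file is rewritten. The sample configuration is made up.

```python
import configparser
import io

# The requested debug levels, as ceph.conf option names.
OSD_DEBUG = {"debug osd": "20", "debug filestore": "20", "debug ms": "1"}


def add_osd_debug(conf_text: str) -> str:
    """Return conf_text with the debug options set in the [osd] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(conf_text)
    if not parser.has_section("osd"):
        parser.add_section("osd")
    for key, value in OSD_DEBUG.items():
        parser.set("osd", key, value)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Made-up minimal config; a real ceph.conf will have more in [global].
sample = "[global]\nfsid = 00000000-0000-0000-0000-000000000000\n"
print(add_osd_debug(sample))
```

The daemons read ceph.conf when they start, so the OSDs still need to be restarted for the new levels to take effect.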
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> -Sam
>>>>>>>>
>>>>>>>> On Wed, Jul 2, 2014 at 5:03 AM, Pierre BLONDEAU
>>>>>>>> <pierre.blondeau at unicaen.fr> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Yes, but how do I do that?
>>>>>>>>>
>>>>>>>>> With a command like this?
>>>>>>>>>
>>>>>>>>> ceph tell osd.20 injectargs '--debug-osd 20 --debug-filestore 20 --debug-ms 1'
>>>>>>>>>
>>>>>>>>> Or by modifying /etc/ceph/ceph.conf? That file is nearly empty because
>>>>>>>>> I use udev detection.
>>>>>>>>>
>>>>>>>>> Once I have made these changes, do you want all three log files or
>>>>>>>>> only osd.20's?
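The injectargs command proposed above changes the debug levels of a running daemon in memory only. Those values are lost when the OSD restarts, and an OSD that crashes during startup cannot receive them at all, which is presumably why editing ceph.conf and restarting fits this case better. For reference, here is a minimal Python sketch that builds (but does not run) the same command for several OSDs. The OSD list is illustrative.

```python
import shlex
from typing import List

# Same options as in the injectargs command above, passed as one argument.
DEBUG_ARGS = "--debug-osd 20 --debug-filestore 20 --debug-ms 1"


def injectargs_argv(osd_id: int) -> List[str]:
    """argv for: ceph tell osd.<id> injectargs '<DEBUG_ARGS>'."""
    return ["ceph", "tell", f"osd.{osd_id}", "injectargs", DEBUG_ARGS]


for osd_id in (20, 23):
    # shlex.join (Python 3.8+) renders the argv as a copy-pasteable shell line.
    print(shlex.join(injectargs_argv(osd_id)))
    # To actually run it: subprocess.run(injectargs_argv(osd_id), check=True)
```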
>>>>>>>>>
>>>>>>>>> Thank you so much for the help
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>> Pierre
>>>>>>>>>
>>>>>>>>> On 01/07/2014 23:51, Samuel Just wrote:
>>>>>>>>>
>>>>>>>>>> Can you reproduce with
>>>>>>>>>> debug osd = 20
>>>>>>>>>> debug filestore = 20
>>>>>>>>>> debug ms = 1
>>>>>>>>>> ?
>>>>>>>>>> -Sam
>>>>>>>>>>
>>>>>>>>>> On Tue, Jul 1, 2014 at 1:21 AM, Pierre BLONDEAU
>>>>>>>>>> <pierre.blondeau at unicaen.fr> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I have attached the logs of:
>>>>>>>>>>>      - osd.20, one of the OSDs I identified as making other OSDs crash;
>>>>>>>>>>>      - osd.23, one of the OSDs that crash when I start osd.20;
>>>>>>>>>>>      - mds, one of my MDSs.
>>>>>>>>>>>
>>>>>>>>>>> I truncated the log files because they are too big, but everything is here:
>>>>>>>>>>> https://blondeau.users.greyc.fr/cephlog/
>>>>>>>>>>>
>>>>>>>>>>> Regards
>>>>>>>>>>>
>>>>>>>>>>> On 30/06/2014 17:35, Gregory Farnum wrote:
>>>>>>>>>>>
>>>>>>>>>>>> What's the backtrace from the crashing OSDs?
>>>>>>>>>>>>
>>>>>>>>>>>> Keep in mind that unnamed versions like 0.82 are development releases,
>>>>>>>>>>>> and it's generally best not to upgrade to them (but it's probably too
>>>>>>>>>>>> late to go back now).
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I'll remember that next time ;)
>>>>>>>>>>>
>>>>>>>>>>>> -Greg
>>>>>>>>>>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Jun 30, 2014 at 8:06 AM, Pierre BLONDEAU
>>>>>>>>>>>> <pierre.blondeau at unicaen.fr> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> After the upgrade to firefly, I had some PGs in the peering state.
>>>>>>>>>>>>> I saw that 0.82 had been released, so I tried upgrading to it to
>>>>>>>>>>>>> solve my problem.
>>>>>>>>>>>>>
>>>>>>>>>>>>> My three MDSs crash, and some OSDs trigger a chain reaction that
>>>>>>>>>>>>> kills other OSDs.
>>>>>>>>>>>>> I think my MDSs will not start because the metadata is on the
>>>>>>>>>>>>> OSDs.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have 36 OSDs on three servers, and I identified 5 OSDs that make
>>>>>>>>>>>>> the others crash. If I don't start them, the cluster goes into a
>>>>>>>>>>>>> recovering state with 31 OSDs, but I have 378 PGs in the
>>>>>>>>>>>>> down+peering state.
>>>>>>>>>>>>>
>>>>>>>>>>>>> What can I do? Would you like more information (OS, crash logs,
>>>>>>>>>>>>> etc.)?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> ----------------------------------------------
>>>>>>> Pierre BLONDEAU
>>>>>>> Systems & Network Administrator
>>>>>>> Université de Caen
>>>>>>> Laboratoire GREYC, Computer Science Department
>>>>>>>
>>>>>>> tel     : 02 31 56 75 42
>>>>>>> office  : Campus 2, Science 3, 406
>>>>>>> ----------------------------------------------
>>>>>>>
>
>

-- 
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com