Some OSDs and MDSs crash


Also, what version did you upgrade from, and how did you upgrade?
-Sam

On Wed, Jul 2, 2014 at 3:09 PM, Samuel Just <sam.just at inktank.com> wrote:
> Ok, in current/meta on osd 20 and osd 23, please attach all files matching
>
> ^osdmap.13258.*
>
> There should be one such file on each OSD (it should look something like
> osdmap.6__0_FD6E4C01__none and is probably hashed into a subdirectory,
> so you'll want to use find).
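A minimal Python sketch of that search, doing roughly what `find current/meta -name 'osdmap.13258*'` would do: it walks the meta directory recursively (because the object may be hashed into a subdirectory) and returns every file whose name matches the pattern. The data directory `/var/lib/ceph/osd/ceph-20` is an assumption (the usual default layout), and `find_osdmap_files` is a hypothetical helper name.

```python
import os
import re

# Same pattern as above, with the dot escaped so it only matches a literal ".".
OSDMAP_RE = re.compile(r"^osdmap\.13258.*")


def find_osdmap_files(meta_dir, pattern=OSDMAP_RE):
    """Walk meta_dir recursively (osdmap objects may be hashed into
    subdirectories) and return the sorted paths whose file name matches."""
    matches = []
    for root, _dirs, files in os.walk(meta_dir):
        for name in files:
            if pattern.match(name):
                matches.append(os.path.join(root, name))
    return sorted(matches)


if __name__ == "__main__":
    # Hypothetical default FileStore location for osd.20; adjust for osd.23
    # or for a non-default data directory.
    for path in find_osdmap_files("/var/lib/ceph/osd/ceph-20/current/meta"):
        print(path)
```

Because the pattern is anchored with `^`, names that merely contain the epoch later in the string are not picked up.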
>
> What version of ceph is running on your mons?  How many mons do you have?
> -Sam
>
> On Wed, Jul 2, 2014 at 2:21 PM, Pierre BLONDEAU
> <pierre.blondeau at unicaen.fr> wrote:
>> Hi,
>>
>> I did it; the log files are available here:
>> https://blondeau.users.greyc.fr/cephlog/debug20/
>>
>> The OSD log files are really big (about 80M).
>>
>> After starting osd.20, some other OSDs crashed: I went from 31 OSDs up to 16.
>> I noticed that after this the number of down+peering PGs decreased from 367
>> to 248. Is that "normal"? Maybe it's temporary, while the cluster verifies
>> all the PGs?
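To follow the down+peering count between restarts, a small Python sketch can tally PG states from a saved plain-text PG listing. It assumes each data line starts with the PG id followed by its state (check the column layout your version actually prints) and that compound states are joined with `+`, as in `down+peering`. The function names are hypothetical.

```python
from collections import Counter


def count_pg_states(lines):
    """Count PGs per exact state string, e.g. 'down+peering'.

    Assumes each data line starts with a PG id such as '0.1f' followed by
    the state; header, blank and summary lines are skipped.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 2 or "." not in fields[0]:
            continue  # not a "<pgid> <state> ..." line
        counts[fields[1]] += 1
    return counts


def count_with_flags(counts, *flags):
    """Number of PGs whose '+'-joined state contains every given flag."""
    wanted = set(flags)
    return sum(n for state, n in counts.items() if wanted <= set(state.split("+")))
```

Matching on individual flags rather than the exact string means a PG reported as, say, `down+remapped+peering` is still counted as down and peering.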
>>
>> Regards
>> Pierre
>>
>> On 02/07/2014 19:16, Samuel Just wrote:
>>
>>> You should add
>>>
>>> debug osd = 20
>>> debug filestore = 20
>>> debug ms = 1
>>>
>>> to the [osd] section of the ceph.conf and restart the osds.  I'd like
>>> all three logs if possible.
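If you would rather script the change, here is a Python sketch that sets those three options in the [osd] section using the standard `configparser` module. It assumes your ceph.conf is plain INI that `configparser` can parse; note that `configparser` drops comments when it writes the file back, so keep a copy of the original. `add_osd_debug` is a hypothetical helper name.

```python
import configparser
import io

# The three debug settings requested above, for the [osd] section.
DEBUG_SETTINGS = {
    "debug osd": "20",
    "debug filestore": "20",
    "debug ms": "1",
}


def add_osd_debug(conf_text, settings=DEBUG_SETTINGS):
    """Return conf_text with the given options set in its [osd] section,
    creating the section if it does not exist yet."""
    # interpolation=None so values containing '%' are left untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    if not parser.has_section("osd"):
        parser.add_section("osd")
    for key, value in settings.items():
        parser.set("osd", key, value)  # overrides any existing value
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```

Write the result back to /etc/ceph/ceph.conf on each OSD host, then restart the OSDs so the settings take effect from startup.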
>>>
>>> Thanks
>>> -Sam
>>>
>>> On Wed, Jul 2, 2014 at 5:03 AM, Pierre BLONDEAU
>>> <pierre.blondeau at unicaen.fr> wrote:
>>>>
>>>> Yes, but how do I do that?
>>>>
>>>> With a command like this?
>>>>
>>>> ceph tell osd.20 injectargs '--debug-osd 20 --debug-filestore 20 --debug-ms 1'
>>>>
>>>> Or by modifying /etc/ceph/ceph.conf? That file has very little in it
>>>> because I rely on udev detection.
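For comparison, a Python sketch that builds the same `injectargs` call as the command above. Options injected this way apply only to the running daemon and are lost when it restarts, so they cannot capture logging from an OSD's startup; setting them in ceph.conf and restarting does. `injectargs_argv` and `inject` are hypothetical helpers, and running `inject` assumes the `ceph` CLI and an admin keyring are available on the host.

```python
import shlex
import subprocess


def injectargs_argv(osd_id, settings):
    """Build argv for `ceph tell osd.<id> injectargs '<args>'`.

    Option names use the dashed command-line form, e.g. 'debug-osd'.
    """
    args = " ".join(f"--{name} {value}" for name, value in settings.items())
    return ["ceph", "tell", f"osd.{osd_id}", "injectargs", args]


def inject(osd_id, settings):
    """Run the command; raises CalledProcessError if ceph exits non-zero."""
    subprocess.run(injectargs_argv(osd_id, settings), check=True)


if __name__ == "__main__":
    argv = injectargs_argv(20, {"debug-osd": "20", "debug-filestore": "20", "debug-ms": "1"})
    # prints: ceph tell osd.20 injectargs '--debug-osd 20 --debug-filestore 20 --debug-ms 1'
    print(shlex.join(argv))
```

Passing the option string as a single argv element avoids the shell-quoting problems that can creep in when the command is typed by hand.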
>>>>
>>>> Once I have made these changes, do you want all three log files or only
>>>> osd.20's?
>>>>
>>>> Thank you so much for the help
>>>>
>>>> Regards
>>>> Pierre
>>>>
>>>> On 01/07/2014 23:51, Samuel Just wrote:
>>>>
>>>>> Can you reproduce with
>>>>> debug osd = 20
>>>>> debug filestore = 20
>>>>> debug ms = 1
>>>>> ?
>>>>> -Sam
>>>>>
>>>>> On Tue, Jul 1, 2014 at 1:21 AM, Pierre BLONDEAU
>>>>> <pierre.blondeau at unicaen.fr> wrote:
>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I attached:
>>>>>>    - osd.20, one of the OSDs that I found makes other OSDs crash.
>>>>>>    - osd.23, one of the OSDs that crash when I start osd.20.
>>>>>>    - mds, the log of one of my MDSs.
>>>>>>
>>>>>> I truncated the log files because they are too big, but everything is here:
>>>>>> https://blondeau.users.greyc.fr/cephlog/
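Since the full logs are large, a Python sketch like the following can pull out just the crash backtraces requested in this thread. It streams the file and keeps each marker line plus a fixed number of following lines. The markers `Caught signal` and `FAILED assert` are an assumption about how a Ceph daemon log reports a crash; adjust them if your log differs. `extract_backtraces` is a hypothetical helper.

```python
import sys

# Assumed markers that start a crash report in a Ceph daemon log.
CRASH_MARKERS = ("Caught signal", "FAILED assert")


def extract_backtraces(lines, context=40, markers=CRASH_MARKERS):
    """Return one excerpt per crash marker: the marker line plus up to
    `context` following lines. Streams the input, so large logs are fine."""
    excerpts = []
    current = []
    remaining = 0
    for line in lines:
        line = line.rstrip("\n")
        if any(m in line for m in markers):
            if current:  # a new crash starts before the previous excerpt filled up
                excerpts.append(current)
            current, remaining = [line], context
        elif remaining > 0:
            current.append(line)
            remaining -= 1
            if remaining == 0:
                excerpts.append(current)
                current = []
    if current:
        excerpts.append(current)
    return excerpts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python extract_backtraces.py /var/log/ceph/ceph-osd.23.log
    with open(sys.argv[1], errors="replace") as f:
        for excerpt in extract_backtraces(f):
            print("\n".join(excerpt))
            print("-" * 40)
```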
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>> On 30/06/2014 17:35, Gregory Farnum wrote:
>>>>>>
>>>>>>> What's the backtrace from the crashing OSDs?
>>>>>>>
>>>>>>> Keep in mind that as a dev release, it's generally best not to upgrade
>>>>>>> to unnamed versions like 0.82 (but it's probably too late to go back
>>>>>>> now).
>>>>>>
>>>>>>
>>>>>> I will remember that next time ;)
>>>>>>
>>>>>>> -Greg
>>>>>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>>>>>>
>>>>>>> On Mon, Jun 30, 2014 at 8:06 AM, Pierre BLONDEAU
>>>>>>> <pierre.blondeau at unicaen.fr> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> After the upgrade to Firefly, I have some PGs in the peering state.
>>>>>>>> I saw that 0.82 had been released, so I tried upgrading to it to
>>>>>>>> solve my problem.
>>>>>>>>
>>>>>>>> My three MDSs crash, and some OSDs trigger a chain reaction that
>>>>>>>> kills other OSDs. I think my MDSs will not start because the
>>>>>>>> metadata is on the OSDs.
>>>>>>>>
>>>>>>>> I have 36 OSDs on three servers, and I identified 5 OSDs that make
>>>>>>>> the others crash. If I don't start them, the cluster goes into a
>>>>>>>> recovery state with 31 OSDs, but I have 378 PGs in the down+peering
>>>>>>>> state.
>>>>>>>>
>>>>>>>> What can I do? Would you like more information (OS, crash logs,
>>>>>>>> etc.)?
>>>>>>>>
>>>>>>>> Regards
>>
>>
>> --
>> ----------------------------------------------
>> Pierre BLONDEAU
>> Systems & Network Administrator
>> Université de Caen
>> GREYC Laboratory, Computer Science Department
>>
>> tel     : 02 31 56 75 42
>> office  : Campus 2, Science 3, 406
>> ----------------------------------------------
>>

