Some OSD and MDS crash

Hi,

I did it; the log files are available here:
https://blondeau.users.greyc.fr/cephlog/debug20/

The OSD log files are really big, around 80 MB each.

After starting osd.20, some other OSDs crash: I drop from 31 OSDs up to 
only 16. I notice that after this the number of down+peering PGs 
decreases from 367 to 248. Is that "normal"? Maybe it's temporary, just 
the time the cluster needs to verify all the PGs?
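
Those counts come from the standard status commands, e.g.:

    ceph -s
    ceph health detail | grep peering
    ceph pg dump_stuck inactive

(Firefly-era CLI; the exact output wording may differ between versions.)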

Regards
Pierre

On 02/07/2014 19:16, Samuel Just wrote:
> You should add
>
> debug osd = 20
> debug filestore = 20
> debug ms = 1
>
> to the [osd] section of the ceph.conf and restart the osds.  I'd like
> all three logs if possible.
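>
> That is, a ceph.conf fragment like:
>
>     [osd]
>         debug osd = 20
>         debug filestore = 20
>         debug ms = 1
>
> and then restart each daemon, e.g. "service ceph restart osd.20"
> (sysvinit) or "restart ceph-osd id=20" (upstart), whichever your
> distribution uses.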
>
> Thanks
> -Sam
>
> On Wed, Jul 2, 2014 at 5:03 AM, Pierre BLONDEAU
> <pierre.blondeau at unicaen.fr> wrote:
>> Yes, but how do I do that?
>>
>> With a command like this?
>>
>> ceph tell osd.20 injectargs '--debug-osd 20 --debug-filestore 20 --debug-ms 1'
>>
>> By modifying /etc/ceph/ceph.conf? That file is almost empty on my
>> nodes because I use udev detection.
>>
>> Once I have made these changes, do you want all three log files or
>> only osd.20's?
>>
>> Thank you so much for the help
>>
>> Regards
>> Pierre
>>
>> On 01/07/2014 23:51, Samuel Just wrote:
>>
>>> Can you reproduce with
>>> debug osd = 20
>>> debug filestore = 20
>>> debug ms = 1
>>> ?
>>> -Sam
>>>
>>> On Tue, Jul 1, 2014 at 1:21 AM, Pierre BLONDEAU
>>> <pierre.blondeau at unicaen.fr> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I attach:
>>>>    - osd.20, one of the OSDs that I identified as making other OSDs crash;
>>>>    - osd.23, one of the OSDs that crashes when I start osd.20;
>>>>    - mds, one of my MDSes.
>>>>
>>>> I truncated the log files because they are too big. Everything is here:
>>>> https://blondeau.users.greyc.fr/cephlog/
>>>>
>>>> Regards
>>>>
>>>> On 30/06/2014 17:35, Gregory Farnum wrote:
>>>>
>>>>> What's the backtrace from the crashing OSDs?
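>>>>> (Assuming the default log path, something like
>>>>>
>>>>>     grep -B 2 -A 20 'FAILED assert' /var/log/ceph/ceph-osd.23.log
>>>>>
>>>>> should pull the assert and the stack out of the log; adjust the
>>>>> path to your setup.)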
>>>>>
>>>>> Keep in mind that as a dev release, it's generally best not to upgrade
>>>>> to unnamed versions like 0.82 (but it's probably too late to go back
>>>>> now).
>>>>
>>>> I will remember that for next time ;)
>>>>
>>>>> -Greg
>>>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>>>>
>>>>> On Mon, Jun 30, 2014 at 8:06 AM, Pierre BLONDEAU
>>>>> <pierre.blondeau at unicaen.fr> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> After the upgrade to Firefly, I have some PGs stuck in the peering state.
>>>>>> I saw that 0.82 was out, so I tried upgrading to solve my problem.
>>>>>>
>>>>>> My three MDSes crash, and some OSDs trigger a chain reaction that
>>>>>> kills other OSDs.
>>>>>> I think my MDSes will not start because their metadata are on the OSDs.
>>>>>>
>>>>>> I have 36 OSDs across three servers, and I identified 5 OSDs that make
>>>>>> the others crash. If I do not start those, the cluster goes into a
>>>>>> recovery state with 31 OSDs, but I have 378 PGs in the down+peering
>>>>>> state.
>>>>>>
>>>>>> What can I do? Would you like more information (OS, crash logs, etc.)?
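>>>>>>
>>>>>> The usual basics are quick to collect, e.g.:
>>>>>>
>>>>>>     ceph --version
>>>>>>     ceph -s
>>>>>>     ceph osd tree
>>>>>>
>>>>>> plus the crash logs themselves.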
>>>>>>
>>>>>> Regards

-- 
----------------------------------------------
Pierre BLONDEAU
Systems & Network Administrator
Université de Caen
Laboratoire GREYC, Département d'informatique

tel	: 02 31 56 75 42
office	: Campus 2, Science 3, 406
----------------------------------------------
