Re: reproducible osd crash

Thanks. Did you find anything?

On 23.06.2012 at 01:59, Sam Just <sam.just@xxxxxxxxxxx> wrote:

> I am still looking into the logs.
> -Sam
> 
> On Fri, Jun 22, 2012 at 3:56 PM, Dan Mick <dan.mick@xxxxxxxxxxx> wrote:
>> Stefan, I'm looking at your logs and coredump now.
>> 
>> 
>> On 06/21/2012 11:43 PM, Stefan Priebe wrote:
>>> 
>>> Does anybody have an idea? Right now this is a showstopper for me.
>>> 
>>> On 21.06.2012 at 14:55, Stefan Priebe - Profihost
>>> AG <s.priebe@xxxxxxxxxxxx> wrote:
>>> 
>>>> Hello list,
>>>> 
>>>> I'm able to reproducibly crash OSD daemons.
>>>> 
>>>> How I can reproduce it:
>>>> 
>>>> Kernel: 3.5.0-rc3
>>>> Ceph: 0.47.3
>>>> FS: btrfs
>>>> Journal: 2GB tmpfs per OSD
>>>> OSD: 3x servers with 4x Intel SSD OSDs each
>>>> 10GBE Network
>>>> rbd_cache_max_age: 2.0
>>>> rbd_cache_size: 33554432
>>>> 
>>>> Disk is set to writeback.
>>>> 
>>>> Start a KVM VM via PXE with the disk attached in writeback mode.
>>>> 
>>>> Then run the randwrite stress test more than twice. In my case it is
>>>> mostly OSD 22 that crashes.
>>>> 
>>>> # fio --filename=/dev/vda1 --direct=1 --rw=randwrite --bs=4k --size=200G
>>>> --numjobs=50 --runtime=90 --group_reporting --name=file1; fio
>>>> --filename=/dev/vda1 --direct=1 --rw=randwrite --bs=4k --size=200G
>>>> --numjobs=50 --runtime=90 --group_reporting --name=file1; fio
>>>> --filename=/dev/vda1 --direct=1 --rw=randwrite --bs=4k --size=200G
>>>> --numjobs=50 --runtime=90 --group_reporting --name=file1; halt
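The three back-to-back fio invocations above could also be written as a loop. This is only a sketch, not the reporter's script: the function name run_randwrite and the FIO_CMD override (used for a dry run) are made up for illustration; the fio options are copied unchanged from the command above.

```shell
#!/bin/sh
# Run the same fio randwrite job several times in a row.
# run_randwrite and FIO_CMD are hypothetical names, not part of fio.
# FIO_CMD defaults to "fio"; set it to "echo fio" to only print the commands.
run_randwrite() {
    runs=$1   # number of consecutive runs
    dev=$2    # block device to write to, e.g. /dev/vda1
    i=1
    while [ "$i" -le "$runs" ]; do
        echo "run $i of $runs" >&2
        ${FIO_CMD:-fio} --filename="$dev" --direct=1 --rw=randwrite --bs=4k \
            --size=200G --numjobs=50 --runtime=90 --group_reporting \
            --name=file1 || return 1
        i=$((i + 1))
    done
}
```

Inside the VM this would be called as "run_randwrite 3 /dev/vda1; halt". Unlike the original chain joined with ";", the loop stops at the first fio run that fails.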
>>>> 
>>>> Strangely, exactly THIS OSD also has the most log entries:
>>>> 64K     ceph-osd.20.log
>>>> 64K     ceph-osd.21.log
>>>> 1,3M    ceph-osd.22.log
>>>> 64K     ceph-osd.23.log
>>>> 
>>>> But all OSDs are set to debug osd = 20.
>>>> 
>>>> dmesg shows:
>>>> ceph-osd[5381]: segfault at 3f592c000 ip 00007fa281d8eb23 sp
>>>> 00007fa27702d260 error 4 in libtcmalloc.so.0.0.0[7fa281d6a000+3d000]
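The dmesg line above already contains enough to locate the faulting instruction inside the library: the instruction pointer (ip) minus the mapping base given in the square brackets is the offset into libtcmalloc.so.0.0.0. A small sketch of that arithmetic follows; the helper name fault_offset is hypothetical.

```shell
#!/bin/sh
# Compute the offset of a faulting instruction inside a mapped library.
# fault_offset is a hypothetical helper name.
# Arguments are hex values without the 0x prefix, as printed by the kernel:
#   $1 = ip value, $2 = mapping base (the number before "+" in the brackets)
fault_offset() {
    ip=$1
    base=$2
    printf '0x%x\n' $(( 0x$ip - 0x$base ))
}
```

With the values from the log, "fault_offset 00007fa281d8eb23 7fa281d6a000" prints 0x24b23, which lies inside the 0x3d000-byte mapping. On x86, error code 4 denotes a user-mode read of a page that is not mapped. If a libtcmalloc build with debug symbols is at hand, the offset can be handed to addr2line (-f -e with the path to the library) to look up the function; whether such symbols exist on this system cannot be told from the thread.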
>>>> 
>>>> I uploaded the following files:
>>>> priebe_fio_randwrite_ceph-osd.21.log.bz2 =>  OSD which was OK and didn't
>>>> crash
>>>> priebe_fio_randwrite_ceph-osd.22.log.bz2 =>  Log from the crashed OSD
>>>> priebe_fio_randwrite_core.ssdstor001.27204.bz2 =>  Core dump
>>>> priebe_fio_randwrite_ceph-osd.bz2 =>  osd binary
>>>> 
>>>> Stefan
>>> 
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> 

