I've yet to make the core match the binary.

On Jun 22, 2012, at 11:32 PM, Stefan Priebe <s.priebe@xxxxxxxxxxxx> wrote:

> Thanks, did you find anything?
>
> On 23.06.2012 at 01:59, Sam Just <sam.just@xxxxxxxxxxx> wrote:
>
>> I am still looking into the logs.
>> -Sam
>>
>> On Fri, Jun 22, 2012 at 3:56 PM, Dan Mick <dan.mick@xxxxxxxxxxx> wrote:
>>> Stefan, I'm looking at your logs and coredump now.
>>>
>>> On 06/21/2012 11:43 PM, Stefan Priebe wrote:
>>>>
>>>> Does anybody have an idea? This is a showstopper for me right now.
>>>>
>>>> On 21.06.2012 at 14:55, Stefan Priebe - Profihost AG <s.priebe@xxxxxxxxxxxx> wrote:
>>>>
>>>>> Hello list,
>>>>>
>>>>> I am able to reproducibly crash OSD daemons.
>>>>>
>>>>> How to reproduce:
>>>>>
>>>>> Kernel: 3.5.0-rc3
>>>>> Ceph: 0.47.3
>>>>> FS: btrfs
>>>>> Journal: 2 GB tmpfs per OSD
>>>>> OSDs: 3 servers with 4 Intel SSD OSDs each
>>>>> Network: 10 GbE
>>>>> rbd_cache_max_age: 2.0
>>>>> rbd_cache_size: 33554432
>>>>>
>>>>> The disk is set to writeback.
>>>>>
>>>>> Start a KVM VM via PXE with the disk attached in writeback mode,
>>>>> then run the randwrite stress below more than 2 times. In my case
>>>>> it is mostly OSD 22 that crashes.
>>>>>
>>>>> # fio --filename=/dev/vda1 --direct=1 --rw=randwrite --bs=4k --size=200G \
>>>>>     --numjobs=50 --runtime=90 --group_reporting --name=file1; \
>>>>>   fio --filename=/dev/vda1 --direct=1 --rw=randwrite --bs=4k --size=200G \
>>>>>     --numjobs=50 --runtime=90 --group_reporting --name=file1; \
>>>>>   fio --filename=/dev/vda1 --direct=1 --rw=randwrite --bs=4k --size=200G \
>>>>>     --numjobs=50 --runtime=90 --group_reporting --name=file1; \
>>>>>   halt
>>>>>
>>>>> Strangely, exactly THIS OSD also has the most log entries:
>>>>> 64K  ceph-osd.20.log
>>>>> 64K  ceph-osd.21.log
>>>>> 1.3M ceph-osd.22.log
>>>>> 64K  ceph-osd.23.log
>>>>>
>>>>> But all OSDs are set to debug osd = 20.
>>>>>
>>>>> dmesg shows:
>>>>> ceph-osd[5381]: segfault at 3f592c000 ip 00007fa281d8eb23 sp
>>>>> 00007fa27702d260 error 4 in libtcmalloc.so.0.0.0[7fa281d6a000+3d000]
>>>>>
>>>>> I uploaded the following files:
>>>>> priebe_fio_randwrite_ceph-osd.21.log.bz2 => log from an OSD that was OK and did not crash
>>>>> priebe_fio_randwrite_ceph-osd.22.log.bz2 => log from the crashed OSD
>>>>> priebe_fio_randwrite_core.ssdstor001.27204.bz2 => core dump
>>>>> priebe_fio_randwrite_ceph-osd.bz2 => osd binary
>>>>>
>>>>> Stefan
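
For anyone following along with the step Sam mentions (making the core match the binary), here is a minimal sketch. The file names are the uploads listed in the thread; eu-unstrip from elfutils is an assumption about locally available tooling, and the binary may or may not carry a build ID on this build:

    # Unpack the uploaded core dump and osd binary (names from the thread above).
    bunzip2 priebe_fio_randwrite_core.ssdstor001.27204.bz2
    bunzip2 priebe_fio_randwrite_ceph-osd.bz2

    # List the modules recorded in the core together with their build IDs;
    # the ceph-osd entry must match the binary's build ID for symbols to line up.
    eu-unstrip -n --core=priebe_fio_randwrite_core.ssdstor001.27204
    readelf -n priebe_fio_randwrite_ceph-osd | grep -i "build id"

    # If they match, load the pair into gdb and pull a backtrace from the
    # crashed thread (the dmesg line puts the fault inside libtcmalloc):
    gdb ./priebe_fio_randwrite_ceph-osd priebe_fio_randwrite_core.ssdstor001.27204
    (gdb) bt
    (gdb) info threads

If the build IDs differ, the core was produced by a different build than the uploaded binary, which would explain gdb producing a nonsensical backtrace.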