When I now start the OSD again, it seems to hang forever. Load goes up to 200 and I/O wait rises from 0% to 20%.
On 21.06.2012 14:55, Stefan Priebe - Profihost AG wrote:
Hello list,

I'm able to reproducibly crash OSD daemons.

How I can reproduce it:

Kernel: 3.5.0-rc3
Ceph: 0.47.3
FS: btrfs
Journal: 2GB tmpfs per OSD
OSD: 3x servers with 4x Intel SSD OSDs each
10GbE network
rbd_cache_max_age: 2.0
rbd_cache_size: 33554432

Disk is set to writeback.

Start a KVM VM via PXE with the disk attached in writeback mode.

Then run the randwrite stress test more than twice. In my case it is mostly OSD 22 that crashes.

# fio --filename=/dev/vda1 --direct=1 --rw=randwrite --bs=4k --size=200G --numjobs=50 --runtime=90 --group_reporting --name=file1; fio --filename=/dev/vda1 --direct=1 --rw=randwrite --bs=4k --size=200G --numjobs=50 --runtime=90 --group_reporting --name=file1; fio --filename=/dev/vda1 --direct=1 --rw=randwrite --bs=4k --size=200G --numjobs=50 --runtime=90 --group_reporting --name=file1; halt

Strangely, exactly THIS OSD also has the most log entries:
64K ceph-osd.20.log
64K ceph-osd.21.log
1,3M ceph-osd.22.log
64K ceph-osd.23.log

But all OSDs are set to debug osd = 20.

dmesg shows:
ceph-osd[5381]: segfault at 3f592c000 ip 00007fa281d8eb23 sp 00007fa27702d260 error 4 in libtcmalloc.so.0.0.0[7fa281d6a000+3d000]

I uploaded the following files:
priebe_fio_randwrite_ceph-osd.21.log.bz2 => OSD which was OK and didn't crash
priebe_fio_randwrite_ceph-osd.22.log.bz2 => Log from the crashed OSD
priebe_fio_randwrite_core.ssdstor001.27204.bz2 => Core dump
priebe_fio_randwrite_ceph-osd.bz2 => osd binary

Stefan