Hello once again,

I seem to have hit one more problem today: a 3-node test cluster, nodes running the 3.18.1 kernel, ceph-0.94.1, a 3-replica pool backed by SSD OSDs. After mapping a volume using rbd and trying to zero it using dd:

dd if=/dev/zero of=/dev/rbd0 bs=1M

it was running fine for some time at ~200 MB/s, but the speed slowly dropped to ~70 MB/s, then the process hung and the following backtraces started to appear in dmesg:

Apr 24 17:09:45 vfnphav1a kernel: [340710.888081] INFO: task kworker/u8:2:15884 blocked for more than 120 seconds.
Apr 24 17:09:45 vfnphav1a kernel: [340710.895645] Not tainted 3.18.11lb6.01 #1
Apr 24 17:09:45 vfnphav1a kernel: [340710.900612] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 24 17:09:45 vfnphav1a kernel: [340710.909290] kworker/u8:2 D 0000000000000001 0 15884 2 0x00000000
Apr 24 17:09:45 vfnphav1a kernel: [340710.917043] Workqueue: writeback bdi_writeback_workfn (flush-252:0)
Apr 24 17:09:45 vfnphav1a kernel: [340710.923998] ffff880172b73608 0000000000000046 ffff88021424a850 0000000000004000
Apr 24 17:09:45 vfnphav1a kernel: [340710.932595] ffff8801988d3120 0000000000011640 ffff880172b70010 0000000000011640
Apr 24 17:09:45 vfnphav1a kernel: [340710.941193] 0000000000004000 0000000000011640 ffff8800d7689890 ffff8801988d3120
Apr 24 17:09:45 vfnphav1a kernel: [340710.949799] Call Trace:
Apr 24 17:09:45 vfnphav1a kernel: [340710.952746] [<ffffffff8149882e>] ? _raw_spin_unlock+0xe/0x30
Apr 24 17:09:45 vfnphav1a kernel: [340710.959009] [<ffffffff8123ba6b>] ? queue_unplugged+0x5b/0xe0
Apr 24 17:09:45 vfnphav1a kernel: [340710.965258] [<ffffffff81494149>] schedule+0x29/0x70
Apr 24 17:09:45 vfnphav1a kernel: [340710.970728] [<ffffffff8149421c>] io_schedule+0x8c/0xd0
Apr 24 17:09:45 vfnphav1a kernel: [340710.976462] [<ffffffff81239e95>] get_request+0x445/0x860
Apr 24 17:09:45 vfnphav1a kernel: [340710.982366] [<ffffffff81086680>] ? bit_waitqueue+0x80/0x80
Apr 24 17:09:45 vfnphav1a kernel: [340710.988443] [<ffffffff812358eb>] ? elv_merge+0xeb/0xf0
Apr 24 17:09:45 vfnphav1a kernel: [340710.994167] [<ffffffff8123bdf8>] blk_queue_bio+0xc8/0x360
Apr 24 17:09:45 vfnphav1a kernel: [340711.000159] [<ffffffff81239790>] generic_make_request+0xc0/0x100
Apr 24 17:09:45 vfnphav1a kernel: [340711.006760] [<ffffffff81239841>] submit_bio+0x71/0x140
Apr 24 17:09:45 vfnphav1a kernel: [340711.012489] [<ffffffff811b5aae>] _submit_bh+0x11e/0x170
Apr 24 17:09:45 vfnphav1a kernel: [340711.018307] [<ffffffff811b5b10>] submit_bh+0x10/0x20
Apr 24 17:09:45 vfnphav1a kernel: [340711.023865] [<ffffffff811b98e8>] __block_write_full_page.clone.0+0x198/0x340
Apr 24 17:09:45 vfnphav1a kernel: [340711.031846] [<ffffffff811b9cb0>] ? I_BDEV+0x10/0x10
Apr 24 17:09:45 vfnphav1a kernel: [340711.037313] [<ffffffff811b9cb0>] ? I_BDEV+0x10/0x10
Apr 24 17:09:45 vfnphav1a kernel: [340711.042784] [<ffffffff811b9c5a>] block_write_full_page+0xba/0x100
Apr 24 17:09:45 vfnphav1a kernel: [340711.049477] [<ffffffff811bab88>] blkdev_writepage+0x18/0x20
Apr 24 17:09:45 vfnphav1a kernel: [340711.055642] [<ffffffff811231ca>] __writepage+0x1a/0x50
Apr 24 17:09:45 vfnphav1a kernel: [340711.061374] [<ffffffff81124427>] write_cache_pages+0x1e7/0x4e0
Apr 24 17:09:45 vfnphav1a kernel: [340711.067797] [<ffffffff811231b0>] ? set_page_dirty+0x60/0x60
Apr 24 17:09:45 vfnphav1a kernel: [340711.073952] [<ffffffff81124774>] generic_writepages+0x54/0x80
Apr 24 17:09:45 vfnphav1a kernel: [340711.080292] [<ffffffff811247c3>] do_writepages+0x23/0x40
Apr 24 17:09:45 vfnphav1a kernel: [340711.086196] [<ffffffff811add39>] __writeback_single_inode+0x49/0x2c0
Apr 24 17:09:45 vfnphav1a kernel: [340711.093131] [<ffffffff81086c8f>] ? wake_up_bit+0x2f/0x40
Apr 24 17:09:45 vfnphav1a kernel: [340711.099028] [<ffffffff811af3b6>] writeback_sb_inodes+0x2d6/0x490
Apr 24 17:09:45 vfnphav1a kernel: [340711.105625] [<ffffffff811af60e>] __writeback_inodes_wb+0x9e/0xd0
Apr 24 17:09:45 vfnphav1a kernel: [340711.112223] [<ffffffff811af83b>] wb_writeback+0x1fb/0x320
Apr 24 17:09:45 vfnphav1a kernel: [340711.118214] [<ffffffff811afa60>] wb_do_writeback+0x100/0x210
Apr 24 17:09:45 vfnphav1a kernel: [340711.124466] [<ffffffff811afbe0>] bdi_writeback_workfn+0x70/0x250
Apr 24 17:09:45 vfnphav1a kernel: [340711.131063] [<ffffffff814954de>] ? mutex_unlock+0xe/0x10
Apr 24 17:09:45 vfnphav1a kernel: [340711.136974] [<ffffffffa02c4ef4>] ? bnx2x_release_phy_lock+0x24/0x30 [bnx2x]
Apr 24 17:09:45 vfnphav1a kernel: [340711.144530] [<ffffffff8106529a>] process_one_work+0x13a/0x450
Apr 24 17:09:45 vfnphav1a kernel: [340711.150872] [<ffffffff810656d2>] worker_thread+0x122/0x4f0
Apr 24 17:09:45 vfnphav1a kernel: [340711.156944] [<ffffffff81086589>] ? __wake_up_common+0x59/0x90
Apr 24 17:09:45 vfnphav1a kernel: [340711.163280] [<ffffffff810655b0>] ? process_one_work+0x450/0x450
Apr 24 17:09:45 vfnphav1a kernel: [340711.169790] [<ffffffff8106a98e>] kthread+0xde/0x100
Apr 24 17:09:45 vfnphav1a kernel: [340711.175253] [<ffffffff81050dc4>] ? do_exit+0x6e4/0xaa0
Apr 24 17:09:45 vfnphav1a kernel: [340711.180987] [<ffffffff8106a8b0>] ? __init_kthread_worker+0x40/0x40
Apr 24 17:09:45 vfnphav1a kernel: [340711.187757] [<ffffffff81498d88>] ret_from_fork+0x58/0x90
Apr 24 17:09:45 vfnphav1a kernel: [340711.193652] [<ffffffff8106a8b0>] ? __init_kthread_worker+0x40/0x40

The process started "running" again after some time, but it's excruciatingly slow, at about 40 KB/s, and all ceph processes seem to be mostly idle. From the backtrace I'm not sure whether this couldn't be a network adapter problem, since I see some bnx2x_ locking functions in there, but the network seems to be running fine otherwise and I didn't have any issues until I started using RBD heavily.

If I can provide more information, please let me know.
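In case it helps, this is the sequence I'm running, plus a direct-I/O variant I plan to try next to rule out the page-cache/writeback path; the pool and image names below are just placeholders for my test image:

# map the test image (pool/image names are placeholders)
rbd map rbd/testvol

# original reproducer: buffered writes, which go through the page cache
# and the writeback path shown in the backtrace above
dd if=/dev/zero of=/dev/rbd0 bs=1M

# same write with the page cache bypassed; if this keeps a steady rate,
# the stall is presumably in writeback/request allocation rather than
# on the OSD side
dd if=/dev/zero of=/dev/rbd0 bs=1M oflag=direct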
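While the dd is hung I can also collect the following; the debugfs path assumes debugfs is mounted, and eth0 is a placeholder for the bnx2x interface on my nodes:

# in-flight requests of the kernel ceph/rbd client: requests stuck here
# would point at the OSDs or the network, an empty list at the local
# block layer
mount -t debugfs none /sys/kernel/debug 2>/dev/null
cat /sys/kernel/debug/ceph/*/osdc

# block-layer view of the rbd device: requests in flight and queue depth
cat /sys/block/rbd0/inflight
cat /sys/block/rbd0/queue/nr_requests

# NIC statistics for the bnx2x interface (replace eth0 with the real name)
ethtool -S eth0 | grep -iE 'drop|discard|error'
ip -s link show eth0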
BR

nik

--
-------------------------------------
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:    +420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: servis@xxxxxxxxxxx
-------------------------------------