3.18.11 - RBD triggered deadlock?

Hello once again,

I seem to have hit one more problem today:
a 3-node test cluster, nodes running the 3.18.11 kernel,
ceph 0.94.1, a 3-replica pool backed by SSD OSDs.

After mapping a volume using rbd and trying to zero it
using dd:

dd if=/dev/zero of=/dev/rbd0 bs=1M

it ran fine for some time at ~200 MB/s, but the speed
slowly dropped to ~70 MB/s; then the process hung and the
following backtraces started appearing in dmesg:

Apr 24 17:09:45 vfnphav1a kernel: [340710.888081] INFO: task kworker/u8:2:15884 blocked for more than 120 seconds.
Apr 24 17:09:45 vfnphav1a kernel: [340710.895645]       Not tainted 3.18.11lb6.01 #1
Apr 24 17:09:45 vfnphav1a kernel: [340710.900612] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 24 17:09:45 vfnphav1a kernel: [340710.909290] kworker/u8:2    D 0000000000000001     0 15884      2 0x00000000
Apr 24 17:09:45 vfnphav1a kernel: [340710.917043] Workqueue: writeback bdi_writeback_workfn (flush-252:0)
Apr 24 17:09:45 vfnphav1a kernel: [340710.923998]  ffff880172b73608 0000000000000046 ffff88021424a850 0000000000004000
Apr 24 17:09:45 vfnphav1a kernel: [340710.932595]  ffff8801988d3120 0000000000011640 ffff880172b70010 0000000000011640
Apr 24 17:09:45 vfnphav1a kernel: [340710.941193]  0000000000004000 0000000000011640 ffff8800d7689890 ffff8801988d3120
Apr 24 17:09:45 vfnphav1a kernel: [340710.949799] Call Trace:
Apr 24 17:09:45 vfnphav1a kernel: [340710.952746]  [<ffffffff8149882e>] ? _raw_spin_unlock+0xe/0x30
Apr 24 17:09:45 vfnphav1a kernel: [340710.959009]  [<ffffffff8123ba6b>] ? queue_unplugged+0x5b/0xe0
Apr 24 17:09:45 vfnphav1a kernel: [340710.965258]  [<ffffffff81494149>] schedule+0x29/0x70
Apr 24 17:09:45 vfnphav1a kernel: [340710.970728]  [<ffffffff8149421c>] io_schedule+0x8c/0xd0
Apr 24 17:09:45 vfnphav1a kernel: [340710.976462]  [<ffffffff81239e95>] get_request+0x445/0x860
Apr 24 17:09:45 vfnphav1a kernel: [340710.982366]  [<ffffffff81086680>] ? bit_waitqueue+0x80/0x80
Apr 24 17:09:45 vfnphav1a kernel: [340710.988443]  [<ffffffff812358eb>] ? elv_merge+0xeb/0xf0
Apr 24 17:09:45 vfnphav1a kernel: [340710.994167]  [<ffffffff8123bdf8>] blk_queue_bio+0xc8/0x360
Apr 24 17:09:45 vfnphav1a kernel: [340711.000159]  [<ffffffff81239790>] generic_make_request+0xc0/0x100
Apr 24 17:09:45 vfnphav1a kernel: [340711.006760]  [<ffffffff81239841>] submit_bio+0x71/0x140
Apr 24 17:09:45 vfnphav1a kernel: [340711.012489]  [<ffffffff811b5aae>] _submit_bh+0x11e/0x170
Apr 24 17:09:45 vfnphav1a kernel: [340711.018307]  [<ffffffff811b5b10>] submit_bh+0x10/0x20
Apr 24 17:09:45 vfnphav1a kernel: [340711.023865]  [<ffffffff811b98e8>] __block_write_full_page.clone.0+0x198/0x340
Apr 24 17:09:45 vfnphav1a kernel: [340711.031846]  [<ffffffff811b9cb0>] ? I_BDEV+0x10/0x10
Apr 24 17:09:45 vfnphav1a kernel: [340711.037313]  [<ffffffff811b9cb0>] ? I_BDEV+0x10/0x10
Apr 24 17:09:45 vfnphav1a kernel: [340711.042784]  [<ffffffff811b9c5a>] block_write_full_page+0xba/0x100
Apr 24 17:09:45 vfnphav1a kernel: [340711.049477]  [<ffffffff811bab88>] blkdev_writepage+0x18/0x20
Apr 24 17:09:45 vfnphav1a kernel: [340711.055642]  [<ffffffff811231ca>] __writepage+0x1a/0x50
Apr 24 17:09:45 vfnphav1a kernel: [340711.061374]  [<ffffffff81124427>] write_cache_pages+0x1e7/0x4e0
Apr 24 17:09:45 vfnphav1a kernel: [340711.067797]  [<ffffffff811231b0>] ? set_page_dirty+0x60/0x60
Apr 24 17:09:45 vfnphav1a kernel: [340711.073952]  [<ffffffff81124774>] generic_writepages+0x54/0x80
Apr 24 17:09:45 vfnphav1a kernel: [340711.080292]  [<ffffffff811247c3>] do_writepages+0x23/0x40
Apr 24 17:09:45 vfnphav1a kernel: [340711.086196]  [<ffffffff811add39>] __writeback_single_inode+0x49/0x2c0
Apr 24 17:09:45 vfnphav1a kernel: [340711.093131]  [<ffffffff81086c8f>] ? wake_up_bit+0x2f/0x40
Apr 24 17:09:45 vfnphav1a kernel: [340711.099028]  [<ffffffff811af3b6>] writeback_sb_inodes+0x2d6/0x490
Apr 24 17:09:45 vfnphav1a kernel: [340711.105625]  [<ffffffff811af60e>] __writeback_inodes_wb+0x9e/0xd0
Apr 24 17:09:45 vfnphav1a kernel: [340711.112223]  [<ffffffff811af83b>] wb_writeback+0x1fb/0x320
Apr 24 17:09:45 vfnphav1a kernel: [340711.118214]  [<ffffffff811afa60>] wb_do_writeback+0x100/0x210
Apr 24 17:09:45 vfnphav1a kernel: [340711.124466]  [<ffffffff811afbe0>] bdi_writeback_workfn+0x70/0x250
Apr 24 17:09:45 vfnphav1a kernel: [340711.131063]  [<ffffffff814954de>] ? mutex_unlock+0xe/0x10
Apr 24 17:09:45 vfnphav1a kernel: [340711.136974]  [<ffffffffa02c4ef4>] ? bnx2x_release_phy_lock+0x24/0x30 [bnx2x]
Apr 24 17:09:45 vfnphav1a kernel: [340711.144530]  [<ffffffff8106529a>] process_one_work+0x13a/0x450
Apr 24 17:09:45 vfnphav1a kernel: [340711.150872]  [<ffffffff810656d2>] worker_thread+0x122/0x4f0
Apr 24 17:09:45 vfnphav1a kernel: [340711.156944]  [<ffffffff81086589>] ? __wake_up_common+0x59/0x90
Apr 24 17:09:45 vfnphav1a kernel: [340711.163280]  [<ffffffff810655b0>] ? process_one_work+0x450/0x450
Apr 24 17:09:45 vfnphav1a kernel: [340711.169790]  [<ffffffff8106a98e>] kthread+0xde/0x100
Apr 24 17:09:45 vfnphav1a kernel: [340711.175253]  [<ffffffff81050dc4>] ? do_exit+0x6e4/0xaa0
Apr 24 17:09:45 vfnphav1a kernel: [340711.180987]  [<ffffffff8106a8b0>] ? __init_kthread_worker+0x40/0x40
Apr 24 17:09:45 vfnphav1a kernel: [340711.187757]  [<ffffffff81498d88>] ret_from_fork+0x58/0x90
Apr 24 17:09:45 vfnphav1a kernel: [340711.193652]  [<ffffffff8106a8b0>] ? __init_kthread_worker+0x40/0x40

The process started "running" again after some time, but it's excruciatingly slow, at about 40 KB/s.
All ceph processes seem to be mostly idle.

From the backtrace I can't rule out a network adapter problem, since I see
some bnx2x_ locking functions, but the network seems to be running fine otherwise,
and I didn't have any issues until I started using RBD heavily.
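In case it helps with triage: while the machine is wedged, I pull the hung-task reports out of dmesg with a small helper. This is just a sketch; the scan_hung_tasks name is mine, and the pattern only matches the "INFO: task ..." header lines formatted as in the trace above:

```shell
# Pull (task name, pid, seconds blocked) out of hung-task header lines
# like the one above:
#   INFO: task kworker/u8:2:15884 blocked for more than 120 seconds.
scan_hung_tasks() {
    sed -n 's/.*INFO: task \(.*\):\([0-9][0-9]*\) blocked for more than \([0-9][0-9]*\) seconds.*/\1 \2 \3/p'
}

# Typical use while the dd is stuck:
#   dmesg | scan_hung_tasks | sort | uniq -c
```

That makes it easy to see whether it's always the same writeback worker that is blocked, or whether other tasks pile up behind it.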

If I can provide any more information, please let me know.

BR

nik


-- 
-------------------------------------
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:    +420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: servis@xxxxxxxxxxx
-------------------------------------


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
