Re: libvirt rbd issue

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



1) Take a look at the number of file descriptors the QEMU process is using, I think you are over the limits

pid=pid of qemu process

cat /proc/$pid/limits
echo /proc/$pid/fd/* | wc -w

2) Jumbo frames may be the cause, are they enabled on the rest of the network? In any case, get rid of NetworkManager ASAP and set it manually, though it looks like your NIC might not support them.

Jan



> On 02 Sep 2015, at 01:44, Rafael Lopez <rafael.lopez@xxxxxxxxxx> wrote:
> 
> Hi ceph-users,
> 
> Hoping to get some help with a tricky problem. I have a rhel7.1 VM guest (host machine also rhel7.1) with root disk presented from ceph 0.94.2-0 (rbd) using libvirt. 
> 
> The VM also has a second rbd for storage presented from the same ceph cluster, also using libvirt.
> 
> The VM boots fine, no apparent issues with the OS root rbd. I am able to mount the storage disk in the VM, and create a file system. I can even transfer small files to it. But when I try to transfer a moderate size files, eg. greater than 1GB, it seems to slow to a grinding halt and eventually it locks up the whole system, and generates the kernel messages below. 
> 
> I have googled some *similar* issues around, but haven't come across some solid advice/fix. So far I have tried modifying the libvirt disk cache settings, tried using the latest mainline kernel (4.2+), different file systems (ext4, xfs, zfs) all produce similar results. I suspect it may be network related, as when I was using the mainline kernel I was transferring some files to the storage disk and this message came up, and the transfer seemed to stop at the same time:
> 
> Sep  1 15:31:22 nas1-rds NetworkManager[724]: <error> [1441085482.078646] [platform/nm-linux-platform.c:2133] sysctl_set(): sysctl: failed to set '/proc/sys/net/ipv6/conf/eth0/mtu' to '9000': (22) Invalid argument
> 
> I think maybe the key info to troubleshooting is that it seems to be OK for files under 1GB. 
> 
> Any ideas would be appreciated.
> 
> Cheers,
> Raf
> 
> 
> Sep  1 16:04:15 nas1-rds kernel: INFO: task kworker/u8:1:60 blocked for more than 120 seconds.
> Sep  1 16:04:15 nas1-rds kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Sep  1 16:04:15 nas1-rds kernel: kworker/u8:1    D ffff88023fd93680     0    60      2 0x00000000
> Sep  1 16:04:15 nas1-rds kernel: Workqueue: writeback bdi_writeback_workfn (flush-252:80)
> Sep  1 16:04:15 nas1-rds kernel: ffff880230c136b0 0000000000000046 ffff8802313c4440 ffff880230c13fd8
> Sep  1 16:04:15 nas1-rds kernel: ffff880230c13fd8 ffff880230c13fd8 ffff8802313c4440 ffff88023fd93f48
> Sep  1 16:04:15 nas1-rds kernel: ffff880230c137b0 ffff880230fbcb08 ffffe8ffffd80ec0 ffff88022e827590
> Sep  1 16:04:15 nas1-rds kernel: Call Trace:
> Sep  1 16:04:15 nas1-rds kernel: [<ffffffff8160955d>] io_schedule+0x9d/0x130
> Sep  1 16:04:15 nas1-rds kernel: [<ffffffff812b8d5f>] bt_get+0x10f/0x1a0
> Sep  1 16:04:15 nas1-rds kernel: [<ffffffff81098230>] ? wake_up_bit+0x30/0x30
> Sep  1 16:04:15 nas1-rds kernel: [<ffffffff812b90ef>] blk_mq_get_tag+0xbf/0xf0
> Sep  1 16:04:15 nas1-rds kernel: [<ffffffff812b4f3b>] __blk_mq_alloc_request+0x1b/0x1f0
> Sep  1 16:04:15 nas1-rds kernel: [<ffffffff812b68a1>] blk_mq_map_request+0x181/0x1e0
> Sep  1 16:04:15 nas1-rds kernel: [<ffffffff812b7a1a>] blk_sq_make_request+0x9a/0x380
> Sep  1 16:04:15 nas1-rds kernel: [<ffffffff812aa28f>] ? generic_make_request_checks+0x24f/0x380
> Sep  1 16:04:15 nas1-rds kernel: [<ffffffff812aa4a2>] generic_make_request+0xe2/0x130
> Sep  1 16:04:15 nas1-rds kernel: [<ffffffff812aa561>] submit_bio+0x71/0x150
> Sep  1 16:04:15 nas1-rds kernel: [<ffffffffa01ddc55>] ext4_io_submit+0x25/0x50 [ext4]
> Sep  1 16:04:15 nas1-rds kernel: [<ffffffffa01dde09>] ext4_bio_write_page+0x159/0x2e0 [ext4]
> Sep  1 16:04:15 nas1-rds kernel: [<ffffffffa01d4f6d>] mpage_submit_page+0x5d/0x80 [ext4]
> Sep  1 16:04:15 nas1-rds kernel: [<ffffffffa01d5232>] mpage_map_and_submit_buffers+0x172/0x2a0 [ext4]
> Sep  1 16:04:15 nas1-rds kernel: [<ffffffffa01da313>] ext4_writepages+0x733/0xd60 [ext4]
> Sep  1 16:04:15 nas1-rds kernel: [<ffffffff81162b6e>] do_writepages+0x1e/0x40
> Sep  1 16:04:15 nas1-rds kernel: [<ffffffff811efe10>] __writeback_single_inode+0x40/0x220
> Sep  1 16:04:15 nas1-rds kernel: [<ffffffff811f0b0e>] writeback_sb_inodes+0x25e/0x420
> Sep  1 16:04:15 nas1-rds kernel: [<ffffffff811f0d6f>] __writeback_inodes_wb+0x9f/0xd0
> Sep  1 16:04:15 nas1-rds kernel: [<ffffffff811f15b3>] wb_writeback+0x263/0x2f0
> Sep  1 16:04:15 nas1-rds kernel: [<ffffffff811f2aec>] bdi_writeback_workfn+0x1cc/0x460
> Sep  1 16:04:15 nas1-rds kernel: [<ffffffff8108f0ab>] process_one_work+0x17b/0x470
> Sep  1 16:04:15 nas1-rds kernel: [<ffffffff8108fe8b>] worker_thread+0x11b/0x400
> Sep  1 16:04:15 nas1-rds kernel: [<ffffffff8108fd70>] ? rescuer_thread+0x400/0x400
> Sep  1 16:04:15 nas1-rds kernel: [<ffffffff8109726f>] kthread+0xcf/0xe0
> Sep  1 16:04:15 nas1-rds kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
> Sep  1 16:04:15 nas1-rds kernel: [<ffffffff81613cfc>] ret_from_fork+0x7c/0xb0
> Sep  1 16:04:15 nas1-rds kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
> 
> 
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Ceph Dev]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux