Re: rbd IO hang (was disk timeouts in libvirt/qemu VMs...)

Are some or many of your VMs issuing periodic fstrims to discard
unused extents?
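
If you are not sure, a quick way to check from inside a guest is something
like the following (the systemd timer and the cron path vary by distro, so
treat these as examples):

  systemctl list-timers fstrim.timer    # systemd-based guests
  cat /etc/cron.weekly/fstrim           # older Ubuntu/Debian guests
  lsblk --discard                       # non-zero DISC-GRAN/DISC-MAX means
                                        # discard is plumbed through to the disk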

On Wed, Jun 21, 2017 at 2:36 PM, Hall, Eric <eric.hall@xxxxxxxxxxxxxx> wrote:
> After following all of the suggestions (turning off exclusive-lock and the
> associated object-map and fast-diff features, changing host cache behavior,
> etc.), this is still a blocking issue for many uses of our OpenStack/Ceph
> installation.
>
>
>
> We have upgraded Ceph to 10.2.7 and are running 4.4.0-62 or later kernels
> on all storage hosts, compute hosts, and VMs, with libvirt 1.3.1 on the
> compute hosts. We have also learned quite a bit about producing debug logs. ;)
>
>
>
> I’ve followed the related threads since March with bated breath, but still
> find no resolution.
>
>
>
> Previously, I got pulled away before I could produce and report the debug
> info that was discussed, but I am back on the case now. Please let me know
> how I can help diagnose and resolve this problem.
>
>
>
> Any assistance appreciated,
>
> --
>
> Eric
>
>
>
> On 3/28/17, 3:05 AM, "Marius Vaitiekunas" <mariusvaitiekunas@xxxxxxxxx>
> wrote:
>
>
>
>
>
>
>
> On Mon, Mar 27, 2017 at 11:17 PM, Peter Maloney
> <peter.maloney@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> I can't guarantee it's the same as my issue, but from your description it
> sounds the same.
>
> Jewel 10.2.4, 10.2.5 tested
> hypervisors are proxmox qemu-kvm, using librbd
> 3 ceph nodes with mon+osd on each
>
> -faster journals, more disks, bcache, rbd_cache, fewer VMs on ceph, iops
> and bw limits on the client side, jumbo frames, etc. all improve/smooth out
> performance and mitigate the hangs, but don't prevent them.
> -hangs are usually associated with blocked requests (I set the complaint
> time to 5s to see them)
> -hangs are very easily caused by rbd snapshot + rbd export-diff doing
> incremental backups (one snap kept persistently, plus one more during the
> backup; the pattern is sketched just after this list)
> -when a qemu VM's IO hangs, I have to kill -9 the qemu process for it to
> stop. Some broken VMs don't appear to be hung until I try to live
> migrate them (live migrating all VMs helped when testing solutions)
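>
> For reference, that backup pattern looks roughly like the following (the
> image spec and snapshot names are placeholders), along with one way to
> lower the complaint time at runtime:
>
>   # lower the blocked-request complaint threshold to 5s on all OSDs
>   ceph tell osd.* injectargs '--osd_op_complaint_time 5'
>
>   # incremental backup: one persistent snap as the base, one new snap per run
>   rbd snap create rbd/vm-disk-1@2017-03-28
>   rbd export-diff --from-snap 2017-03-27 rbd/vm-disk-1@2017-03-28 vm-disk-1.diff
>   rbd snap rm rbd/vm-disk-1@2017-03-27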
>
> Finally I have a workaround... disable the exclusive-lock, object-map, and
> fast-diff rbd features (and restart the clients via live migration).
> (object-map and fast-diff appear to have no effect on diff or export-diff
> ... so I don't miss them.) I'll file a bug at some point (after I move
> all the VMs back and see if it is still stable). One other user on IRC
> said this solved the same problem (also using rbd snapshots).
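>
> In case it saves someone a lookup: the features depend on each other, so
> they have to be disabled in reverse dependency order. A minimal sketch
> ("rbd/vm-disk-1" is a placeholder image spec; the clients still need the
> restart/live-migrate afterwards):
>
>   rbd feature disable rbd/vm-disk-1 fast-diff
>   rbd feature disable rbd/vm-disk-1 object-map
>   rbd feature disable rbd/vm-disk-1 exclusive-lock
>   rbd info rbd/vm-disk-1 | grep features    # confirm they are gone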
>
> And strangely, they don't hang right away if I put those features back;
> it takes a few days (which makes testing much less easy... but by now I'm
> very sure that removing the features prevents the issue).
>
> I hope this works for you (and maybe gets some attention from the devs
> too), so you don't waste months on it like I did.
>
>
> On 03/27/17 19:31, Hall, Eric wrote:
>> In an OpenStack (mitaka) cloud, backed by a ceph cluster (10.2.6 jewel),
>> using libvirt/qemu (1.3.1/2.5) hypervisors on Ubuntu 14.04.5 compute and
>> ceph hosts, we occasionally see hung processes (usually during boot, but
>> at other times as well), with errors reported in the instance logs as shown
>> below.  Configuration is vanilla, based on the openstack/ceph docs.
>>
>> Neither the compute hosts nor the ceph hosts appear to be overloaded in
>> terms of memory or network bandwidth, none of the 67 OSDs is over 80% full,
>> and none of them appears to be overwhelmed in terms of IO.  The compute hosts
>> and the ceph cluster are connected via a relatively quiet 1Gb network, with an
>> IBoE network between the ceph nodes.  Neither network appears overloaded.
>>
>> I don’t see any errors that look related (to my eye) in the client or
>> server logs, even with 20/20 logging from various components (rbd, rados,
>> client, objectcacher, etc.)  I’ve increased the qemu file descriptor limit
>> (currently 64k... overkill for sure.)
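>>
>> (For reference, the 20/20 logging above amounts to something like the
>> following in ceph.conf on the compute hosts; the exact option set and the
>> log path are only examples:)
>>
>>   [client]
>>       debug rbd = 20/20
>>       debug rados = 20/20
>>       debug objectcacher = 20/20
>>       debug client = 20/20
>>       log file = /var/log/ceph/qemu-client-$pid.log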
>>
>> It “feels” like a performance problem, but I can’t find any capacity issues
>> or constraining bottlenecks.
>>
>> Any suggestions or insights into this situation are appreciated.  Thank
>> you for your time,
>> --
>> Eric
>>
>>
>> [Fri Mar 24 20:30:40 2017] INFO: task jbd2/vda1-8:226 blocked for more than 120 seconds.
>> [Fri Mar 24 20:30:40 2017]       Not tainted 3.13.0-52-generic #85-Ubuntu
>> [Fri Mar 24 20:30:40 2017] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [Fri Mar 24 20:30:40 2017] jbd2/vda1-8     D ffff88043fd13180     0   226     2 0x00000000
>> [Fri Mar 24 20:30:40 2017]  ffff88003728bbd8 0000000000000046 ffff880426900000 ffff88003728bfd8
>> [Fri Mar 24 20:30:40 2017]  0000000000013180 0000000000013180 ffff880426900000 ffff88043fd13a18
>> [Fri Mar 24 20:30:40 2017]  ffff88043ffb9478 0000000000000002 ffffffff811ef7c0 ffff88003728bc50
>> [Fri Mar 24 20:30:40 2017] Call Trace:
>> [Fri Mar 24 20:30:40 2017]  [<ffffffff811ef7c0>] ? generic_block_bmap+0x50/0x50
>> [Fri Mar 24 20:30:40 2017]  [<ffffffff81726d2d>] io_schedule+0x9d/0x140
>> [Fri Mar 24 20:30:40 2017]  [<ffffffff811ef7ce>] sleep_on_buffer+0xe/0x20
>> [Fri Mar 24 20:30:40 2017]  [<ffffffff817271b2>] __wait_on_bit+0x62/0x90
>> [Fri Mar 24 20:30:40 2017]  [<ffffffff811ef7c0>] ? generic_block_bmap+0x50/0x50
>> [Fri Mar 24 20:30:40 2017]  [<ffffffff81727257>] out_of_line_wait_on_bit+0x77/0x90
>> [Fri Mar 24 20:30:40 2017]  [<ffffffff810ab180>] ? autoremove_wake_function+0x40/0x40
>> [Fri Mar 24 20:30:40 2017]  [<ffffffff811f0afa>] __wait_on_buffer+0x2a/0x30
>> [Fri Mar 24 20:30:40 2017]  [<ffffffff8128bb4d>] jbd2_journal_commit_transaction+0x185d/0x1ab0
>> [Fri Mar 24 20:30:40 2017]  [<ffffffff810755df>] ? try_to_del_timer_sync+0x4f/0x70
>> [Fri Mar 24 20:30:40 2017]  [<ffffffff8128fe7d>] kjournald2+0xbd/0x250
>> [Fri Mar 24 20:30:40 2017]  [<ffffffff810ab140>] ? prepare_to_wait_event+0x100/0x100
>> [Fri Mar 24 20:30:40 2017]  [<ffffffff8128fdc0>] ? commit_timeout+0x10/0x10
>> [Fri Mar 24 20:30:40 2017]  [<ffffffff8108b5d2>] kthread+0xd2/0xf0
>> [Fri Mar 24 20:30:40 2017]  [<ffffffff8108b500>] ? kthread_create_on_node+0x1c0/0x1c0
>> [Fri Mar 24 20:30:40 2017]  [<ffffffff8173304c>] ret_from_fork+0x7c/0xb0
>> [Fri Mar 24 20:30:40 2017]  [<ffffffff8108b500>] ? kthread_create_on_node+0x1c0/0x1c0
>>
>>
>>
>
>
>
>
>
> Hi,
>
>
>
> We are using these settings on hypervisors in openstack:
>
> vm.dirty_ratio = 40
>
> vm.dirty_background_ratio = 5
>
>
>
> And these on vms:
>
> vm.dirty_ratio = 10
>
> vm.dirty_background_ratio = 5
>
>
>
> In our case these settings prevent the VMs from crashing.
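>
> For completeness, one way to apply them (runtime plus persistent; the
> sysctl.d file name is just an example):
>
>   sysctl -w vm.dirty_ratio=40 vm.dirty_background_ratio=5
>   echo 'vm.dirty_ratio = 40' >> /etc/sysctl.d/90-dirty.conf
>   echo 'vm.dirty_background_ratio = 5' >> /etc/sysctl.d/90-dirty.conf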
>
>
>
> --
>
> Marius Vaitiekūnas
>
>
>



-- 
Jason
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



