Re: rbd IO hang (was disk timeouts in libvirt/qemu VMs...)

Do your VMs or OSDs show blocked requests? If you disable scrub or
restart the blocked OSD, does the issue go away? If so, it is most
likely this issue [1].

[1] http://tracker.ceph.com/issues/20041
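
For reference, one way to check this, assuming the standard ceph CLI
(osd.12 below is just a placeholder):

    # look for blocked requests cluster-wide
    ceph health detail | grep -i blocked

    # temporarily disable scrubbing to see if the hangs stop
    ceph osd set noscrub
    ceph osd set nodeep-scrub

    # restart a specific blocked OSD
    systemctl restart ceph-osd@12    # or "restart ceph-osd id=12" on upstart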

On Wed, Jun 21, 2017 at 3:33 PM, Hall, Eric <eric.hall@xxxxxxxxxxxxxx> wrote:
> The VMs are using stock Ubuntu 14/16 images, so yes, there is the default “/sbin/fstrim --all” in /etc/cron.weekly/fstrim.
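>
> For reference, a quick way to test whether the weekly trim is the trigger
> (paths are the stock Ubuntu ones):
>
>     # temporarily disable the weekly job
>     chmod -x /etc/cron.weekly/fstrim
>
>     # or reproduce on demand inside a VM
>     fstrim -v /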
>
> --
> Eric
>
> On 6/21/17, 1:58 PM, "Jason Dillaman" <jdillama@xxxxxxxxxx> wrote:
>
>     Are some or many of your VMs issuing periodic fstrims to discard
>     unused extents?
>
>     On Wed, Jun 21, 2017 at 2:36 PM, Hall, Eric <eric.hall@xxxxxxxxxxxxxx> wrote:
>     > After following/changing all suggested items (turning off exclusive-lock
>     > and its associated object-map and fast-diff features, changing host cache
>     > behavior, etc.), this is still a blocking issue for many uses of our
>     > OpenStack/Ceph installation.
>     >
>     >
>     >
>     > We have upgraded Ceph to 10.2.7, are running 4.4.0-62 or later kernels on
>     > all storage hosts, compute hosts, and VMs, with libvirt 1.3.1 on the
>     > compute hosts. We have also learned quite a bit about producing debug logs. ;)
>     >
>     >
>     >
>     > I’ve followed the related threads since March with bated breath, but still
>     > find no resolution.
>     >
>     >
>     >
>     > Previously, I got pulled away before I could produce/report the debug
>     > info discussed, but I am back on the case now. Please let me know how I
>     > can help diagnose and resolve this problem.
>     >
>     >
>     >
>     > Any assistance appreciated,
>     >
>     > --
>     >
>     > Eric
>     >
>     >
>     >
>     > On 3/28/17, 3:05 AM, "Marius Vaitiekunas" <mariusvaitiekunas@xxxxxxxxx>
>     > wrote:
>     >
>     >
>     >
>     >
>     >
>     >
>     >
>     > On Mon, Mar 27, 2017 at 11:17 PM, Peter Maloney
>     > <peter.maloney@xxxxxxxxxxxxxxxxxxxx> wrote:
>     >
>     > I can't guarantee it's the same as my issue, but from your description it
>     > sounds the same.
>     >
>     > Jewel 10.2.4, 10.2.5 tested
>     > hypervisors are proxmox qemu-kvm, using librbd
>     > 3 ceph nodes with mon+osd on each
>     >
>     > -faster journals, more disks, bcache, rbd_cache, fewer VMs on ceph, iops
>     > and bw limits on the client side, jumbo frames, etc. all improve/smooth out
>     > performance and mitigate the hangs, but don't prevent them.
>     > -hangs are usually associated with blocked requests (I set osd op
>     > complaint time to 5s to see them)
>     > -hangs are very easily caused by rbd snapshot + rbd export-diff to do
>     > incremental backups (one snap persistent, plus one more during backup;
>     > see the sketch below)
>     > -when a qemu VM's I/O hangs, I have to kill -9 the qemu process for it to
>     > stop. Some broken VMs don't appear to be hung until I try to live
>     > migrate them (live migrating all VMs helped test solutions)
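>     >
>     > For reference, the snapshot + export-diff backup pattern is roughly the
>     > following (pool/image and snap names are placeholders):
>     >
>     >     rbd snap create pool/image@backup-new
>     >     rbd export-diff --from-snap backup-prev pool/image@backup-new img.diff
>     >     rbd snap rm pool/image@backup-prev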
>     >
>     > Finally I have a workaround... disable the exclusive-lock, object-map, and
>     > fast-diff rbd features (and restart clients via live migrate).
>     > (object-map and fast-diff appear to have no effect on diff or export-diff
>     > ... so I don't miss them). I'll file a bug at some point (after I move
>     > all VMs back and see if it is still stable). And one other user on IRC
>     > said this solved the same problem (also using rbd snapshots).
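>     >
>     > The workaround boils down to something like this (pool/image is a
>     > placeholder; features that depend on exclusive-lock must go first):
>     >
>     >     rbd feature disable pool/image fast-diff
>     >     rbd feature disable pool/image object-map
>     >     rbd feature disable pool/image exclusive-lock
>     >
>     > followed by a live migration so qemu reopens the image.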
>     >
>     > And strangely, if I put those features back, the VMs don't hang again
>     > until a few days later (which makes testing much less easy... but by now
>     > I'm very sure removing them prevents the issue).
>     >
>     > I hope this works for you (and maybe gets some attention from devs too),
>     > so you don't waste months like I did.
>     >
>     >
>     > On 03/27/17 19:31, Hall, Eric wrote:
>     >> In an OpenStack (Mitaka) cloud, backed by a Ceph cluster (10.2.6 jewel),
>     >> using libvirt/qemu (1.3.1/2.5) hypervisors on Ubuntu 14.04.5 compute and
>     >> ceph hosts, we occasionally see hung processes (usually during boot, but
>     >> at other times as well), with errors reported in the instance logs as
>     >> shown below.  Configuration is vanilla, based on the OpenStack/Ceph docs.
>     >>
>     >> Neither the compute hosts nor the ceph hosts appear to be overloaded in
>     >> terms of memory or network bandwidth, none of the 67 osds are over 80% full,
>     >> nor do any of them appear to be overwhelmed in terms of IO.  Compute hosts
>     >> and ceph cluster are connected via a relatively quiet 1Gb network, with an
>     >> IBoE net between the ceph nodes.  Neither network appears overloaded.
>     >>
>     >> I don’t see any related errors (to my eye) in client or server logs, even
>     >> with 20/20 logging from various components (rbd, rados, client,
>     >> objectcacher, etc.)  I’ve increased the qemu file descriptor limit
>     >> (currently 64k... overkill for sure.)
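>     >>
>     >> For reference, the 20/20 logging came from a client section along these
>     >> lines (the log path is just an example):
>     >>
>     >>     [client]
>     >>         debug rbd = 20
>     >>         debug rados = 20
>     >>         debug objectcacher = 20
>     >>         log file = /var/log/ceph/client.$pid.log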
>     >>
>     >> It “feels” like a performance problem, but I can’t find any capacity issues
>     >> or constraining bottlenecks.
>     >>
>     >> Any suggestions or insights into this situation are appreciated.  Thank
>     >> you for your time,
>     >> --
>     >> Eric
>     >>
>     >>
>     >> [Fri Mar 24 20:30:40 2017] INFO: task jbd2/vda1-8:226 blocked for more than 120 seconds.
>     >> [Fri Mar 24 20:30:40 2017]       Not tainted 3.13.0-52-generic #85-Ubuntu
>     >> [Fri Mar 24 20:30:40 2017] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>     >> [Fri Mar 24 20:30:40 2017] jbd2/vda1-8     D ffff88043fd13180     0   226      2 0x00000000
>     >> [Fri Mar 24 20:30:40 2017]  ffff88003728bbd8 0000000000000046 ffff880426900000 ffff88003728bfd8
>     >> [Fri Mar 24 20:30:40 2017]  0000000000013180 0000000000013180 ffff880426900000 ffff88043fd13a18
>     >> [Fri Mar 24 20:30:40 2017]  ffff88043ffb9478 0000000000000002 ffffffff811ef7c0 ffff88003728bc50
>     >> [Fri Mar 24 20:30:40 2017] Call Trace:
>     >> [Fri Mar 24 20:30:40 2017]  [<ffffffff811ef7c0>] ? generic_block_bmap+0x50/0x50
>     >> [Fri Mar 24 20:30:40 2017]  [<ffffffff81726d2d>] io_schedule+0x9d/0x140
>     >> [Fri Mar 24 20:30:40 2017]  [<ffffffff811ef7ce>] sleep_on_buffer+0xe/0x20
>     >> [Fri Mar 24 20:30:40 2017]  [<ffffffff817271b2>] __wait_on_bit+0x62/0x90
>     >> [Fri Mar 24 20:30:40 2017]  [<ffffffff811ef7c0>] ? generic_block_bmap+0x50/0x50
>     >> [Fri Mar 24 20:30:40 2017]  [<ffffffff81727257>] out_of_line_wait_on_bit+0x77/0x90
>     >> [Fri Mar 24 20:30:40 2017]  [<ffffffff810ab180>] ? autoremove_wake_function+0x40/0x40
>     >> [Fri Mar 24 20:30:40 2017]  [<ffffffff811f0afa>] __wait_on_buffer+0x2a/0x30
>     >> [Fri Mar 24 20:30:40 2017]  [<ffffffff8128bb4d>] jbd2_journal_commit_transaction+0x185d/0x1ab0
>     >> [Fri Mar 24 20:30:40 2017]  [<ffffffff810755df>] ? try_to_del_timer_sync+0x4f/0x70
>     >> [Fri Mar 24 20:30:40 2017]  [<ffffffff8128fe7d>] kjournald2+0xbd/0x250
>     >> [Fri Mar 24 20:30:40 2017]  [<ffffffff810ab140>] ? prepare_to_wait_event+0x100/0x100
>     >> [Fri Mar 24 20:30:40 2017]  [<ffffffff8128fdc0>] ? commit_timeout+0x10/0x10
>     >> [Fri Mar 24 20:30:40 2017]  [<ffffffff8108b5d2>] kthread+0xd2/0xf0
>     >> [Fri Mar 24 20:30:40 2017]  [<ffffffff8108b500>] ? kthread_create_on_node+0x1c0/0x1c0
>     >> [Fri Mar 24 20:30:40 2017]  [<ffffffff8173304c>] ret_from_fork+0x7c/0xb0
>     >> [Fri Mar 24 20:30:40 2017]  [<ffffffff8108b500>] ? kthread_create_on_node+0x1c0/0x1c0
>     >>
>     >>
>     >>
>     >
>     >
>     >
>     >
>     >
>     > Hi,
>     >
>     >
>     >
>     > We are using these settings on the hypervisors in OpenStack:
>     >
>     > vm.dirty_ratio = 40
>     >
>     > vm.dirty_background_ratio = 5
>     >
>     >
>     >
>     > And these in the VMs:
>     >
>     > vm.dirty_ratio = 10
>     >
>     > vm.dirty_background_ratio = 5
>     >
>     >
>     >
>     > In our case this prevents VMs from crashing.
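>     >
>     > These can be made persistent via sysctl, e.g. on the hypervisors:
>     >
>     >     # /etc/sysctl.d/99-dirty.conf
>     >     vm.dirty_ratio = 40
>     >     vm.dirty_background_ratio = 5
>     >
>     >     sysctl --system    # reload without a reboot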
>     >
>     >
>     >
>     > --
>     >
>     > Marius Vaitiekūnas
>     >
>     >
>     >
>
>
>
>     --
>     Jason
>
>



-- 
Jason
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



