Re: VM with rbd volume hangs on write during load

Do you have a comparison of the same workload on storage other than Ceph?

I am asking because those messages indicate slow requests - basically some operation took more than 120s (which would look insanely high to a storage administrator, but is sort of expected in a cloud environment). Even on regular direct-attached storage some OPs can take that long, but those issues are masked in the drivers - they don’t necessarily manifest as “task blocked” or “soft lockup” messages, just as 100% iowait for periods of time, which is considered normal under load.
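
If you want to see whether individual OPs on the OSDs really take that long, the admin socket can show you - a rough sketch, substitute the id of a busy OSD for osd.0:

  # ops currently in flight on one OSD, including how long they have been there
  ceph daemon osd.0 dump_ops_in_flight

  # the slowest recently completed ops and where they spent their time
  ceph daemon osd.0 dump_historic_ops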

In other words - are you seeing a real problem with your workload or just the messages?

If you don’t see any slow ops on the cluster, then either the warning threshold is set too high, or those blocked operations consist of more than just one OP - the latencies simply add up.
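
For reference, the threshold that turns a slow OP into a health warning is osd_op_complaint_time (30 seconds by default, if I remember right). You can check what your OSDs are actually running with, e.g.:

  # how many seconds an OP has to take before the OSD reports it as slow
  ceph daemon osd.0 config get osd_op_complaint_time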

It could also be caused by a network problem (like misconfigured offloads on the network cards causing retransmissions/reorders and such), or by running out of file descriptors on either the client or the server side - that manifests as some requests getting stuck _without_ any slow ops showing up on the Ceph cluster side.
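
Both are quick to rule out - something along these lines on the OSD nodes (the process pick and the interface name are just examples, adjust for your hosts):

  # how close one ceph-osd process is to its open file limit
  cat /proc/$(pidof -s ceph-osd)/limits | grep -i 'open files'
  ls /proc/$(pidof -s ceph-osd)/fd | wc -l

  # TCP retransmissions / reordering on the storage network
  netstat -s | grep -i -e retrans -e reorder

  # offload settings on the NIC carrying cluster traffic
  ethtool -k eth0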

And an obligatory question - you say your OSDs don’t use much CPU, but how are the disks? Aren’t some of them 100% utilized when this happens?
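
Plain iostat on the OSD hosts while a VM is stuck would show it - watch the %util and await columns rather than the CPU numbers:

  # extended per-device statistics, refreshed every second
  iostat -x 1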

Jan

> On 15 Jul 2015, at 02:23, Jeya Ganesh Babu Jegatheesan <jjeya@xxxxxxxxxxx> wrote:
> 
> 
> 
> On 7/14/15, 4:56 PM, "ceph-users on behalf of Wido den Hollander"
> <ceph-users-bounces@xxxxxxxxxxxxxx on behalf of wido@xxxxxxxx> wrote:
> 
>> On 07/15/2015 01:17 AM, Jeya Ganesh Babu Jegatheesan wrote:
>>> Hi,
>>> 
>>> We have an OpenStack + Ceph cluster based on the Giant release. We use
>>> Ceph for the VM volumes, including the boot volumes. Under load, we see
>>> write access to the volumes get stuck from within the VM. The same write
>>> would work after a VM reboot. The issue is seen with and without the rbd
>>> cache. Let me know if this is a known issue and if there is any way to
>>> debug further. The Ceph cluster itself seems to be clean. We have
>>> currently disabled scrub and deep scrub. 'ceph -s' output is below.
>>> 
>> 
>> Are you seeing slow requests in the system?
> 
> I don't see slow requests in the cluster.
> 
>> 
>> Are any of the disks under the OSDs 100% busy or close to it?
> 
> Most of the OSDs use 20% of a core. There is no OSD process busy at 100%.
> 
>> 
>> Btw, the number of PGs is rather high. You are at 88, while the formula
>> recommends:
>> 
>> num_osd * 100 / 3 = 14k (cluster total)
> 
> We used 30 * num_osd per pool. We do have 4 pools; I believe that's why
> the PG count seems to be high.
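
(For comparison, plugging the numbers quoted in this thread into that formula: 425 OSDs * 100 / 3 is roughly 14,000 PGs for the whole cluster, while the pgmap below shows 37,620 - with the 3x replication implied by the usage figures that works out to about 37,620 * 3 / 425, i.e. roughly 265 PG copies per OSD.)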
> 
>> 
>> Wido
>> 
>>>    cluster eaaeaa55-a8e7-4531-a5eb-03d73028b59d
>>>     health HEALTH_WARN noscrub,nodeep-scrub flag(s) set
>>>     monmap e71: 9 mons at
>>> {gngsvc009a=10.163.43.1:6789/0,gngsvc009b=10.163.43.2:6789/0,gngsvc010a=1
>>> 0.163.43.5:6789/0,gngsvc010b=10.163.43.6:6789/0,gngsvc011a=10.163.43.9:67
>>> 89/0,gngsvc011b=10.163.43.10:6789/0,gngsvc011c=10.163.43.11:6789/0,gngsvm
>>> 010d=10.163.43.8:6789/0,gngsvm011d=10.163.43.12:6789/0}, election epoch
>>> 22246, quorum 0,1,2,3,4,5,6,7,8
>>> gngsvc009a,gngsvc009b,gngsvc010a,gngsvc010b,gngsvm010d,gngsvc011a,gngsvc0
>>> 11b,gngsvc011c,gngsvm011d
>>>     osdmap e54600: 425 osds: 425 up, 425 in
>>>            flags noscrub,nodeep-scrub
>>>      pgmap v13257438: 37620 pgs, 4 pools, 134 TB data, 35289 kobjects
>>>            402 TB used, 941 TB / 1344 TB avail
>>>               37620 active+clean
>>>  client io 94059 kB/s rd, 313 MB/s wr, 4623 op/s
>>> 
>>> 
>>> The traces we see in the VM's kernel are as below.
>>> 
>>> [ 1080.552901] INFO: task jbd2/vdb-8:813 blocked for more than 120
>>> seconds.
>>> [ 1080.553027]       Tainted: GF          O 3.13.0-34-generic
>>> #60~precise1-Ubuntu
>>> [ 1080.553157] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>>> disables this message.
>>> [ 1080.553295] jbd2/vdb-8      D ffff88003687e3e0     0   813      2
>>> 0x00000000
>>> [ 1080.553298]  ffff880444fadb48 0000000000000002 ffff880455114440
>>> ffff880444fadfd8
>>> [ 1080.553302]  0000000000014440 0000000000014440 ffff88044a9317f0
>>> ffff88044b7917f0
>>> [ 1080.553303]  ffff880444fadb48 ffff880455114cd8 ffff88044b7917f0
>>> ffffffff811fc670
>>> [ 1080.553307] Call Trace:
>>> [ 1080.553309]  [<ffffffff811fc670>] ? __wait_on_buffer+0x30/0x30
>>> [ 1080.553311]  [<ffffffff8175b8b9>] schedule+0x29/0x70
>>> [ 1080.553313]  [<ffffffff8175b98f>] io_schedule+0x8f/0xd0
>>> [ 1080.553315]  [<ffffffff811fc67e>] sleep_on_buffer+0xe/0x20
>>> [ 1080.553316]  [<ffffffff8175c052>] __wait_on_bit+0x62/0x90
>>> [ 1080.553318]  [<ffffffff811fc670>] ? __wait_on_buffer+0x30/0x30
>>> [ 1080.553320]  [<ffffffff8175c0fc>] out_of_line_wait_on_bit+0x7c/0x90
>>> [ 1080.553322]  [<ffffffff810aff70>] ? wake_atomic_t_function+0x40/0x40
>>> [ 1080.553324]  [<ffffffff811fc66e>] __wait_on_buffer+0x2e/0x30
>>> [ 1080.553326]  [<ffffffff8129806b>]
>>> jbd2_journal_commit_transaction+0x136b/0x1520
>>> [ 1080.553329]  [<ffffffff810a1f75>] ? sched_clock_local+0x25/0x90
>>> [ 1080.553331]  [<ffffffff8109a7b8>] ? finish_task_switch+0x128/0x170
>>> [ 1080.553333]  [<ffffffff8107891f>] ? try_to_del_timer_sync+0x4f/0x70
>>> [ 1080.553334]  [<ffffffff8129c5d8>] kjournald2+0xb8/0x240
>>> [ 1080.553336]  [<ffffffff810afef0>] ? __wake_up_sync+0x20/0x20
>>> [ 1080.553338]  [<ffffffff8129c520>] ? commit_timeout+0x10/0x10
>>> [ 1080.553340]  [<ffffffff8108fa79>] kthread+0xc9/0xe0
>>> [ 1080.553343]  [<ffffffff8108f9b0>] ? flush_kthread_worker+0xb0/0xb0
>>> [ 1080.553346]  [<ffffffff8176827c>] ret_from_fork+0x7c/0xb0
>>> [ 1080.553349]  [<ffffffff8108f9b0>] ? flush_kthread_worker+0xb0/0xb0
>>> 
>>> Thanks,
>>> Jeyaganesh.
>>> 
>>> 
>>> 
>> 
>> 
>> -- 
>> Wido den Hollander
>> 42on B.V.
>> Ceph trainer and consultant
>> 
>> Phone: +31 (0)20 700 9902
>> Skype: contact42on
> 

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



