I've done some tests using ceph-fuse and it looks far more stable. I've not experienced any issues so far with a ceph-fuse mount point exported over nfs. Will do more stress testing and update.
Is anyone else experiencing hang tasks when using the ceph kernel module mount method?
Thanks
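For reference, the ceph-fuse setup tested above looks roughly like the following. This is a minimal sketch only: the monitor address, mount point, and export network are placeholders, not the actual cluster settings.

```shell
# Mount CephFS via the FUSE client instead of the kernel client
# (monitor address and mount point are placeholders):
ceph-fuse -m 192.168.0.1:6789 /mnt/cephfs

# Re-export the FUSE mount over NFS. The kernel nfsd requires an
# explicit fsid= option when exporting a FUSE filesystem:
#
# /etc/exports
#   /mnt/cephfs 192.168.0.0/24(rw,sync,no_subtree_check,fsid=1)

exportfs -ra
```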
From: "Andrei Mikhailovsky" <andrei@xxxxxxxxxx>
To: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
Sent: Friday, 28 November, 2014 12:02:57 PM
Subject: Re: Giant + nfs over cephfs hang tasks

I've just tried the latest ubuntu-vivid kernel and I am also seeing hang tasks with dd tests:
[ 3721.026421] INFO: task nfsd:16596 blocked for more than 120 seconds.
[ 3721.065141] Not tainted 3.17.4-031704-generic #201411211317
[ 3721.103721] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 3721.180409] nfsd D 0000000000000009 0 16596 2 0x00000000
[ 3721.180412] ffff88006f0cbc18 0000000000000046 ffff88006f0cbbc8 ffffffff8109677f
[ 3721.180414] ffff88006f0cbfd8 00000000000145c0 ffff88067089f700 00000000000145c0
[ 3721.180417] ffff880673dac600 ffff88045adfbc00 ffff8801bdf8be40 ffff88000b841500
[ 3721.180420] Call Trace:
[ 3721.180423] [<ffffffff8109677f>] ? set_groups+0x2f/0x60
[ 3721.180427] [<ffffffff817a20c9>] schedule+0x29/0x70
[ 3721.180440] [<ffffffff817a23ee>] schedule_preempt_disabled+0xe/0x10
[ 3721.180443] [<ffffffff817a429d>] __mutex_lock_slowpath+0xcd/0x1d0
[ 3721.180447] [<ffffffff817a43c3>] mutex_lock+0x23/0x37
[ 3721.180454] [<ffffffffc071cadd>] nfsd_setattr+0x15d/0x2a0 [nfsd]
[ 3721.180460] [<ffffffffc0727d2e>] nfsd4_setattr+0x14e/0x180 [nfsd]
[ 3721.180467] [<ffffffffc0729eac>] nfsd4_proc_compound+0x4cc/0x730 [nfsd]
[ 3721.180478] [<ffffffffc0715e55>] nfsd_dispatch+0xe5/0x230 [nfsd]
[ 3721.180491] [<ffffffffc05b9882>] ? svc_tcp_adjust_wspace+0x12/0x30 [sunrpc]
[ 3721.180500] [<ffffffffc05b8694>] svc_process_common+0x324/0x680 [sunrpc]
[ 3721.180510] [<ffffffffc05b8d43>] svc_process+0x103/0x160 [sunrpc]
[ 3721.180516] [<ffffffffc07159c7>] nfsd+0x117/0x190 [nfsd]
[ 3721.180526] [<ffffffffc07158b0>] ? nfsd_destroy+0x80/0x80 [nfsd]
[ 3721.180528] [<ffffffff81093359>] kthread+0xc9/0xe0
[ 3721.180533] [<ffffffff81093290>] ? flush_kthread_worker+0x90/0x90
[ 3721.180536] [<ffffffff817a64bc>] ret_from_fork+0x7c/0xb0
[ 3721.180540] [<ffffffff81093290>] ? flush_kthread_worker+0x90/0x90
[ 3721.180577] INFO: task kworker/2:3:28061 blocked for more than 120 seconds.
[ 3721.221450] Not tainted 3.17.4-031704-generic #201411211317
[ 3721.261440] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 3721.341394] kworker/2:3 D 0000000000000002 0 28061 2 0x00000000
[ 3721.341408] Workqueue: ceph-trunc ceph_vmtruncate_work [ceph]
[ 3721.341409] ffff8805a6507b08 0000000000000046 ffff8805a6507b88 ffffea00040e8c80
[ 3721.341412] ffff8805a6507fd8 00000000000145c0 ffff88067089b480 00000000000145c0
[ 3721.341414] ffff8801fffec600 ffff880102535a00 ffff88000b8415a8 ffff88046fc94ec0
[ 3721.341417] Call Trace:
[ 3721.341421] [<ffffffff817a2970>] ? bit_wait+0x50/0x50
[ 3721.341424] [<ffffffff817a20c9>] schedule+0x29/0x70
[ 3721.341427] [<ffffffff817a219f>] io_schedule+0x8f/0xd0
[ 3721.341430] [<ffffffff817a299b>] bit_wait_io+0x2b/0x50
[ 3721.341433] [<ffffffff817a2656>] __wait_on_bit_lock+0x76/0xb0
[ 3721.341438] [<ffffffff811756b5>] ? find_get_entries+0xe5/0x160
[ 3721.341440] [<ffffffff8117245e>] __lock_page+0xae/0xb0
[ 3721.341446] [<ffffffff810b3fd0>] ? wake_atomic_t_function+0x40/0x40
[ 3721.341451] [<ffffffff81183226>] truncate_inode_pages_range+0x446/0x700
[ 3721.341455] [<ffffffff81183565>] truncate_inode_pages+0x15/0x20
[ 3721.341457] [<ffffffff811835bc>] truncate_pagecache+0x4c/0x70
[ 3721.341464] [<ffffffffc09f815e>] __ceph_do_pending_vmtruncate+0xde/0x230 [ceph]
[ 3721.341470] [<ffffffffc09f8c73>] ceph_vmtruncate_work+0x23/0x50 [ceph]
[ 3721.341476] [<ffffffff8108cece>] process_one_work+0x14e/0x460
[ 3721.341479] [<ffffffff8108d84b>] worker_thread+0x11b/0x3f0
[ 3721.341482] [<ffffffff8108d730>] ? create_worker+0x1e0/0x1e0
[ 3721.341485] [<ffffffff81093359>] kthread+0xc9/0xe0
[ 3721.341487] [<ffffffff81093290>] ? flush_kthread_worker+0x90/0x90
[ 3721.341490] [<ffffffff817a64bc>] ret_from_fork+0x7c/0xb0
[ 3721.341492] [<ffffffff81093290>] ? flush_kthread_worker+0x90/0x90

They do not happen with every dd test, but they happen fairly often, especially when I run a few dd tests concurrently.
An example test that generated hang tasks above after just 2 runs:
# dd if=/dev/zero of=/tmp/cephfs/4G  bs=1M count=4K oflag=direct & \
  dd if=/dev/zero of=/tmp/cephfs/4G1 bs=1M count=4K oflag=direct & \
  dd if=/dev/zero of=/tmp/cephfs/4G2 bs=1M count=4K oflag=direct & \
  dd if=/dev/zero of=/tmp/cephfs/4G3 bs=1M count=4K oflag=direct &
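The same kind of concurrent-writer test can be scripted as below. This is a sketch only: the target directory and file sizes are scaled-down placeholders so it runs quickly anywhere, and oflag=direct is omitted here since not every filesystem (e.g. tmpfs) supports O_DIRECT; against the real NFS-over-cephfs mount you would point TESTDIR at the mount and restore the 4G sizes and oflag=direct.

```shell
#!/bin/sh
# Launch several dd writers in parallel against one directory,
# mirroring the concurrent test above at a much smaller scale.
TESTDIR=${TESTDIR:-/tmp/ddtest}   # placeholder; use the NFS mount in real tests
mkdir -p "$TESTDIR"

# Four parallel 4 MiB writers (original test: 4 GiB each, with oflag=direct).
for i in 0 1 2 3; do
  dd if=/dev/zero of="$TESTDIR/file$i" bs=1M count=4 2>/dev/null &
done
wait

ls -l "$TESTDIR"
```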
Cheers

From: "Andrei Mikhailovsky" <andrei@xxxxxxxxxx>
To: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
Sent: Friday, 28 November, 2014 11:22:07 AM
Subject: Re: Giant + nfs over cephfs hang tasks

I am also noticing some delays working with nfs over cephfs, especially when making an initial connection. For instance, I run the following:
# time for i in {0..10} ; do time touch /tmp/cephfs/test-$i ; done
where /tmp/cephfs is the nfs mount point running over cephfs
I am noticing that the first file is created only after about 20-30 seconds. All the following files are created with no delay.
If I run the command once again, all files are created pretty quickly without any delay. However, if I wait 20-30 minutes and run the command again, the delay with the first file is back again.
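To make the slow first operation stand out, each touch can be timed individually rather than timing the whole loop. A minimal sketch, where MNT is a placeholder for the NFS-over-cephfs mount point (/tmp is used here only so the sketch runs anywhere):

```shell
#!/bin/sh
# Time each touch separately so the first-operation delay is visible.
MNT=${MNT:-/tmp}   # placeholder; point at the NFS mount in real tests

for i in 0 1 2 3 4; do
  start=$(date +%s%N)               # nanoseconds (GNU date)
  touch "$MNT/test-$i"
  end=$(date +%s%N)
  echo "test-$i: $(( (end - start) / 1000000 )) ms"
done
```

On the mount described above, the first line would show the 20-30 second delay while the rest complete in milliseconds.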
Has anyone experienced similar issues?
Andrei

From: "Andrei Mikhailovsky" <andrei@xxxxxxxxxx>
To: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
Sent: Friday, 28 November, 2014 9:08:17 AM
Subject: [ceph-users] Giant + nfs over cephfs hang tasks

Hello guys,
I've got a bunch of hang tasks from the nfsd service running over the cephfs (kernel) mounted file system. Here is an example of one of them.
[433079.991218] INFO: task nfsd:32625 blocked for more than 120 seconds.
[433080.029685] Not tainted 3.15.10-031510-generic #201408132333
[433080.068036] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[433080.144235] nfsd D 000000000000000a 0 32625 2 0x00000000
[433080.144241] ffff8801a94dba78 0000000000000002 ffff8801a94dba38 ffff8801a94dbfd8
[433080.144244] 0000000000014540 0000000000014540 ffff880673d63260 ffff880491d264c0
[433080.144247] ffff8801a94dba78 ffff88067fd14e40 ffff880491d264c0 ffffffff8115dff0
[433080.144250] Call Trace:
[433080.144260] [<ffffffff8115dff0>] ? __lock_page+0x70/0x70
[433080.144274] [<ffffffff81778449>] schedule+0x29/0x70
[433080.144279] [<ffffffff8177851f>] io_schedule+0x8f/0xd0
[433080.144282] [<ffffffff8115dffe>] sleep_on_page+0xe/0x20
[433080.144286] [<ffffffff81778be2>] __wait_on_bit+0x62/0x90
[433080.144288] [<ffffffff8115eacb>] ? find_get_pages_tag+0xcb/0x170
[433080.144291] [<ffffffff8115e160>] wait_on_page_bit+0x80/0x90
[433080.144296] [<ffffffff810b54a0>] ? wake_atomic_t_function+0x40/0x40
[433080.144299] [<ffffffff8115e334>] filemap_fdatawait_range+0xf4/0x180
[433080.144302] [<ffffffff8116027d>] filemap_write_and_wait_range+0x4d/0x80
[433080.144315] [<ffffffffa06bf1b8>] ceph_fsync+0x58/0x200 [ceph]
[433080.144330] [<ffffffff813308f5>] ? ima_file_check+0x35/0x40
[433080.144337] [<ffffffff812028c8>] vfs_fsync_range+0x18/0x30
[433080.144352] [<ffffffffa03ee491>] nfsd_commit+0xb1/0xd0 [nfsd]
[433080.144363] [<ffffffffa03fb787>] nfsd4_commit+0x57/0x60 [nfsd]
[433080.144370] [<ffffffffa03fcf9e>] nfsd4_proc_compound+0x54e/0x740 [nfsd]
[433080.144377] [<ffffffffa03e8e05>] nfsd_dispatch+0xe5/0x230 [nfsd]
[433080.144401] [<ffffffffa03205a5>] svc_process_common+0x345/0x680 [sunrpc]
[433080.144413] [<ffffffffa0320c33>] svc_process+0x103/0x160 [sunrpc]
[433080.144418] [<ffffffffa03e895f>] nfsd+0xbf/0x130 [nfsd]
[433080.144424] [<ffffffffa03e88a0>] ? nfsd_destroy+0x80/0x80 [nfsd]
[433080.144428] [<ffffffff81091439>] kthread+0xc9/0xe0
[433080.144431] [<ffffffff81091370>] ? flush_kthread_worker+0xb0/0xb0
[433080.144434] [<ffffffff8178567c>] ret_from_fork+0x7c/0xb0
[433080.144437] [<ffffffff81091370>] ? flush_kthread_worker+0xb0/0xb0
I am using Ubuntu 12.04 servers with 3.15.10 kernel and ceph Giant.
Thanks
Andrei
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com