Hi,
On 11/09/2015 04:03 PM, Gregory Farnum wrote:
On Mon, Nov 9, 2015 at 6:57 AM, Burkhard Linke
<Burkhard.Linke@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
Hi,
On 11/09/2015 02:07 PM, Burkhard Linke wrote:
Hi,
*snipsnap*
The cluster is running Hammer 0.94.5 on top of Ubuntu 14.04. Clients use
ceph-fuse with patches for improved page cache handling, but the problem
also occurs with the official Hammer packages from download.ceph.com.
I've tested the same setup with clients running kernel 4.2.5 and using the
kernel cephfs client. I was not able to reproduce the problem in that setup.
What's the workload you're running, precisely? I would not generally
expect multiple accesses to a sqlite database to work *well*, but
offhand I'm not entirely certain why it would work differently between
the kernel and userspace clients. (Probably something to do with the
timing of the shared requests and any writes happening.)
Using SQLite on network filesystems is somewhat challenging, especially
if multiple instances write to the database. The reproducible test case
does not write to the database at all; it simply extracts the table
structure from the default database. The applications themselves only read
from the database and do not modify anything. The underlying SQLite
library may attempt to use locking to protect certain operations.
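
Just to illustrate what the test case does, here is a rough sketch of such a
read-only schema query in Python (the actual tool and database path are not
shown above, so the file name and query here are only placeholders):

#!/usr/bin/env python3
# Hypothetical sketch of a read-only schema dump against a SQLite file
# on the CephFS mount; the path and table layout are assumptions.
import sqlite3

DB_PATH = "/ceph/data/example.sqlite"  # placeholder path on the ceph-fuse mount

# Open read-only via URI so sqlite3 never tries to create or write the file.
conn = sqlite3.connect(f"file:{DB_PATH}?mode=ro", uri=True)
try:
    # Extracting the table structure is a pure read of sqlite_master,
    # but the library may still take a shared (read) lock on the file.
    for name, sql in conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
    ):
        print(name)
        print(sql)
finally:
    conn.close()
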
According to dmesg, the processes are blocked within fuse calls:
Nov 9 14:17:08 hp-s2-r2-compute kernel: [ 1081.543966] INFO: task ceph-fuse:6298 blocked for more than 120 seconds.
Nov 9 14:17:08 hp-s2-r2-compute kernel: [ 1081.544014] Not tainted 4.2.5-040205-generic #201510270124
Nov 9 14:17:08 hp-s2-r2-compute kernel: [ 1081.544054] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 9 14:17:08 hp-s2-r2-compute kernel: [ 1081.544119] ceph-fuse D ffff881fbf8d64c0 0 6298 3262 0x00000100
Nov 9 14:17:08 hp-s2-r2-compute kernel: [ 1081.544125] ffff881f9768f838 0000000000000086 ffff883fb2d83700 ffff881f97b38dc0
Nov 9 14:17:08 hp-s2-r2-compute kernel: [ 1081.544130] 0000000000001000 ffff881f97690000 ffff881fbf8d64c0 7fffffffffffffff
Nov 9 14:17:08 hp-s2-r2-compute kernel: [ 1081.544134] 0000000000000002 ffffffff817dc300 ffff881f9768f858 ffffffff817dbb07
Nov 9 14:17:08 hp-s2-r2-compute kernel: [ 1081.544138] Call Trace:
Nov 9 14:17:08 hp-s2-r2-compute kernel: [ 1081.544147] [<ffffffff817dc300>] ? bit_wait+0x50/0x50
Nov 9 14:17:08 hp-s2-r2-compute kernel: [ 1081.544156] [<ffffffff817deba9>] schedule_timeout+0x189/0x250
Nov 9 14:17:08 hp-s2-r2-compute kernel: [ 1081.544166] [<ffffffff817dc300>] ? bit_wait+0x50/0x50
Nov 9 14:17:08 hp-s2-r2-compute kernel: [ 1081.544176] [<ffffffff810bcb64>] ? prepare_to_wait_exclusive+0x54/0x80
Nov 9 14:17:08 hp-s2-r2-compute kernel: [ 1081.544185] [<ffffffff817dc0bb>] __wait_on_bit_lock+0x4b/0xa0
Nov 9 14:17:08 hp-s2-r2-compute kernel: [ 1081.544195] [<ffffffff810bd0e0>] ? autoremove_wake_function+0x40/0x40
Nov 9 14:17:08 hp-s2-r2-compute kernel: [ 1081.544205] [<ffffffff8106d962>] ? get_user_pages_fast+0x112/0x190
Nov 9 14:17:08 hp-s2-r2-compute kernel: [ 1081.544213] [<ffffffff812173df>] ? ilookup5_nowait+0x6f/0x90
Nov 9 14:17:08 hp-s2-r2-compute kernel: [ 1081.544222] [<ffffffff812f922d>] fuse_notify+0x14d/0x830
Nov 9 14:17:08 hp-s2-r2-compute kernel: [ 1081.544230] [<ffffffff812f85d4>] ? fuse_copy_do+0x84/0xf0
Nov 9 14:17:08 hp-s2-r2-compute kernel: [ 1081.544239] [<ffffffff810a4f7d>] ? ttwu_do_activate.constprop.89+0x5d/0x70
Nov 9 14:17:08 hp-s2-r2-compute kernel: [ 1081.544248] [<ffffffff811fc0dc>] do_iter_readv_writev+0x6c/0xa0
Nov 9 14:17:08 hp-s2-r2-compute kernel: [ 1081.544257] [<ffffffff811bc9d8>] ? mprotect_fixup+0x148/0x230
Nov 9 14:17:08 hp-s2-r2-compute kernel: [ 1081.544264] [<ffffffff811fdae9>] SyS_writev+0x59/0xf0
Nov 9 14:19:08 hp-s2-r2-compute kernel: [ 1201.672548] Not tainted 4.2.5-040205-generic #201510270124
Nov 9 14:19:08 hp-s2-r2-compute kernel: [ 1201.672654] ceph-fuse D ffff881fbf8d64c0 0 6298 3262 0x00000100
Nov 9 14:19:08 hp-s2-r2-compute kernel: [ 1201.672665] 0000000000001000 ffff881f97690000 ffff881fbf8d64c0 7fffffffffffffff
Nov 9 14:19:08 hp-s2-r2-compute kernel: [ 1201.672673] Call Trace:
Nov 9 14:19:08 hp-s2-r2-compute kernel: [ 1201.672687] [<ffffffff817dbb07>] schedule+0x37/0x80
Nov 9 14:19:08 hp-s2-r2-compute kernel: [ 1201.672698] [<ffffffff8101dcd9>] ? read_tsc+0x9/0x10
Nov 9 14:19:08 hp-s2-r2-compute kernel: [ 1201.672707] [<ffffffff817db114>] io_schedule_timeout+0xa4/0x110
Nov 9 14:19:08 hp-s2-r2-compute kernel: [ 1201.672717] [<ffffffff817dc335>] bit_wait_io+0x35/0x50
Nov 9 14:19:08 hp-s2-r2-compute kernel: [ 1201.672726] [<ffffffff8118186b>] __lock_page+0xbb/0xe0
Nov 9 14:19:08 hp-s2-r2-compute kernel: [ 1201.672736] [<ffffffff811934cc>] invalidate_inode_pages2_range+0x22c/0x460
Nov 9 14:19:08 hp-s2-r2-compute kernel: [ 1201.672745] [<ffffffff81304a80>] ? fuse_init_file_inode+0x30/0x30
Nov 9 14:19:08 hp-s2-r2-compute kernel: [ 1201.672753] [<ffffffff813068a6>] fuse_reverse_inval_inode+0x66/0x90
Nov 9 14:19:08 hp-s2-r2-compute kernel: [ 1201.672761] [<ffffffff813c8e12>] ? iov_iter_get_pages+0xa2/0x220
Nov 9 14:19:08 hp-s2-r2-compute kernel: [ 1201.672770] [<ffffffff812f9f0d>] fuse_dev_do_write+0x22d/0x380
Nov 9 14:19:08 hp-s2-r2-compute kernel: [ 1201.672779] [<ffffffff812fa41b>] fuse_dev_write+0x5b/0x80
Nov 9 14:19:08 hp-s2-r2-compute kernel: [ 1201.672786] [<ffffffff811fcc66>] do_readv_writev+0x196/0x250
Nov 9 14:19:08 hp-s2-r2-compute kernel: [ 1201.672796] [<ffffffff811fcda9>] vfs_writev+0x39/0x50
Nov 9 14:19:08 hp-s2-r2-compute kernel: [ 1201.672803] [<ffffffff817dfb72>] entry_SYSCALL_64_fastpath+0x16/0x75
The fact that the kernel client has worked so far may be timing-related.
I've also done test runs on the cluster with 20 instances of the
application and a small dataset running in parallel, without any problems
so far.
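
For reference, those parallel runs were roughly along these lines; this is
only a hypothetical sketch in Python, not the actual application, and the
database path and worker count of 20 are placeholders matching the test
described above:

#!/usr/bin/env python3
# Hypothetical sketch: run 20 read-only workers against the same SQLite
# database on CephFS in parallel; path and worker count are assumptions.
import sqlite3
from concurrent.futures import ProcessPoolExecutor

DB_PATH = "/ceph/data/example.sqlite"  # placeholder path on the ceph-fuse mount

def read_schema(worker_id: int) -> int:
    # Each worker opens its own read-only connection, mirroring many
    # independent application instances reading the same database file.
    conn = sqlite3.connect(f"file:{DB_PATH}?mode=ro", uri=True)
    try:
        rows = conn.execute("SELECT name FROM sqlite_master").fetchall()
        return len(rows)
    finally:
        conn.close()

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=20) as pool:
        for n in pool.map(read_schema, range(20)):
            print(n)
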
Best regards,
Burkhard
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com