Re: cephfs: Client hp-s3-r4-compute failing to respond to capability release

Hi,

On 11/09/2015 04:03 PM, Gregory Farnum wrote:
On Mon, Nov 9, 2015 at 6:57 AM, Burkhard Linke
<Burkhard.Linke@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
Hi,

On 11/09/2015 02:07 PM, Burkhard Linke wrote:
Hi,
*snipsnap*


Cluster is running Hammer 0.94.5 on top of Ubuntu 14.04. Clients use
ceph-fuse with patches for improved page cache handling, but the problem
also occurs with the official hammer packages from download.ceph.com.
I've tested the same setup with clients running kernel 4.2.5 and using the
kernel cephfs client. I was not able to reproduce the problem in that setup.
What's the workload you're running, precisely? I would not generally
expect multiple accesses to a sqlite database to work *well*, but
offhand I'm not entirely certain why it would work differently between
the kernel and userspace clients. (Probably something to do with the
timing of the shared requests and any writes happening.)
Using SQLite on network filesystems is somewhat challenging, especially if multiple instances write to the database. The reproducible test case does not write to the database at all; it simply extracts the table structure from the default database. The applications themselves only read from the database and do not modify anything, although the underlying SQLite library may still take locks to protect certain operations. According to dmesg the processes are blocked within fuse calls (a sketch of the read-only test case follows the traces below):

Nov 9 14:17:08 hp-s2-r2-compute kernel: [ 1081.543966] INFO: task ceph-fuse:6298 blocked for more than 120 seconds.
Nov 9 14:17:08 hp-s2-r2-compute kernel: [ 1081.544014] Not tainted 4.2.5-040205-generic #201510270124
Nov 9 14:17:08 hp-s2-r2-compute kernel: [ 1081.544054] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 9 14:17:08 hp-s2-r2-compute kernel: [ 1081.544119] ceph-fuse D ffff881fbf8d64c0 0 6298 3262 0x00000100
Nov 9 14:17:08 hp-s2-r2-compute kernel: [ 1081.544125] ffff881f9768f838 0000000000000086 ffff883fb2d83700 ffff881f97b38dc0
Nov 9 14:17:08 hp-s2-r2-compute kernel: [ 1081.544130] 0000000000001000 ffff881f97690000 ffff881fbf8d64c0 7fffffffffffffff
Nov 9 14:17:08 hp-s2-r2-compute kernel: [ 1081.544134] 0000000000000002 ffffffff817dc300 ffff881f9768f858 ffffffff817dbb07
Nov  9 14:17:08 hp-s2-r2-compute kernel: [ 1081.544138] Call Trace:
Nov 9 14:17:08 hp-s2-r2-compute kernel: [ 1081.544147] [<ffffffff817dc300>] ? bit_wait+0x50/0x50
Nov 9 14:17:08 hp-s2-r2-compute kernel: [ 1081.544156] [<ffffffff817deba9>] schedule_timeout+0x189/0x250
Nov 9 14:17:08 hp-s2-r2-compute kernel: [ 1081.544166] [<ffffffff817dc300>] ? bit_wait+0x50/0x50
Nov 9 14:17:08 hp-s2-r2-compute kernel: [ 1081.544176] [<ffffffff810bcb64>] ? prepare_to_wait_exclusive+0x54/0x80
Nov 9 14:17:08 hp-s2-r2-compute kernel: [ 1081.544185] [<ffffffff817dc0bb>] __wait_on_bit_lock+0x4b/0xa0
Nov 9 14:17:08 hp-s2-r2-compute kernel: [ 1081.544195] [<ffffffff810bd0e0>] ? autoremove_wake_function+0x40/0x40
Nov 9 14:17:08 hp-s2-r2-compute kernel: [ 1081.544205] [<ffffffff8106d962>] ? get_user_pages_fast+0x112/0x190
Nov 9 14:17:08 hp-s2-r2-compute kernel: [ 1081.544213] [<ffffffff812173df>] ? ilookup5_nowait+0x6f/0x90
Nov 9 14:17:08 hp-s2-r2-compute kernel: [ 1081.544222] [<ffffffff812f922d>] fuse_notify+0x14d/0x830
Nov 9 14:17:08 hp-s2-r2-compute kernel: [ 1081.544230] [<ffffffff812f85d4>] ? fuse_copy_do+0x84/0xf0
Nov 9 14:17:08 hp-s2-r2-compute kernel: [ 1081.544239] [<ffffffff810a4f7d>] ? ttwu_do_activate.constprop.89+0x5d/0x70
Nov 9 14:17:08 hp-s2-r2-compute kernel: [ 1081.544248] [<ffffffff811fc0dc>] do_iter_readv_writev+0x6c/0xa0
Nov 9 14:17:08 hp-s2-r2-compute kernel: [ 1081.544257] [<ffffffff811bc9d8>] ? mprotect_fixup+0x148/0x230
Nov 9 14:17:08 hp-s2-r2-compute kernel: [ 1081.544264] [<ffffffff811fdae9>] SyS_writev+0x59/0xf0
Nov 9 14:19:08 hp-s2-r2-compute kernel: [ 1201.672548] Not tainted 4.2.5-040205-generic #201510270124
Nov 9 14:19:08 hp-s2-r2-compute kernel: [ 1201.672654] ceph-fuse D ffff881fbf8d64c0 0 6298 3262 0x00000100
Nov 9 14:19:08 hp-s2-r2-compute kernel: [ 1201.672665] 0000000000001000 ffff881f97690000 ffff881fbf8d64c0 7fffffffffffffff
Nov  9 14:19:08 hp-s2-r2-compute kernel: [ 1201.672673] Call Trace:
Nov 9 14:19:08 hp-s2-r2-compute kernel: [ 1201.672687] [<ffffffff817dbb07>] schedule+0x37/0x80
Nov 9 14:19:08 hp-s2-r2-compute kernel: [ 1201.672698] [<ffffffff8101dcd9>] ? read_tsc+0x9/0x10
Nov 9 14:19:08 hp-s2-r2-compute kernel: [ 1201.672707] [<ffffffff817db114>] io_schedule_timeout+0xa4/0x110
Nov 9 14:19:08 hp-s2-r2-compute kernel: [ 1201.672717] [<ffffffff817dc335>] bit_wait_io+0x35/0x50
Nov 9 14:19:08 hp-s2-r2-compute kernel: [ 1201.672726] [<ffffffff8118186b>] __lock_page+0xbb/0xe0
Nov 9 14:19:08 hp-s2-r2-compute kernel: [ 1201.672736] [<ffffffff811934cc>] invalidate_inode_pages2_range+0x22c/0x460
Nov 9 14:19:08 hp-s2-r2-compute kernel: [ 1201.672745] [<ffffffff81304a80>] ? fuse_init_file_inode+0x30/0x30
Nov 9 14:19:08 hp-s2-r2-compute kernel: [ 1201.672753] [<ffffffff813068a6>] fuse_reverse_inval_inode+0x66/0x90
Nov 9 14:19:08 hp-s2-r2-compute kernel: [ 1201.672761] [<ffffffff813c8e12>] ? iov_iter_get_pages+0xa2/0x220
Nov 9 14:19:08 hp-s2-r2-compute kernel: [ 1201.672770] [<ffffffff812f9f0d>] fuse_dev_do_write+0x22d/0x380
Nov 9 14:19:08 hp-s2-r2-compute kernel: [ 1201.672779] [<ffffffff812fa41b>] fuse_dev_write+0x5b/0x80
Nov 9 14:19:08 hp-s2-r2-compute kernel: [ 1201.672786] [<ffffffff811fcc66>] do_readv_writev+0x196/0x250
Nov 9 14:19:08 hp-s2-r2-compute kernel: [ 1201.672796] [<ffffffff811fcda9>] vfs_writev+0x39/0x50
Nov 9 14:19:08 hp-s2-r2-compute kernel: [ 1201.672803] [<ffffffff817dfb72>] entry_SYSCALL_64_fastpath+0x16/0x75
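
For reference, a minimal read-only reproducer along the lines of the test case above could look roughly like this; the database path is just a placeholder, not our actual setup:

#!/usr/bin/env python3
# Rough sketch of a read-only schema extraction; DB_PATH is a placeholder
# for a SQLite database that lives on the CephFS mount.
import sqlite3

DB_PATH = "/cephfs/data/example.db"  # hypothetical path on the CephFS mount

# Open read-only via a URI so SQLite does not create journal files;
# it may still take shared locks on the database file.
conn = sqlite3.connect(f"file:{DB_PATH}?mode=ro", uri=True)
try:
    for name, sql in conn.execute(
            "SELECT name, sql FROM sqlite_master WHERE type = 'table'"):
        print(name)
        print(sql)
finally:
    conn.close()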


The fact that the kernel client works so far may be timing related. I have also run tests on the cluster with 20 instances of the application and a small dataset in parallel, without any problems so far.
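
A rough sketch of that kind of parallel read test (the path, instance count and iteration count are placeholders) could look like this:

#!/usr/bin/env python3
# Sketch of the parallel read-only test described above; the path,
# instance count and iteration count are placeholders.
import multiprocessing
import sqlite3

DB_PATH = "/cephfs/data/example.db"  # hypothetical path on the CephFS mount
INSTANCES = 20
ITERATIONS = 100

def reader():
    for _ in range(ITERATIONS):
        # Read-only connection; SQLite may still take shared locks.
        conn = sqlite3.connect(f"file:{DB_PATH}?mode=ro", uri=True)
        try:
            conn.execute("SELECT count(*) FROM sqlite_master").fetchone()
        finally:
            conn.close()

if __name__ == "__main__":
    procs = [multiprocessing.Process(target=reader) for _ in range(INSTANCES)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()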

Best regards,
Burkhard
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


