I've tested the 12.2.1 fuse client, and unfortunately it also
reproduces the problem. Looking at the code that accesses the file
system, it appears that multiple processes on multiple nodes write
to the same file concurrently, each to a different byte range.
Unfortunately the problem only shows up some hours into a run, so I
can't really run the MDS or ceph-fuse at a very high debug level for
that long. That said, I could run ceph-fuse with a higher debug
level on the nodes in question if that helps.
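
For anyone who wants to try reproducing the access pattern described
above, here is a rough sketch of it, not the actual application code.
Several processes open the same file and each writes only to its own
byte range with os.pwrite. The function names, the chunk size and the
number of writers are made up for illustration, and it only exercises
a single node; the real workload spans multiple nodes.

```python
import multiprocessing
import os
import sys


def write_range(path, offset, length, fill):
    """Write `length` bytes of `fill` at `offset`, leaving other ranges alone."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        data = fill * length
        written = 0
        # pwrite may write fewer bytes than requested, so loop until done.
        while written < length:
            written += os.pwrite(fd, data[written:], offset + written)
    finally:
        os.close(fd)


def run_writers(path, n_writers=4, chunk=1 << 20):
    """Start n_writers processes, each writing its own chunk of the same file."""
    # Create (or truncate) the file first so every writer opens the same inode.
    with open(path, "wb"):
        pass
    # Assumption: a Linux/Unix host, where the "fork" start method is available.
    ctx = multiprocessing.get_context("fork")
    procs = [
        ctx.Process(
            target=write_range,
            args=(path, i * chunk, chunk, bytes([65 + i])),  # 'A', 'B', ...
        )
        for i in range(n_writers)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return [p.exitcode for p in procs]


if __name__ == "__main__":
    # Hypothetical usage: pass a path on the ceph-fuse mount as the argument.
    print(run_writers(sys.argv[1]))
```

Running this on a couple of nodes against the same path on the
ceph-fuse mount, with different offsets per node, would be closer to
what the real code does. There is no guarantee it triggers the hang.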
Andras
On 11/03/2017 12:29 AM, Gregory Farnum wrote:
Either ought to work fine.
I'm planning to test the newer ceph-fuse tomorrow. Would it be
better to stay with the Jewel 10.2.10 client, or would the 12.2.1
Luminous client be better (even though the back-end is Jewel for
now)?
Andras
On 11/02/2017 05:54 PM, Gregory Farnum wrote:
Have you tested on the new ceph-fuse? This does
sound vaguely familiar and is an issue I'd generally
expect to have the fix backported for, once it was
identified.
We've been running into a strange problem with Ceph using ceph-fuse
and the filesystem. All the back end nodes are on 10.2.10, the fuse
clients are on 10.2.7.
After some hours of runs, some processes get stuck waiting for fuse
like:
[root@worker1144 ~]# cat /proc/58193/stack
[<ffffffffa08cd241>] wait_answer_interruptible+0x91/0xe0 [fuse]
[<ffffffffa08cd653>] __fuse_request_send+0x253/0x2c0 [fuse]
[<ffffffffa08cd6d2>] fuse_request_send+0x12/0x20 [fuse]
[<ffffffffa08d69d6>] fuse_send_write+0xd6/0x110 [fuse]
[<ffffffffa08d84d5>] fuse_perform_write+0x2f5/0x5a0 [fuse]
[<ffffffffa08d8a21>] fuse_file_aio_write+0x2a1/0x340 [fuse]
[<ffffffff811fdfbd>] do_sync_write+0x8d/0xd0
[<ffffffff811fe82d>] vfs_write+0xbd/0x1e0
[<ffffffff816975c9>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
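
To find other processes on a node in the same state, something along
these lines could scan /proc for stacks that look like the one above.
This is only a sketch: the function names are made up, reading
/proc/<pid>/stack needs root, and a single match only means the
process was inside a FUSE write at that instant, so a PID should show
up in repeated samples before it is treated as hung.

```python
import os
import re

# Matches frames such as
#   [<ffffffffa08cd241>] wait_answer_interruptible+0x91/0xe0 [fuse]
# and captures the symbol plus the optional module name in brackets.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)


def parse_frames(stack_text):
    """Return (symbol, module) pairs; module is None for core kernel frames."""
    return [(m.group(1), m.group(2)) for m in FRAME_RE.finditer(stack_text)]


def is_stuck_in_fuse_write(stack_text):
    """True if the stack shows a write waiting on an answer from FUSE."""
    fuse_symbols = {sym for sym, mod in parse_frames(stack_text) if mod == "fuse"}
    return (
        "wait_answer_interruptible" in fuse_symbols
        and "fuse_send_write" in fuse_symbols
    )


def find_stuck_pids(proc_root="/proc"):
    """Return the sorted PIDs whose kernel stack matches the pattern above."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stack")) as f:
                text = f.read()
        except OSError:
            continue  # process exited, or we lack the privileges to read it
        if is_stuck_in_fuse_write(text):
            stuck.append(int(entry))
    return sorted(stuck)


if __name__ == "__main__":
    print(find_stuck_pids())
```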
The cluster is healthy (all OSDs up, no slow requests, etc.). More
details of my investigation efforts are in the bug report I just
submitted:
http://tracker.ceph.com/issues/22008
It looks like the fuse client is asking for some caps that it never
thinks it receives from the MDS, so the thread waiting for those
caps on behalf of the writing client never wakes up. Restarting the
MDS fixes the problem (since ceph-fuse re-negotiates caps).
Any ideas/suggestions?
Andras
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com