On Mon, Mar 13, 2017 at 8:15 PM, Andras Pataki
<apataki@xxxxxxxxxxxxxxxxxxxxx> wrote:
> Dear Cephers,
>
> We're using the ceph file system with the fuse client, and lately some of
> our processes are getting stuck, seemingly waiting for fuse operations. At
> the same time, the cluster is healthy: no slow requests, all OSDs up and
> running, and both the MDS and the fuse client think that there are no
> pending operations. The situation is semi-reproducible: when I run
> various cluster jobs, some get stuck after a few hours of correct operation.
> The cluster is on ceph 10.2.5 and 10.2.6 and the fuse clients are 10.2.6, but I
> have tried 10.2.5 and 10.2.3, all of which have the same issue. This is on
> CentOS (7.2 for the clients, 7.3 for the MDS/OSDs).
>
> Here are some details:
>
> The node with the stuck processes:
>
> [root@worker1070 ~]# ps -auxwww | grep 30519
> apataki  30519 39.8  0.9 8728064 5257588 ?  Dl  12:11  60:50 ./Arepo
> param.txt 2 6
> [root@worker1070 ~]# cat /proc/30519/stack
> [<ffffffffa0a1d7bb>] fuse_file_aio_write+0xbb/0x340 [fuse]
> [<ffffffff811ddd3d>] do_sync_write+0x8d/0xd0
> [<ffffffff811de55d>] vfs_write+0xbd/0x1e0
> [<ffffffff811defff>] SyS_write+0x7f/0xe0
> [<ffffffff816458c9>] system_call_fastpath+0x16/0x1b
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> [root@worker1070 ~]# ps -auxwww | grep 30533
> apataki  30533 39.8  0.9 8795316 5261308 ?  Sl  12:11  60:55 ./Arepo
> param.txt 2 6
> [root@worker1070 ~]# cat /proc/30533/stack
> [<ffffffffa0a12241>] wait_answer_interruptible+0x91/0xe0 [fuse]
> [<ffffffffa0a12653>] __fuse_request_send+0x253/0x2c0 [fuse]
> [<ffffffffa0a126d2>] fuse_request_send+0x12/0x20 [fuse]
> [<ffffffffa0a1b966>] fuse_send_write+0xd6/0x110 [fuse]
> [<ffffffffa0a1d45d>] fuse_perform_write+0x2ed/0x590 [fuse]
> [<ffffffffa0a1d9a1>] fuse_file_aio_write+0x2a1/0x340 [fuse]
> [<ffffffff811ddd3d>] do_sync_write+0x8d/0xd0
> [<ffffffff811de55d>] vfs_write+0xbd/0x1e0
> [<ffffffff811defff>] SyS_write+0x7f/0xe0
> [<ffffffff816458c9>] system_call_fastpath+0x16/0x1b
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> Presumably the second process is waiting on the first, which is holding some lock ...
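
A quick way to repeat the check above across a whole job, rather than one
PID at a time, is to dump the kernel stack of every process in the job along
with its scheduler state. This is only a generic editor's sketch; it assumes
the job's processes can be found by the "Arepo" name shown in the ps output,
and it needs root to read /proc/<pid>/stack:

    # Sketch: dump kernel stacks for all Arepo processes and show their
    # scheduler state (D = uninterruptible sleep), to spot threads blocked
    # inside the fuse module.
    for pid in $(pgrep -f Arepo); do
        echo "=== PID $pid, state $(ps -o stat= -p "$pid") ==="
        cat "/proc/$pid/stack"
    done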
>
> The fuse client on the node:
>
> [root@worker1070 ~]# ceph daemon /var/run/ceph/ceph-client.admin.asok status
> {
>     "metadata": {
>         "ceph_sha1": "656b5b63ed7c43bd014bcafd81b001959d5f089f",
>         "ceph_version": "ceph version 10.2.6 (656b5b63ed7c43bd014bcafd81b001959d5f089f)",
>         "entity_id": "admin",
>         "hostname": "worker1070",
>         "mount_point": "\/mnt\/ceph",
>         "root": "\/"
>     },
>     "dentry_count": 40,
>     "dentry_pinned_count": 23,
>     "inode_count": 123,
>     "mds_epoch": 19041,
>     "osd_epoch": 462327,
>     "osd_epoch_barrier": 462326
> }
>
> [root@worker1070 ~]# ceph daemon /var/run/ceph/ceph-client.admin.asok mds_sessions
> {
>     "id": 3616543,
>     "sessions": [
>         {
>             "mds": 0,
>             "addr": "10.128.128.110:6800\/909443124",
>             "seq": 338,
>             "cap_gen": 0,
>             "cap_ttl": "2017-03-13 14:47:37.575229",
>             "last_cap_renew_request": "2017-03-13 14:46:37.575229",
>             "cap_renew_seq": 12694,
>             "num_caps": 713,
>             "state": "open"
>         }
>     ],
>     "mdsmap_epoch": 19041
> }
>
> [root@worker1070 ~]# ceph daemon /var/run/ceph/ceph-client.admin.asok mds_requests
> {}
>
>
> The overall cluster health and the MDS:
>
> [root@cephosd000 ~]# ceph -s
>     cluster d7b33135-0940-4e48-8aa6-1d2026597c2f
>      health HEALTH_WARN
>             noscrub,nodeep-scrub,require_jewel_osds flag(s) set
>      monmap e17: 3 mons at
>             {hyperv029=10.4.36.179:6789/0,hyperv030=10.4.36.180:6789/0,hyperv031=10.4.36.181:6789/0}
>             election epoch 29148, quorum 0,1,2 hyperv029,hyperv030,hyperv031
>       fsmap e19041: 1/1/1 up {0=cephosd000=up:active}
>      osdmap e462328: 624 osds: 624 up, 624 in
>             flags noscrub,nodeep-scrub,require_jewel_osds
>       pgmap v44458747: 42496 pgs, 6 pools, 924 TB data, 272 Mobjects
>             2154 TB used, 1791 TB / 3946 TB avail
>                42496 active+clean
>   client io 86911 kB/s rd, 556 MB/s wr, 227 op/s rd, 303 op/s wr
>
> [root@cephosd000 ~]# ceph daemon /var/run/ceph/ceph-mds.cephosd000.asok ops
> {
>     "ops": [],
>     "num_ops": 0
> }
>
>
> The odd thing is that if in this state I restart the MDS, the client process
> wakes up and proceeds with its work without any errors. As if a request was
> lost and somehow retransmitted/restarted when the MDS got restarted and the
> fuse layer reconnected to it.

Interesting.  A couple of ideas for more debugging:

* Next time you go through this process of restarting the MDS while there
  is a stuck client, first increase the client's logging
  (ceph daemon <path to /var/run/ceph/ceph-<id>.asok> config set debug_client 20).
  Then we should get a clear sense of exactly what's happening on the MDS
  restart that's enabling the client to proceed.

* When inspecting the client's "mds_sessions" output, also check the
  "session ls" output on the MDS side to make sure the MDS and client both
  agree that it has an open session.

John

>
> When I try to attach a gdb session to either of the client processes, gdb
> just hangs.  However, right after the MDS restart gdb attaches to the
> process successfully, and shows that the hang happened on the closing
> of a file.  In fact, it looks like both processes were trying to write to
> the same file, opened with fopen("filename", "a"), and close it:
>
> (gdb) where
> #0  0x00002aaaadc53abd in write () from /lib64/libc.so.6
> #1  0x00002aaaadbe2383 in _IO_new_file_write () from /lib64/libc.so.6
> #2  0x00002aaaadbe37ec in __GI__IO_do_write () from /lib64/libc.so.6
> #3  0x00002aaaadbe30e0 in __GI__IO_file_close_it () from /lib64/libc.so.6
> #4  0x00002aaaadbd7020 in fclose@@GLIBC_2.2.5 () from /lib64/libc.so.6
> ...
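
For reference, John's two suggestions above translate into roughly the
following commands. This is only a sketch; the admin socket paths and the
MDS daemon name (cephosd000) are taken from the output quoted earlier in
this thread and will differ on other setups.

    # 1) On the stuck client, raise client-side debug logging before the
    #    MDS restart, then watch the ceph-fuse client log during the restart:
    ceph daemon /var/run/ceph/ceph-client.admin.asok config set debug_client 20

    # 2) Compare the client's view of its MDS session ...
    ceph daemon /var/run/ceph/ceph-client.admin.asok mds_sessions

    #    ... with the MDS's view, to check both sides agree the session is open:
    ceph daemon /var/run/ceph/ceph-mds.cephosd000.asok session ls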
>
> It seems like the fuse client wasn't handling this case well, where two
> processes write to the same file and then close it? This is just
> speculation. Any ideas on how to proceed? Is there perhaps a known issue
> related to this?
>
> Thanks,
>
> Andras
> apataki@xxxxxxxxxxxxxxxxxxxxx
>

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
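
As a closing note on the speculation near the end of Andras's message (two
processes appending to the same file and closing it): a crude way to try to
provoke that access pattern on a ceph-fuse mount might look like the sketch
below. It is untested, the path under /mnt/ceph is made up, and shell-level
appends only approximate the application's fopen("...", "a")/fclose()
pattern, but it does exercise two writers concurrently appending to and
closing the same file.

    #!/bin/bash
    # Untested reproduction sketch: two processes repeatedly append a line to
    # the same file on the ceph-fuse mount, opening and closing it each time.
    TARGET=/mnt/ceph/tmp/append-test.log    # hypothetical test file

    writer() {
        for i in $(seq 1 10000); do
            # each redirection opens the file with O_APPEND, writes, closes it
            echo "writer $1 iteration $i" >> "$TARGET"
        done
    }

    writer A &
    writer B &
    wait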