Dear Cephers,
We're using the ceph file system with the fuse client, and lately
some of our processes are getting stuck seemingly waiting for fuse
operations. At the same time, the cluster is healthy, no slow
requests, all OSDs up and running, and both the MDS and the fuse
client think that there are no pending operations. The situation is
semi-reproducible: when I run various cluster jobs, some get
stuck after a few hours of correct operation. The cluster is on
ceph 10.2.5 and 10.2.6; the fuse clients are on 10.2.6, but I have
also tried 10.2.5 and 10.2.3, and all of them show the same issue. This is
on CentOS (7.2 for the clients, 7.3 for the MDS/OSDs).
Here are some details:
The node with the stuck processes:
[root@worker1070 ~]# ps -auxwww | grep 30519
apataki 30519 39.8 0.9 8728064 5257588 ? Dl 12:11 60:50 ./Arepo param.txt 2 6
[root@worker1070 ~]# cat /proc/30519/stack
[<ffffffffa0a1d7bb>] fuse_file_aio_write+0xbb/0x340 [fuse]
[<ffffffff811ddd3d>] do_sync_write+0x8d/0xd0
[<ffffffff811de55d>] vfs_write+0xbd/0x1e0
[<ffffffff811defff>] SyS_write+0x7f/0xe0
[<ffffffff816458c9>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
[root@worker1070 ~]# ps -auxwww | grep 30533
apataki 30533 39.8 0.9 8795316 5261308 ? Sl 12:11 60:55 ./Arepo param.txt 2 6
[root@worker1070 ~]# cat /proc/30533/stack
[<ffffffffa0a12241>] wait_answer_interruptible+0x91/0xe0 [fuse]
[<ffffffffa0a12653>] __fuse_request_send+0x253/0x2c0 [fuse]
[<ffffffffa0a126d2>] fuse_request_send+0x12/0x20 [fuse]
[<ffffffffa0a1b966>] fuse_send_write+0xd6/0x110 [fuse]
[<ffffffffa0a1d45d>] fuse_perform_write+0x2ed/0x590 [fuse]
[<ffffffffa0a1d9a1>] fuse_file_aio_write+0x2a1/0x340 [fuse]
[<ffffffff811ddd3d>] do_sync_write+0x8d/0xd0
[<ffffffff811de55d>] vfs_write+0xbd/0x1e0
[<ffffffff811defff>] SyS_write+0x7f/0xe0
[<ffffffff816458c9>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
Presumably the second process is waiting on the first, which is
holding some lock ...
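For completeness, the stacks above can be gathered for every blocked
(D-state) task with something along these lines -- just a sketch using
the standard /proc interfaces, nothing ceph-specific:

#!/bin/bash
# Sketch: dump the kernel stack of every task currently in uninterruptible
# sleep (D state). Needs root, since /proc/<pid>/stack is root-readable only.
for pid in $(ps -eo pid=,stat= | awk '$2 ~ /^D/ {print $1}'); do
    echo "=== PID $pid ($(cat /proc/$pid/comm 2>/dev/null)) ==="
    cat /proc/$pid/stack 2>/dev/null
done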
The fuse client on the node:
[root@worker1070 ~]# ceph daemon /var/run/ceph/ceph-client.admin.asok status
{
"metadata": {
"ceph_sha1":
"656b5b63ed7c43bd014bcafd81b001959d5f089f",
"ceph_version": "ceph version 10.2.6
(656b5b63ed7c43bd014bcafd81b001959d5f089f)",
"entity_id": "admin",
"hostname": "worker1070",
"mount_point": "\/mnt\/ceph",
"root": "\/"
},
"dentry_count": 40,
"dentry_pinned_count": 23,
"inode_count": 123,
"mds_epoch": 19041,
"osd_epoch": 462327,
"osd_epoch_barrier": 462326
}
[root@worker1070 ~]# ceph daemon /var/run/ceph/ceph-client.admin.asok mds_sessions
{
"id": 3616543,
"sessions": [
{
"mds": 0,
"addr": "10.128.128.110:6800\/909443124",
"seq": 338,
"cap_gen": 0,
"cap_ttl": "2017-03-13 14:47:37.575229",
"last_cap_renew_request": "2017-03-13
14:46:37.575229",
"cap_renew_seq": 12694,
"num_caps": 713,
"state": "open"
}
],
"mdsmap_epoch": 19041
}
[root@worker1070 ~]# ceph daemon /var/run/ceph/ceph-client.admin.asok mds_requests
{}
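If more client-side state would help, I can also grab the client cache
and perf counters from the same admin socket (assuming these commands
are available on the jewel fuse client):

# Additional client-side diagnostics I can provide if useful
# (assuming these asok commands exist in this jewel build):
ceph daemon /var/run/ceph/ceph-client.admin.asok dump_cache   # client inode/dentry/cap cache
ceph daemon /var/run/ceph/ceph-client.admin.asok perf dump    # client perf counters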
The overall cluster health and the MDS:
[root@cephosd000 ~]# ceph -s
cluster d7b33135-0940-4e48-8aa6-1d2026597c2f
health HEALTH_WARN
noscrub,nodeep-scrub,require_jewel_osds flag(s) set
monmap e17: 3 mons at {hyperv029=10.4.36.179:6789/0,hyperv030=10.4.36.180:6789/0,hyperv031=10.4.36.181:6789/0}
election epoch 29148, quorum 0,1,2 hyperv029,hyperv030,hyperv031
fsmap e19041: 1/1/1 up {0=cephosd000=up:active}
osdmap e462328: 624 osds: 624 up, 624 in
flags noscrub,nodeep-scrub,require_jewel_osds
pgmap v44458747: 42496 pgs, 6 pools, 924 TB data, 272 Mobjects
2154 TB used, 1791 TB / 3946 TB avail
42496 active+clean
client io 86911 kB/s rd, 556 MB/s wr, 227 op/s rd, 303 op/s wr
[root@cephosd000 ~]# ceph daemon /var/run/ceph/ceph-mds.cephosd000.asok ops
{
"ops": [],
"num_ops": 0
}
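On the MDS side I can likewise pull the session list and objecter state
if that would be useful (again assuming these asok commands are present
in jewel):

# MDS-side diagnostics I can also provide
# (assuming these asok commands are present in this jewel build):
ceph daemon /var/run/ceph/ceph-mds.cephosd000.asok session ls          # per-client sessions and cap counts
ceph daemon /var/run/ceph/ceph-mds.cephosd000.asok objecter_requests   # in-flight MDS -> OSD requests
ceph daemon /var/run/ceph/ceph-mds.cephosd000.asok perf dump           # MDS perf counters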
The odd thing is that if I restart the MDS while in this state, the
client process wakes up and proceeds with its work without any
errors, as if a request had been lost and was somehow
retransmitted/restarted when the MDS restarted and the fuse layer
reconnected to it.
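For reference, the restart that unblocks the client is nothing more
than a plain service restart on the MDS host (the unit name below is
just how our deployment names it):

# Plain MDS daemon restart on the MDS host; the stuck client resumes
# shortly after the fuse layer reconnects.
systemctl restart ceph-mds@cephosd000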
When I try to attach a gdb session to either of the client
processes, gdb just hangs. However, right after the MDS restart gdb
attaches to the process successfully and shows that the hang
happened while closing a file. In fact, it looks like both
processes were trying to write to the same file, opened with
fopen("filename", "a"), and then close it:
(gdb) where
#0 0x00002aaaadc53abd in write () from /lib64/libc.so.6
#1 0x00002aaaadbe2383 in _IO_new_file_write () from /lib64/libc.so.6
#2 0x00002aaaadbe37ec in __GI__IO_do_write () from /lib64/libc.so.6
#3 0x00002aaaadbe30e0 in __GI__IO_file_close_it () from /lib64/libc.so.6
#4 0x00002aaaadbd7020 in fclose@@GLIBC_2.2.5 () from /lib64/libc.so.6
...
It seems like the fuse client may not be handling this case well,
i.e. two processes writing to the same file and then closing it?
This is just speculation. Any ideas on how to proceed? Is there
perhaps a known issue related to this?
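In case it is useful, here is a rough shell sketch of the access
pattern I suspect: two writers repeatedly appending to the same file
on the ceph-fuse mount and closing it again. The file name is made up;
the real workload is Arepo doing fopen(..., "a") / write / fclose.

#!/bin/bash
# Sketch of the suspected access pattern: two concurrent writers appending
# to the same file on the ceph-fuse mount, reopening and closing it each time.
# /mnt/ceph/shared-append.log is a hypothetical path, not the actual job output.
TARGET=/mnt/ceph/shared-append.log

writer() {
    local id=$1
    for i in $(seq 1 100000); do
        # each '>>' redirection does an open(O_APPEND)/write/close cycle,
        # roughly analogous to fopen("...", "a") + write + fclose
        echo "writer $id iteration $i" >> "$TARGET"
    done
}

writer 1 &
writer 2 &
wait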
Thanks,
Andras
apataki@xxxxxxxxxxxxxxxxxxxxx