I wonder if anyone could offer any insight on the issue below,
regarding the CentOS 7.6 kernel cephfs client connecting to a
Luminous cluster. I have since tried a much newer 4.19.13 kernel,
which did not show the same issue (but unfortunately for various
reasons unrelated to ceph, we can't go to such a new kernel).
Am I reading this right: the monitor decides this kernel client is old and has to re-encode the maps into some older format for it, that takes too long, and the kernel gives up (or perhaps hits some other communication protocol error)? Each of these mon sessions seems to last only about half a second; the client then reconnects to another mon, gets the same result, and so on. Is there any way around this?
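In case it helps, the feature bits the mons report for this client can be checked with something like the following (a rough sketch; cephmon00 is just one of our mons, and the commands are run on that mon host):

  ceph features                                            # per-release summary of connected client/daemon feature bits
  ceph daemon mon.cephmon00 sessions | grep 10.128.36.18   # the MonSession for this client, including its feature mask

In the mon debug log quoted below, this session shows up with features 0x27018fb86aa42ada, i.e. jewel.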
Andras
On 12/26/18 7:55 PM, Andras Pataki wrote:
We've been using ceph-fuse with a pretty good stability record (against the Luminous 12.2.8 back end). Unfortunately ceph-fuse has extremely poor small-file performance (understandably), so we've been testing the kernel client. The latest RedHat kernel, 3.10.0-957.1.3.el7.x86_64, seems to work pretty well as long as the cluster is in a completely clean state. However, it seems that as soon as something is happening to the cluster, the kernel client crashes pretty badly.
Today's example: I reweighted some OSDs to balance the disk usage a bit (set nobackfill, reweight the OSDs, check the new hypothetical space usage, then unset nobackfill; roughly the sequence sketched below). As soon as the reweighting started, the kernel client went into an infinite loop, unsuccessfully trying to connect to the mons.
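For reference, the reweighting itself was something like the following (a sketch; the OSD id and weight are placeholders, not the exact values used):

  ceph osd set nobackfill
  ceph osd reweight 123 0.95     # repeated for each OSD being adjusted
  ceph osd df                    # sanity-check utilization before letting backfill run
  ceph osd unset nobackfill

The client's kernel log from the moment the reweighting started: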
Dec 26 19:28:53 mon5 kernel: libceph: mon0 10.128.150.10:6789 io error
Dec 26 19:28:53 mon5 kernel: libceph: mon0 10.128.150.10:6789 session lost, hunting for new mon
Dec 26 19:28:53 mon5 kernel: libceph: mon2 10.128.150.12:6789 session established
Dec 26 19:28:58 mon5 kernel: libceph: mon2 10.128.150.12:6789 io error
Dec 26 19:28:58 mon5 kernel: libceph: mon2 10.128.150.12:6789 session lost, hunting for new mon
Dec 26 19:28:58 mon5 kernel: libceph: mon1 10.128.150.11:6789 session established
Dec 26 19:28:59 mon5 kernel: libceph: mon1 10.128.150.11:6789 io error
Dec 26 19:28:59 mon5 kernel: libceph: mon1 10.128.150.11:6789 session lost, hunting for new mon
Dec 26 19:28:59 mon5 kernel: libceph: mon2 10.128.150.12:6789 session established
Dec 26 19:28:59 mon5 kernel: libceph: mon2 10.128.150.12:6789 io error
Dec 26 19:28:59 mon5 kernel: libceph: mon2 10.128.150.12:6789 session lost, hunting for new mon
Dec 26 19:28:59 mon5 kernel: libceph: mon0 10.128.150.10:6789 session established
Dec 26 19:29:00 mon5 kernel: libceph: mon0 10.128.150.10:6789 io error
Dec 26 19:29:00 mon5 kernel: libceph: mon0 10.128.150.10:6789 session lost, hunting for new mon
Dec 26 19:29:00 mon5 kernel: libceph: mon2 10.128.150.12:6789 session established
Dec 26 19:29:00 mon5 kernel: libceph: mon2 10.128.150.12:6789 io error
Dec 26 19:29:00 mon5 kernel: libceph: mon2 10.128.150.12:6789 session lost, hunting for new mon
Dec 26 19:29:00 mon5 kernel: libceph: mon1 10.128.150.11:6789 session established
Dec 26 19:29:00 mon5 kernel: libceph: mon1 10.128.150.11:6789 io error
Dec 26 19:29:00 mon5 kernel: libceph: mon1 10.128.150.11:6789 session lost, hunting for new mon
Dec 26 19:29:00 mon5 kernel: libceph: mon2 10.128.150.12:6789 session established
Dec 26 19:29:01 mon5 kernel: libceph: mon2 10.128.150.12:6789 io error
Dec 26 19:29:01 mon5 kernel: libceph: mon2 10.128.150.12:6789 session lost, hunting for new mon
Dec 26 19:29:01 mon5 kernel: libceph: mon0 10.128.150.10:6789 session established
Dec 26 19:29:01 mon5 kernel: libceph: mon0 10.128.150.10:6789 io error
Dec 26 19:29:01 mon5 kernel: libceph: mon0 10.128.150.10:6789 session lost, hunting for new mon
Dec 26 19:29:01 mon5 kernel: libceph: mon2 10.128.150.12:6789 session established
Dec 26 19:29:02 mon5 kernel: libceph: mon2 10.128.150.12:6789 io error
Dec 26 19:29:02 mon5 kernel: libceph: mon2 10.128.150.12:6789 session lost, hunting for new mon
Dec 26 19:29:02 mon5 kernel: libceph: mon1 10.128.150.11:6789 session established
Dec 26 19:29:02 mon5 kernel: libceph: mon1 10.128.150.11:6789 io error
Dec 26 19:29:02 mon5 kernel: libceph: mon1 10.128.150.11:6789 session lost, hunting for new mon
... etc ...
It seemingly never recovers. The cluster itself is healthy, and all other clients are successfully doing I/O:
[root@cephmon00 ceph]# ceph -s
  cluster:
    id:     d7b33135-0940-4e48-8aa6-1d2026597c2f
    health: HEALTH_WARN
            noout flag(s) set
            1 backfillfull osd(s)
            4 pool(s) backfillfull
            239119058/12419244975 objects misplaced (1.925%)

  services:
    mon: 3 daemons, quorum cephmon00,cephmon01,cephmon02
    mgr: cephmon00(active)
    mds: cephfs-1/1/1 up {0=cephmds00=up:active}, 1 up:standby
    osd: 3534 osds: 3534 up, 3534 in; 5040 remapped pgs
         flags noout

  data:
    pools:   5 pools, 50688 pgs
    objects: 2.51G objects, 7.77PiB
    usage:   19.1PiB used, 5.90PiB / 25.0PiB avail
    pgs:     239119058/12419244975 objects misplaced (1.925%)
             45639 active+clean
             2914  active+remapped+backfilling
             2126  active+remapped+backfill_wait
             9     active+clean+scrubbing+deep

  io:
    client:   10.3MiB/s rd, 2.23GiB/s wr, 32op/s rd, 3.77kop/s wr
    recovery: 114GiB/s, 33.12kobjects/s
The client machine in question is otherwise healthy as well (not out of memory, etc.). I've checked the OSD blacklist, and this client isn't on it; there is nothing suspicious in the mon logs regarding this client either. I then set one of the mons to log level 20 to capture what happens when the client connects.
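Roughly what I ran for those checks (a sketch; cephmon00 is just the mon I picked, and the commands were run on that host):

  ceph osd blacklist ls                               # nothing listed for 10.128.36.18
  ceph daemon mon.cephmon00 config set debug_mon 20   # bump that mon's log level

The relevant section of that mon's log when this client tries to connect: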
2018-12-26 19:51:46.203264 7fffeb406700 10 mon.cephmon00@0(leader) e40 ms_verify_authorizer 10.128.36.18:0/3882984371 client protocol 0
2018-12-26 19:51:46.206645 7fffee40c700 10 mon.cephmon00@0(leader) e40 _ms_dispatch new session 0x5555722bb980 MonSession(client.36398604 10.128.36.18:0/3882984371 is open , features 0x27018fb86aa42ada (jewel)) features 0x27018fb86aa42ada
2018-12-26 19:51:46.206651 7fffee40c700 20 mon.cephmon00@0(leader) e40 caps
2018-12-26 19:51:46.206657 7fffee40c700 10 mon.cephmon00@0(leader).auth v57602 preprocess_query auth(proto 0 34 bytes epoch 0) from client.36398604 10.128.36.18:0/3882984371
2018-12-26 19:51:46.206660 7fffee40c700 10 mon.cephmon00@0(leader).auth v57602 prep_auth() blob_size=34
2018-12-26 19:51:46.206692 7fffee40c700 2 mon.cephmon00@0(leader) e40 send_reply 0x555562fee5a0 0x55556f398b40 auth_reply(proto 2 0 (0) Success) v1
2018-12-26 19:51:46.206867 7fffee40c700 20 mon.cephmon00@0(leader) e40 _ms_dispatch existing session 0x5555722bb980 for client.36398604 10.128.36.18:0/3882984371
2018-12-26 19:51:46.206870 7fffee40c700 20 mon.cephmon00@0(leader) e40 caps
2018-12-26 19:51:46.206877 7fffee40c700 10 mon.cephmon00@0(leader).auth v57602 preprocess_query auth(proto 2 128 bytes epoch 0) from client.36398604 10.128.36.18:0/3882984371
2018-12-26 19:51:46.206882 7fffee40c700 10 mon.cephmon00@0(leader).auth v57602 prep_auth() blob_size=128
2018-12-26 19:51:46.206979 7fffee40c700 2 mon.cephmon00@0(leader) e40 send_reply 0x555562fee5a0 0x55571f9c4d80 auth_reply(proto 2 0 (0) Success) v1
2018-12-26 19:51:46.207178 7fffee40c700 20 mon.cephmon00@0(leader) e40 _ms_dispatch existing session 0x5555722bb980 for client.36398604 10.128.36.18:0/3882984371
2018-12-26 19:51:46.207182 7fffee40c700 20 mon.cephmon00@0(leader) e40 caps allow *
2018-12-26 19:51:46.207184 7fffee40c700 20 is_capable service=mon command= read on cap allow *
2018-12-26 19:51:46.207185 7fffee40c700 20 allow so far , doing grant allow *
2018-12-26 19:51:46.207186 7fffee40c700 20 allow all
2018-12-26 19:51:46.207188 7fffee40c700 10 mon.cephmon00@0(leader) e40 handle_subscribe mon_subscribe({mdsmap=72103+,monmap=41+,osdmap=1207891}) v2
2018-12-26 19:51:46.207194 7fffee40c700 10 mon.cephmon00@0(leader) e40 handle_subscribe: MDS sub 'mdsmap'
2018-12-26 19:51:46.207195 7fffee40c700 20 is_capable service=mds command= read on cap allow *
2018-12-26 19:51:46.207197 7fffee40c700 20 allow so far , doing grant allow *
2018-12-26 19:51:46.207198 7fffee40c700 20 allow all
2018-12-26 19:51:46.207200 7fffee40c700 20 mon.cephmon00@0(leader).mds e72102 check_sub: mdsmap
2018-12-26 19:51:46.207204 7fffee40c700 10 mon.cephmon00@0(leader).monmap v40 check_sub monmap next 41 have 40
2018-12-26 19:51:46.207207 7fffee40c700 20 is_capable service=osd command= read on cap allow *
2018-12-26 19:51:46.207208 7fffee40c700 20 allow so far , doing grant allow *
2018-12-26 19:51:46.207209 7fffee40c700 20 allow all
2018-12-26 19:51:46.207211 7fffee40c700 10 mon.cephmon00@0(leader).osd e1208017 check_osdmap_sub 0x55592f513d00 next 1207891 (onetime)
2018-12-26 19:51:46.207213 7fffee40c700 5 mon.cephmon00@0(leader).osd e1208017 send_incremental [1207891..1208017] to client.36398604 10.128.36.18:0/3882984371
2018-12-26 19:51:46.207217 7fffee40c700 10 mon.cephmon00@0(leader).osd e1208017 build_incremental [1207891..1207930] with features 27018fb86aa42ada
2018-12-26 19:51:46.220019 7fffee40c700 20 mon.cephmon00@0(leader).osd e1208017 reencode_incremental_map 1207930 with features 504412504116439552
2018-12-26 19:51:46.230217 7fffee40c700 20 mon.cephmon00@0(leader).osd e1208017 build_incremental inc 1207930 1146701 bytes
2018-12-26 19:51:46.230349 7fffee40c700 20 mon.cephmon00@0(leader).osd e1208017 reencode_incremental_map 1207929 with features 504412504116439552
2018-12-26 19:51:46.232523 7fffee40c700 20 mon.cephmon00@0(leader).osd e1208017 build_incremental inc 1207929 175613 bytes
... a lot more of reencode_incremental stuff ...
2018-12-26 19:51:46.745394 7fffee40c700 10 mon.cephmon00@0(leader) e40 ms_handle_reset 0x5555637cf800 10.128.36.18:0/3882984371
2018-12-26 19:51:46.745395 7ffff0c11700 10 mon.cephmon00@0(leader).log v79246823 encode_full log v 79246823
2018-12-26 19:51:46.745469 7ffff0c11700 10 mon.cephmon00@0(leader).log v79246823 encode_pending v79246824
2018-12-26 19:51:46.745763 7fffee40c700 10 mon.cephmon00@0(leader) e40 reset/close on session client.36398604 10.128.36.18:0/3882984371
2018-12-26 19:51:46.745769 7fffee40c700 10 mon.cephmon00@0(leader) e40 remove_session 0x5555722bb980 client.36398604 10.128.36.18:0/3882984371 features 0x27018fb86aa42ada
Any pointers to what to do here?
Andras