On 22/07/2016 21:12, Yannick Perret wrote:
On 22/07/2016 17:47, Mykola Ulianytskyi wrote:
Hi
3.7 clients are not compatible with 3.6 servers
Can you provide more info?
I use some 3.7 clients with 3.6 servers and don't see issues.
Well, with client 3.7.13 compiled on the same machine, when I try the same mount I get:
# mount -t glusterfs sto1.my.domain:BACKUP-ADMIN-DATA /zog/
Mount failed. Please check the log file for more details.
Checking the logs (/var/log/glusterfs/zog.log) I have:
[2016-07-22 19:05:40.249143] I [MSGID: 100030] [glusterfsd.c:2338:main] 0-/usr/local/sbin/glusterfs: Started running /usr/local/sbin/glusterfs version 3.7.13 (args: /usr/local/sbin/glusterfs --volfile-server=sto1.my.domain --volfile-id=BACKUP-ADMIN-DATA /zog)
[2016-07-22 19:05:40.258437] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2016-07-22 19:05:40.259480] W [socket.c:701:__socket_rwv] 0-glusterfs: readv on <the-IP>:24007 failed (Aucune donnée disponible)
[2016-07-22 19:05:40.259859] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/local/lib/libglusterfs.so.0(_gf_log_callingfn+0x175)[0x7fad7d039335] (--> /usr/local/lib/libgfrpc.so.0(saved_frames_unwind+0x1b3)[0x7fad7ce04e73] (--> /usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fad7ce04f6e] (--> /usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e)[0x7fad7ce065ee] (--> /usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7fad7ce06de8] ))))) 0-glusterfs: forced unwinding frame type(GlusterFS Handshake) op(GETSPEC(2)) called at 2016-07-22 19:05:40.258858 (xid=0x1)
[2016-07-22 19:05:40.259894] E [glusterfsd-mgmt.c:1690:mgmt_getspec_cbk] 0-mgmt: failed to fetch volume file (key:BACKUP-ADMIN-DATA)
[2016-07-22 19:05:40.259939] W [glusterfsd.c:1251:cleanup_and_exit] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_unwind+0x1de) [0x7fad7ce04e9e] -->/usr/local/sbin/glusterfs(mgmt_getspec_cbk+0x454) [0x40d564] -->/usr/local/sbin/glusterfs(cleanup_and_exit+0x4b) [0x407eab] ) 0-: received signum (0), shutting down
[2016-07-22 19:05:40.259965] I [fuse-bridge.c:5720:fini] 0-fuse: Unmounting '/zog'.
[2016-07-22 19:05:40.260913] W [glusterfsd.c:1251:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x80a4) [0x7fad7c0a30a4] -->/usr/local/sbin/glusterfs(glusterfs_sigwaiter+0xc5) [0x408015] -->/usr/local/sbin/glusterfs(cleanup_and_exit+0x4b) [0x407eab] ) 0-: received signum (15), shutting down
Hmmm… I just noticed that the logs are (partly) localized, which can make them harder to understand for non-French speakers.
"Aucune donnée disponible" means "No data available" (ENODATA).
BTW, if I could get 3.7 clients to work with my servers, and if the memory leak doesn't exist in 3.7, that would be fine for me.
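Also, to get the untranslated error strings and a bit more detail from that failing mount, it can be re-run with an English locale and a higher client log level; a minimal sketch, assuming the log-level mount option behaves as on stock 3.7 builds and the locale is inherited by the client process:
# LC_ALL=C mount -t glusterfs -o log-level=DEBUG sto1.my.domain:BACKUP-ADMIN-DATA /zog/
# tail -n 50 /var/log/glusterfs/zog.log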
--
Y.
I did not investigate further, as I just presumed that the 3.7 series was not compatible with 3.6 servers, but it may be something else. In any case it is the same client, the same server(s) and the same volume.
The 3.7.13 client was built with the following features (configured with "configure --disable-tiering" as I don't have the tiering dependencies installed); the build steps themselves are sketched after the list:
FUSE client : yes
Infiniband verbs : no
epoll IO multiplex : yes
argp-standalone : no
fusermount : yes
readline : yes
georeplication : yes
Linux-AIO : no
Enable Debug : no
Block Device xlator : no
glupy : yes
Use syslog : yes
XML output : yes
QEMU Block formats : no
Encryption xlator : yes
Unit Tests : no
POSIX ACLs : yes
Data Classification : no
firewalld-config : no
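The build itself follows the usual from-source steps; a minimal sketch, assuming the default /usr/local prefix (./autogen.sh is only needed when building from a git checkout, not from a release tarball):
./autogen.sh
./configure --disable-tiering
make
make install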
Regards,
--
Y.
Thank you
--
With best regards,
Mykola
On Fri, Jul 22, 2016 at 4:31 PM, Yannick Perret <yannick.perret@xxxxxxxxxxxxx> wrote:
Note: I have a dev client machine, so I can run tests or recompile the glusterfs client if that can help gather data about this.
I did not test this problem against the 3.7.x versions, as my 2 servers are in use and I can't upgrade them at this time, and 3.7 clients are not compatible with 3.6 servers (as far as I can tell from my tests).
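If the cluster op-version matters for that compatibility question, it can be read on the servers; a sketch, assuming the usual glusterd working directory:
# grep operating-version /var/lib/glusterd/glusterd.info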
--
Y.
On 22/07/2016 14:06, Yannick Perret wrote:
Hello,
Some time ago I posted about a memory leak in the client process, but it was on a very old 32-bit machine (both kernel and OS) and I didn't find evidence of a similar problem on our recent machines.
But I have performed more tests and I see the same problem.
Clients are 64-bit Debian 8.2 machines. The glusterfs client on these machines is compiled from sources with the following features enabled:
FUSE client : yes
Infiniband verbs : no
epoll IO multiplex : yes
argp-standalone : no
fusermount : yes
readline : yes
georeplication : yes
Linux-AIO : no
Enable Debug : no
systemtap : no
Block Device xlator : no
glupy : no
Use syslog : yes
XML output : yes
QEMU Block formats : no
Encryption xlator : yes
Erasure Code xlator : yes
I tested both the 3.6.7 and 3.6.9 versions on the client (3.6.7 is the one installed on our machines, including the servers; 3.6.9 is for testing with the latest 3.6 version).
Here are the operations on the client (also performed, with similar results, with the 3.6.7 version):
# /usr/local/sbin/glusterfs --version
glusterfs 3.6.9 built on Jul 22 2016 13:27:42
(…)
# mount -t glusterfs sto1.my.domain:BACKUP-ADMIN-DATA /zog/
# cd /usr/
# cp -Rp * /zog/TEMP/
Then, monitoring the memory used by the glusterfs process while the 'cp' is running (VSZ and RSS from 'ps', respectively; a sampling sketch is shown after the numbers):
284740 70232
284740 70232
284876 71704
285000 72684
285136 74008
285416 75940
(…)
368684 151980
369324 153768
369836 155576
370092 156192
370092 156192
Here both sizes are stable and correspond to the end of the 'cp' command.
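The sampling can be done with a simple loop of this kind; a minimal sketch, assuming a single glusterfs client process matches the (illustrative) pgrep pattern:
# while sleep 5; do ps -o vsz=,rss= -p "$(pgrep -f 'volfile-id=BACKUP-ADMIN-DATA')"; done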
If I restart another 'cp' (even on the same directories) the size starts to increase again.
If I perform an 'ls -lR' in the directory the size also increases:
370756 192488
389964 212148
390948 213232
(here I ^C the 'ls')
When doing nothing the size doesn't increase, but it never decreases (calling 'sync' doesn't change the situation).
Sending a HUP signal to the glusterfs process also increases memory (390948 213324 → 456484 213320).
Changing the volume configuration (changing the diagnostics.client-sys-log-level value) doesn't change anything.
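(Changing that option is done with the standard volume-set command, e.g.:
# gluster volume set BACKUP-ADMIN-DATA diagnostics.client-sys-log-level WARNING
)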
Here is the current ps output:
root 17041 4.9 5.2 456484 213320 ? Ssl 13:29 1:21 /usr/local/sbin/glusterfs --volfile-server=sto1.my.domain --volfile-id=BACKUP-ADMIN-DATA /zog
Of course, unmounting/remounting falls back to the "start" size:
# umount /zog
# mount -t glusterfs sto1.my.domain:BACKUP-ADMIN-DATA /zog/
→ root 28741 0.3 0.7 273320 30484 ? Ssl 13:57 0:00 /usr/local/sbin/glusterfs --volfile-server=sto1.my.domain --volfile-id=BACKUP-ADMIN-DATA /zog
I didn't see this before because most of our volumes are mounted "on demand" for some storage activities, or are permanently mounted but with very little activity.
But clearly this memory usage drift is a long-term problem. On the old 32-bit machine I had this problem ("solved" by using NFS mounts while waiting for that old machine to be replaced), and it led to glusterfs being killed by the OS when it ran out of free memory. It was faster there than what I describe here, but it's just a question of time.
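If it can help the debugging, my understanding is that a statedump of the client can be obtained by sending SIGUSR1 to the glusterfs process; a sketch, assuming the default dump directory (it may differ on a from-source install):
# kill -USR1 17041
# ls /var/run/gluster/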
Thanks for any help about that.
Regards,
--
Y.
The corresponding volume on the servers is (if it can help):
Volume Name: BACKUP-ADMIN-DATA
Type: Replicate
Volume ID: 306d57f3-fb30-4bcc-8687-08bf0a3d7878
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: sto1.my.domain:/glusterfs/backup-admin/data
Brick2: sto2.my.domain:/glusterfs/backup-admin/data
Options Reconfigured:
diagnostics.client-sys-log-level: WARNING
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users