Re: Many logs (errors?) on client → memory problem

Yannick Perret <yannick.perret@xxxxxxxxxxxxx> · Fri, 10 Jun 2016 11:11:40 +0200



    I got no feedback on this, but I think I
      found the problem:

      the glusterfs client's memory usage grows until no memory is
      available, and then it crashes.

      
      I performed the same operations on another machine and could not
      reproduce the problem.

      The machine with the problem is an old one (Debian, 3.2.50
      kernel, 32-bit), whereas the other machine is an up-to-date
      64-bit Debian.

      
      To give some numbers: the glusterfs process on the client starts
      with a resident size below 810220 and ends at 3055336 (about
      3 GB!) when it crashes again. The volume was mounted only on this
      machine and used by only one process (a 'cp -Rp').

      
      Running the same copy from a recent machine gives far more stable
      memory usage (43364 of resident size, with only a few small
      increases).

      Of course I'm using the same glusterfs version (compiled from
      source on both machines).
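
To compare the two machines more systematically, the resident size of the client process can be sampled while the copy runs. The sketch below is one possible way to do that on Linux; it reads the VmRSS field (reported in kB) from /proc/<pid>/status. The helper names parse_vm_rss_kb, read_rss_kb and monitor are made up for this illustration, and the PID of the glusterfs client has to be supplied by hand.

```python
import sys
import time


def parse_vm_rss_kb(status_text):
    """Return the VmRSS value (in kB) from the text of /proc/<pid>/status, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like: "VmRSS:     43364 kB"
            return int(line.split()[1])
    return None


def read_rss_kb(pid):
    """Read the current resident set size of a process, in kB."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vm_rss_kb(f.read())


def monitor(pid, interval_s=60.0, samples=10):
    """Print the RSS and its change since the previous sample."""
    previous = None
    for _ in range(samples):
        rss = read_rss_kb(pid)
        delta = 0 if previous is None or rss is None else rss - previous
        print(f"{time.strftime('%H:%M:%S')} rss={rss} kB delta={delta:+d} kB")
        previous = rss
        time.sleep(interval_s)


# Usage (hypothetical file name): python rss_watch.py <pid-of-glusterfs-client>
if __name__ == "__main__" and len(sys.argv) > 1:
    monitor(int(sys.argv[1]))
```

A steadily positive delta during the copy on the 32-bit machine, and a flat one on the 64-bit machine, would match the behaviour described above.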

      
      As I can't upgrade this old machine because of version
      compatibility with old software (at least until we replace that
      software), I will use an NFS mount point from the gluster servers
      instead.

      
      Anyway, on the recent machine I still get very verbose logs for
      each directory creation:

      [2016-06-10 08:35:12.965438] I
      [dht-selfheal.c:1065:dht_selfheal_layout_new_directory]
      0-HOME-LIRIS-dht: chunk size = 0xffffffff / 2064114 = 0x820

      [2016-06-10 08:35:12.965473] I
      [dht-selfheal.c:1103:dht_selfheal_layout_new_directory]
      0-HOME-LIRIS-dht: assigning range size 0xffe76e40 to
      HOME-LIRIS-replicate-0

      [2016-06-10 08:35:12.966987] I [MSGID: 109036]
      [dht-common.c:6296:dht_log_new_layout_for_dir_selfheal]
      0-HOME-LIRIS-dht: Setting layout of /log_apache_error with
      [Subvol_name: HOME-LIRIS-replicate-0, Err: -1 , Start: 0 , Stop:
      4294967295 ], 
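
The figures in the first two log lines are consistent with each other, which can be checked with a little integer arithmetic. This is only a reading of the logged values, not a description of how DHT computes layouts internally; in particular, the logs do not say what the divisor 2064114 represents.

```python
# Values copied from the log lines above.
HASH_SPACE = 0xFFFFFFFF   # full 32-bit range, as logged
DIVISOR = 2064114         # as logged; its exact meaning is not given in the logs

chunk = HASH_SPACE // DIVISOR   # integer division, as the logged result suggests
range_size = chunk * DIVISOR    # matches the "assigning range size" value

print(hex(chunk))        # 0x820
print(hex(range_size))   # 0xffe76e40
```

With a single subvolume (HOME-LIRIS-replicate-0), the third log line then reports a layout covering the whole range from 0 to 4294967295.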

      
      I switched the clients to the WARNING log level (gluster volume set
      HOME-LIRIS diagnostics.client-sys-log-level WARNING), which is fine
      for me.

      But maybe WARNING should be the default log level, at least for
      clients? In production, getting 3 lines per created directory
      is useless, and anyone who wants to analyze a problem can switch
      to INFO or DEBUG.

      
      Regards,

      --

      Y.

      
      On 08/06/2016 17:35, Yannick Perret wrote:

    
    Hello,
      

      I have a replica 2 volume managed on 2 identical servers, using
      version 3.6.7 of gluster. Here is the volume info:
      

      Volume Name: HOME-LIRIS
      Type: Replicate
      Volume ID: 47b4b856-371b-4b8c-8baa-2b7c32d7bb23
      Status: Started
      Number of Bricks: 1 x 2 = 2
      Transport-type: tcp
      Bricks:
      Brick1: sto1.mydomain:/glusterfs/home-liris/data
      Brick2: sto2.mydomain:/glusterfs/home-liris/data
      

      It is mounted on a (single) client with:

        mount -t glusterfs sto1.mydomain:/HOME-LIRIS /futur-home/
      

      I started to copy a directory (~550 GB, ~660 directories with many
      files) into it. The copy was done using 'cp -Rp'.
      

      It seems to work fine but I get *many* log entries in the
      corresponding mountpoint logs:
      

      [2016-06-07 14:01:27.587300] I
      [dht-selfheal.c:1065:dht_selfheal_layout_new_directory]
      0-HOME-LIRIS-dht: chunk size = 0xffffffff / 2064114 = 0x820
      

      [2016-06-07 14:01:27.587338] I
      [dht-selfheal.c:1103:dht_selfheal_layout_new_directory]
      0-HOME-LIRIS-dht: assigning range size 0xffe76e40 to
      HOME-LIRIS-replicate-0
      

      [2016-06-07 14:01:27.588436] I [MSGID: 109036]
      [dht-common.c:6296:dht_log_new_layout_for_dir_selfheal]
      0-HOME-LIRIS-dht: Setting layout of /olfamine with [Subvol_name:
      HOME-LIRIS-replicate-0, Err: -1 , Start: 0 , Stop: 4294967295 ],
      

      This is repeated for many files (124088 times exactly). Is this
      normal? If yes, I am using the default settings on the client, so
      I find it a little verbose. If not, can someone tell me what the
      problem is?
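
A count like the one above can be reproduced by grouping log entries by severity and emitting function. The following is a minimal sketch that assumes the line format shown in the excerpts (timestamp, one-letter level, optional MSGID, then [file:line:function]) and that each entry sits on a single line in the actual log file. The helper name count_entries is made up for this illustration, and the log file path is passed on the command line.

```python
import re
import sys
from collections import Counter

# Matches entries such as:
# [2016-06-07 14:01:27.587300] I [dht-selfheal.c:1065:dht_selfheal_layout_new_directory] 0-HOME-LIRIS-dht: ...
# [2016-06-07 14:01:27.588436] I [MSGID: 109036] [dht-common.c:6296:dht_log_new_layout_for_dir_selfheal] ...
ENTRY_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"(?:\[MSGID:\s*\d+\]\s+)?"
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\]"
)


def count_entries(lines):
    """Count log entries per (level, function); lines that do not match are skipped."""
    counts = Counter()
    for line in lines:
        m = ENTRY_RE.match(line)
        if m:
            counts[(m.group("level"), m.group("func"))] += 1
    return counts


# Usage (hypothetical file name): python count_log.py <path-to-mount-log>
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as f:
        for (level, func), n in count_entries(f).most_common():
            print(f"{n:8d}  {level}  {func}")
```

Sorting by count makes it easy to see whether the volume of the log comes from the three directory-layout messages alone or from something else.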
      

      Moreover at the end of the log file I have:
      

      [2016-06-08 04:42:58.210617] A [MSGID: 0]
      [mem-pool.c:110:__gf_calloc] : no memory available for size
      (14651) [call stack follows]
      

      [2016-06-08 04:42:58.219060] A [MSGID: 0]
      [mem-pool.c:134:__gf_malloc] : no memory available for size
      (21026) [call stack follows]
      

      pending frames:
      frame : type(1) op(CREATE)
      frame : type(1) op(CREATE)
      frame : type(1) op(LOOKUP)
      frame : type(0) op(0)
      patchset: git://git.gluster.com/glusterfs.git
      signal received: 11
      time of crash:
      2016-06-08 04:42:58
      configuration details:
      argp 1
      backtrace 1
      dlfcn 1
      libpthread 1
      llistxattr 1
      setfsid 1
      spinlock 1
      epoll.h 1
      xattr.h 1
      st_atim.tv_nsec 1
      package-string: glusterfs 3.6.7
      

      Which clearly doesn't seem right.
      

      Not all of the data was copied (the copy logs contain a long list
      of "final transport node not connected" errors, or similar; the
      message was translated into my language and is probably the
      standard "Transport endpoint is not connected").
      

      I re-mounted the volume and created a directory with 'mkdir TOTO',
      and got similar entries:
      

      [2016-06-08 15:32:23.692936] I
      [dht-selfheal.c:1065:dht_selfheal_layout_new_directory]
      0-HOME-LIRIS-dht: chunk size = 0xffffffff / 2064114 = 0x820
      

      [2016-06-08 15:32:23.692982] I
      [dht-selfheal.c:1103:dht_selfheal_layout_new_directory]
      0-HOME-LIRIS-dht: assigning range size 0xffe76e40 to
      HOME-LIRIS-replicate-0
      

      [2016-06-08 15:32:23.694144] I [MSGID: 109036]
      [dht-common.c:6296:dht_log_new_layout_for_dir_selfheal]
      0-HOME-LIRIS-dht: Setting layout of /TOTO with [Subvol_name:
      HOME-LIRIS-replicate-0, Err: -1 , Start: 0 , Stop: 4294967295 ],
      

      but I don't get such messages for files.
      

      If it helps, the volumes are ~2 TB and their content is far below
      that, and both bricks are ext4 (both the same size).
      

      Any help would be appreciated.
      

      Regards,
      

      --
      

      Y.
      

      _______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
    
    