Re: Many logs (errors?) on client → memory problem

Mohammed Rafi K C <rkavunga@xxxxxxxxxx> · Fri, 10 Jun 2016 16:38:54 +0530



    On 06/10/2016 02:41 PM, Yannick Perret wrote:

    
      I got no feedback on that, but I think
        I found the problem:

        the glusterfs client's memory usage grows until no memory is
        available, and then it crashes.

      
    If you can take a statedump (kill -SIGUSR1 $client_pid) and send
    it across, I can take a look to see where it consumes so much
    memory. Since you said it is not reproducible on an up-to-date Debian
    system, if it is not important to you, that is fine with me.
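For reference, the statedump request above only sends SIGUSR1 to the client process. A minimal Python equivalent of `kill -SIGUSR1 $client_pid` is sketched below; the function name is hypothetical, finding the right PID is left to the reader, and where the dump file is written depends on the glusterfs version and configuration.

```python
import os
import signal


def request_statedump(client_pid: int) -> None:
    """Hypothetical helper, same as `kill -SIGUSR1 <client_pid>`.

    glusterfs processes react to SIGUSR1 by writing a statedump
    (internal state, including memory accounting) to a file.
    """
    os.kill(client_pid, signal.SIGUSR1)
```

Usage: `request_statedump(<pid of the glusterfs client>)`, then collect the newest dump file and send it along.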

    
        I performed the same operations on another machine without
        being able to reproduce the problem.

        The machine with the problem is an old machine (Debian, 3.2.50
        kernel, 32-bit), whereas the other machine is an up-to-date
        64-bit Debian.

        
        To give some stats: the glusterfs process on the client starts
        with less than 810220 of resident size and ends with 3055336
        (3 GB!) when it crashes again. The volume was mounted only on
        this machine and used by only one process (a 'cp -Rp').

        
        Running the same from a recent machine gives far more stable
        memory usage (43364 of resident size, with only a few small
        increases).
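A minimal sketch (Python, Linux only) for recording the client's resident size during a long copy like this one, so the growth can be compared between the two machines. It reads the VmRSS line of /proc/<pid>/status, which the kernel reports in kB; the helper names and the sampling interval are illustrative assumptions, not part of glusterfs.

```python
import time


def parse_vmrss_kb(status_text: str) -> int:
    """Return the VmRSS value (kB) from the contents of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:\t  810220 kB"
            return int(line.split()[1])
    raise ValueError("no VmRSS line (process exited, or kernel thread?)")


def sample_rss(pid: int, samples: int = 5, interval_s: float = 60.0) -> list:
    """Read the resident size of `pid` every `interval_s` seconds (hypothetical helper)."""
    readings = []
    for i in range(samples):
        with open(f"/proc/{pid}/status") as fh:
            readings.append(parse_vmrss_kb(fh.read()))
        if i < samples - 1:
            time.sleep(interval_s)
    return readings
```

A steadily increasing list from `sample_rss(client_pid)` on the old machine, against a flat one on the new machine, would match what is described above.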

        Of course I'm using the same glusterfs version (compiled from
        sources on both machines).

        
        As I can't upgrade this old machine because of compatibility
        with old software (at least until we replace that software), I
        will use an NFS mount point from the gluster servers instead.

        
        However, on the recent machine I still get very verbose logs for
        each directory creation:

        [2016-06-10 08:35:12.965438] I
        [dht-selfheal.c:1065:dht_selfheal_layout_new_directory]
        0-HOME-LIRIS-dht: chunk size = 0xffffffff / 2064114 = 0x820

        [2016-06-10 08:35:12.965473] I
        [dht-selfheal.c:1103:dht_selfheal_layout_new_directory]
        0-HOME-LIRIS-dht: assigning range size 0xffe76e40 to
        HOME-LIRIS-replicate-0

        [2016-06-10 08:35:12.966987] I [MSGID: 109036]
        [dht-common.c:6296:dht_log_new_layout_for_dir_selfheal]
        0-HOME-LIRIS-dht: Setting layout of /log_apache_error with
        [Subvol_name: HOME-LIRIS-replicate-0, Err: -1 , Start: 0 , Stop:
        4294967295 ], 
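The numbers in these INFO lines are simple arithmetic over the 32-bit DHT hash space. The Python sketch below reproduces them, assuming (the log does not say so) that 2064114 is the weight DHT derived for the single subvolume and that the assigned range is chunk size × weight; the helper name is hypothetical.

```python
# Sketch only: assumes 2064114 is the per-subvolume weight used by DHT
# and that the range handed to that subvolume is chunk * weight.
HASH_SPACE_MAX = 0xFFFFFFFF  # DHT spreads names over a 32-bit hash space


def layout_numbers(weight: int) -> tuple:
    """Recompute the 'chunk size' and 'range size' values printed in the log."""
    chunk = HASH_SPACE_MAX // weight  # "chunk size = 0xffffffff / 2064114 = 0x820"
    range_size = chunk * weight       # "assigning range size 0xffe76e40"
    return chunk, range_size


if __name__ == "__main__":
    chunk, range_size = layout_numbers(2064114)
    print(hex(chunk), hex(range_size))  # 0x820 0xffe76e40
```

The last line of the excerpt still reports Start: 0, Stop: 4294967295 (0xffffffff) for the only subvolume, i.e. the whole hash space.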

      
    This is an INFO-level message about the layout of a directory.
    The gluster FUSE client prints it when it sets the layout on a
    directory. These messages are not errors and can be safely ignored.

    
        I switched the clients to WARNING log level (gluster volume set
        HOME-LIRIS diagnostics.client-sys-log-level WARNING), which is
        fine for me.

        But maybe WARNING should be the default log level, at least for
        clients, no? In production, getting 3 lines per created
        directory is useless, and anyone who wants to analyze a problem
        will switch to INFO or DEBUG.

      
    I have seen many users panic about this message. I agree, we have
    to do something about these log entries.

    
        Regards,

        --

        Y.

        
        On 08/06/2016 17:35, Yannick Perret wrote:

      
      Hello,

        
        I have a replica 2 volume managed on 2 identical servers, using
        gluster version 3.6.7. Here is the volume info:

        Volume Name: HOME-LIRIS
        Type: Replicate
        Volume ID: 47b4b856-371b-4b8c-8baa-2b7c32d7bb23
        Status: Started
        Number of Bricks: 1 x 2 = 2
        Transport-type: tcp
        Bricks:
        Brick1: sto1.mydomain:/glusterfs/home-liris/data
        Brick2: sto2.mydomain:/glusterfs/home-liris/data

        
        It is mounted on a (single) client with mount -t glusterfs
        sto1.mydomain:/HOME-LIRIS /futur-home/ 

        
        I started to copy a directory (~550 GB, ~660 directories with
        many files) into it. The copy was done using 'cp -Rp'.

        
        It seems to work fine, but I get *many* entries in the
        corresponding mount point log:

        [2016-06-07 14:01:27.587300] I
        [dht-selfheal.c:1065:dht_selfheal_layout_new_directory]
        0-HOME-LIRIS-dht: chunk size = 0xffffffff / 2064114 = 0x820 

        [2016-06-07 14:01:27.587338] I
        [dht-selfheal.c:1103:dht_selfheal_layout_new_directory]
        0-HOME-LIRIS-dht: assigning range size 0xffe76e40 to
        HOME-LIRIS-replicate-0 

        [2016-06-07 14:01:27.588436] I [MSGID: 109036]
        [dht-common.c:6296:dht_log_new_layout_for_dir_selfheal]
        0-HOME-LIRIS-dht: Setting layout of /olfamine with [Subvol_name:
        HOME-LIRIS-replicate-0, Err: -1 , Start: 0 , Stop: 4294967295 ],
        

        This is repeated for many files (124088 exactly). Is this normal?
        If yes, I find it a bit verbose, given that I use the default
        settings on the client. If not, can someone tell me what the
        problem is here?
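To see which messages make up such a flood, one can tally log entries by level and source function. A minimal Python sketch, assuming each entry starts with the `[timestamp] L [MSGID: n] [file.c:line:function]` prefix visible in the excerpts here (entries wrapped over several lines, as in this mail, would need to be joined first); the names `ENTRY` and `tally` are hypothetical.

```python
import re
from collections import Counter

# Prefix of a glusterfs log entry, e.g.
# "[2016-06-07 14:01:27.588436] I [MSGID: 109036] [dht-common.c:6296:dht_log_...] ..."
ENTRY = re.compile(
    r"^\[[^\]]+\] (?P<level>[A-Z]) "
    r"(?:\[MSGID: \d+\] )?"
    r"\[(?P<file>[^:\]]+):\d+:(?P<func>[^\]]+)\]"
)


def tally(lines):
    """Count log entries per (level, function); lines without the prefix are skipped."""
    counts = Counter()
    for line in lines:
        m = ENTRY.match(line)
        if m:
            counts[(m.group("level"), m.group("func"))] += 1
    return counts
```

For example, `tally(open(path_to_mount_log)).most_common(5)` would list the five most frequent message sources; here one would expect the two `dht_selfheal_layout_new_directory` lines and `dht_log_new_layout_for_dir_selfheal` to dominate.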

        
        Moreover at the end of the log file I have: 

        [2016-06-08 04:42:58.210617] A [MSGID: 0]
        [mem-pool.c:110:__gf_calloc] : no memory available for size
        (14651) [call stack follows] 

        [2016-06-08 04:42:58.219060] A [MSGID: 0]
        [mem-pool.c:134:__gf_malloc] : no memory available for size
        (21026) [call stack follows] 

        pending frames:
        frame : type(1) op(CREATE)
        frame : type(1) op(CREATE)
        frame : type(1) op(LOOKUP)
        frame : type(0) op(0)
        patchset: git://git.gluster.com/glusterfs.git
        signal received: 11
        time of crash:
        2016-06-08 04:42:58
        configuration details:
        argp 1
        backtrace 1
        dlfcn 1
        libpthread 1
        llistxattr 1
        setfsid 1
        spinlock 1
        epoll.h 1
        xattr.h 1
        st_atim.tv_nsec 1
        package-string: glusterfs 3.6.7

        
        This clearly doesn't seem right.

        Not all the data were copied (logically, the cp output contains a
        list of "Transport endpoint is not connected" errors, or similar,
        since the message was translated into my language).

        
        I re-mounted the volume, created a directory with 'mkdir
        TOTO' and got similar messages:

        [2016-06-08 15:32:23.692936] I
        [dht-selfheal.c:1065:dht_selfheal_layout_new_directory]
        0-HOME-LIRIS-dht: chunk size = 0xffffffff / 2064114 = 0x820 

        [2016-06-08 15:32:23.692982] I
        [dht-selfheal.c:1103:dht_selfheal_layout_new_directory]
        0-HOME-LIRIS-dht: assigning range size 0xffe76e40 to
        HOME-LIRIS-replicate-0 

        [2016-06-08 15:32:23.694144] I [MSGID: 109036]
        [dht-common.c:6296:dht_log_new_layout_for_dir_selfheal]
        0-HOME-LIRIS-dht: Setting layout of /TOTO with [Subvol_name:
        HOME-LIRIS-replicate-0, Err: -1 , Start: 0 , Stop: 4294967295 ],
        

        but I don't get such messages for files.

        
        If it helps: the volumes are ~2 TB and the content is far from
        that size, and both bricks are ext4 (both the same size).

        
        Any help would be appreciated. 

        
        Regards, 

        -- 

        Y. 

        
        _______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
      
      