I removed the quick-read translator and the memory leak appears to have disappeared. I've repeated the steps and the glusterfs client process is holding solid at about 51 MB of resident memory.

FYI,
Larry

From: raghavendra.hg at gmail.com [mailto:raghavendra.hg at gmail.com] On Behalf Of Raghavendra G
Sent: Wednesday, December 23, 2009 7:48 PM
To: Larry Bates
Cc: gluster-users
Subject: Re: glusterFS memory leak?

What operations are you doing on the mount point? How easy is it to reproduce the memory leak?

On Thu, Dec 24, 2009 at 5:08 AM, Larry Bates <larry.bates at vitalesafe.com> wrote:

I restarted the mirror and once again the glusterfs client process is just growing. Echoing 3 to /proc/sys/vm/drop_caches seems to shrink the memory footprint, but only by a small amount.

BTW - it is the client process that seems to grow without bound. Note: I have one machine that is acting as both a client and a server (which is where the client process is growing) and another machine that is a dedicated GlusterFS server.

Here are my configs:

Client:

#
# Add client feature and attach to remote subvolumes of gfs001
#
volume gfs001brick1
  type protocol/client
  option transport-type tcp              # for TCP/IP transport
  option remote-host gfs001              # IP address of the remote volume
  option remote-subvolume vol1           # name of the remote volume
  #option transport.socket.nodelay on    # undocumented option for speed
end-volume

volume gfs001brick2
  type protocol/client
  option transport-type tcp              # for TCP/IP transport
  option remote-host gfs001              # IP address of the remote volume
  option remote-subvolume vol2           # name of the remote volume
  #option transport.socket.nodelay on    # undocumented option for speed
end-volume

volume coraidbrick13
  type protocol/client
  option transport-type tcp
  option remote-host 10.0.0.71
  option remote-subvolume coraidvol13
  #option transport.socket.nodelay on
end-volume

volume coraidbrick14
  type protocol/client
  option transport-type tcp
  option remote-host 10.0.0.71
  option remote-subvolume coraidvol14
  #option transport.socket.nodelay on
end-volume

#
# Replicate volumes
#
volume afr-vol1
  type cluster/replicate
  subvolumes gfs001brick1 coraidbrick13
end-volume

volume afr-vol2
  type cluster/replicate
  subvolumes gfs001brick2 coraidbrick14
end-volume

#
# Distribute files across bricks
#
volume dht-vol
  type cluster/distribute
  subvolumes afr-vol1 afr-vol2
  option min-free-disk 2%                # 2% of 1.8 TB volumes is ~36 GB
end-volume

#
# Add quick-read for small files
#
volume quickread
  type performance/quick-read
  option cache-timeout 1                 # default 1 second
  option max-file-size 256KB             # default 64 KB
  subvolumes dht-vol
end-volume

Server:

volume coraidbrick13
  type storage/posix                     # POSIX FS translator
  option directory /mnt/glusterfs/e101.13   # Export this directory
  option background-unlink yes           # unlink in background
end-volume

volume coraidbrick14
  type storage/posix
  option directory /mnt/glusterfs/e101.14
  option background-unlink yes
end-volume

volume iot1
  type performance/io-threads
  option thread-count 4
  subvolumes coraidbrick13
end-volume

volume iot2
  type performance/io-threads
  option thread-count 4
  subvolumes coraidbrick14
end-volume

volume coraidvol13
  type features/locks
  subvolumes iot1
end-volume

volume coraidvol14
  type features/locks
  subvolumes iot2
end-volume

## Add network serving capability to volumes
volume server
  type protocol/server
  option transport-type tcp              # For TCP/IP transport
  subvolumes coraidvol13 coraidvol14     # Expose both volumes
  option auth.addr.coraidvol13.allow 10.0.0.*
  option auth.addr.coraidvol14.allow 10.0.0.*
end-volume

Thanks,
Larry
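A minimal sketch of how the tail of the client volfile above might look with the quick-read translator removed, as described at the top of this thread; this assumes the FUSE mount is simply pointed at dht-vol and that the brick, replicate, and distribute definitions stay unchanged:

# quick-read section deleted for testing; dht-vol is now the top-most
# volume, so the client mounts dht-vol directly
volume dht-vol
  type cluster/distribute
  subvolumes afr-vol1 afr-vol2
  option min-free-disk 2%                # 2% of 1.8 TB volumes is ~36 GB
end-volume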
From: raghavendra.hg at gmail.com [mailto:raghavendra.hg at gmail.com] On Behalf Of Raghavendra G
Sent: Wednesday, December 23, 2009 2:43 PM
To: Larry Bates
Cc: gluster-users
Subject: Re: glusterFS memory leak?

Hi Larry,

What is the client and server configuration?

Instead of killing the glusterfs process, can you do "echo 3 > /proc/sys/vm/drop_caches" and check whether memory usage comes down?

regards,

On Wed, Dec 23, 2009 at 9:01 PM, Larry Bates <larry.bates at vitalesafe.com> wrote:

I've successfully set up GlusterFS 3.0 with a single server. I brought on a 2nd server, set up AFR, and have been working through the mirroring process. I started an "ls -alR" to trigger a complete mirror of all files between the two servers. After running for about 16 hours I started getting kernel out-of-memory errors. Looking at top I see:

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2561 root  15   0 13.0g 8.9g  860 S    7 91.6 60:56.02  glusterfs

It seems that the client has used up all my memory (RAM and swap). Killing the process returned all the memory to the OS.

BTW - I'm working on a 2.3 TB store that contains about 2.5 million files in 65K folders.

Thoughts?

Larry Bates
vitalEsafe, Inc.

_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users

--
Raghavendra G
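A minimal sketch of the two checks discussed in this thread, assuming a stock Linux box with procps installed: the first command watches the resident (RSS) and virtual (VSZ) size of the glusterfs client process, and the second drops the kernel's page, dentry, and inode caches (run as root) to see whether reported memory usage comes down, as suggested above:

watch -n 60 'ps -C glusterfs -o pid,rss,vsz,args'   # poll client memory every 60 s (sizes in KB)

sync                                                # flush dirty pages first
echo 3 > /proc/sys/vm/drop_caches                   # drop pagecache, dentries, and inodes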