Hello,
We have a 3-way replicated Gluster setup where clients connect over NFS, and the clients are also the servers. The Gluster NFS server keeps increasing its RAM usage until the server eventually runs out of memory. We see this on all 3 servers. Each server has 96 GB of RAM in total, and we've seen the Gluster NFS server use up to 70 GB of RAM with swap 100% in use. If other processes didn't also need the RAM, I suspect Gluster would claim that as well.
We are running GlusterFS 3.12.9-1 on Debian 8.
The process causing the high memory usage is:
/usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/run/gluster/nfs/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/gluster/94e073c0dae2c47025351342ba0ddc44.socket
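In case it is useful, this is roughly how we have been watching the process grow. The pid file path is taken from the command line above; the one-minute interval is just what we happen to use:

  # check the resident set size of the Gluster NFS process once a minute
  watch -n 60 'ps -o pid,rss,vsz,cmd -p "$(cat /var/run/gluster/nfs/nfs.pid)"'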
Gluster volume info:
Volume Name: www
Type: Replicate
Volume ID: fbcc21ee-bd0b-40a5-8785-bd00e49e9b72
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.0.0.3:/storage/sdc1/www
Brick2: 10.0.0.2:/storage/sdc1/www
Brick3: 10.0.0.1:/storage/sdc1/www
Options Reconfigured:
diagnostics.client-log-level: ERROR
performance.stat-prefetch: on
performance.md-cache-timeout: 600
performance.cache-invalidation: on
features.cache-invalidation: on
network.ping-timeout: 3
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: off
performance.cache-size: 1GB
performance.write-behind-window-size: 4MB
performance.nfs.io-threads: on
performance.nfs.io-cache: off
performance.nfs.quick-read: off
performance.nfs.write-behind-window-size: 4MB
features.cache-invalidation-timeout: 600
performance.nfs.stat-prefetch: on
network.inode-lru-limit: 90000
performance.cache-priority: *.php:3,*.temp:3,*:1
cluster.readdir-optimize: on
performance.nfs.read-ahead: off
performance.flush-behind: on
performance.write-behind: on
performance.nfs.write-behind: on
performance.nfs.flush-behind: on
features.bitrot: on
features.scrub: Active
performance.quick-read: off
performance.io-thread-count: 64
nfs.enable-ino32: on
nfs.log-level: ERROR
storage.build-pgfid: off
diagnostics.brick-log-level: WARNING
cluster.self-heal-daemon: enable
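(In case anyone suggests tuning any of the options above: we would apply changes with plain "gluster volume set" / "gluster volume get" calls as sketched below. The inode-lru-limit value shown is only an illustration, not something we have tried yet.)

  # example only: change a single volume option and then verify it
  gluster volume set www network.inode-lru-limit 65536
  gluster volume get www network.inode-lru-limit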
We don't see anything in the logs that looks like it could explain the high memory usage. We did take a statedump, which I have attached to this message.
Taking the statedump is quite risky for us: the USR1 signal appears to make Gluster pull swapped memory back into RAM, and the server goes offline while that is in progress.
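For completeness, this is roughly how the dump was produced and how we have been skimming it. I assume the CLI route sends the same USR1 signal under the hood, so it is presumably just as risky for us; the directory is the default statedump path as far as I know, and the filename is the one attached:

  # trigger a statedump of the Gluster NFS server (files land in /var/run/gluster by default)
  gluster volume statedump www nfs
  # list the biggest allocation counters in the resulting dump
  grep -E 'num_allocs|hot-count' /var/run/gluster/glusterdump.2076.dump.1527500065 | sort -t= -k2 -n | tail -20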
FWIW, we do have vm.swappiness set to 1.
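To be explicit about how that is configured (standard sysctl mechanics, nothing Gluster-specific):

  # /etc/sysctl.conf
  vm.swappiness = 1
  # apply without a reboot and verify
  sysctl -w vm.swappiness=1
  sysctl vm.swappiness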
Does anyone have an idea of what could cause this and what we can do to stop such high memory usage?
Cheers,
Niels
Attachment:
glusterdump.2076.dump.1527500065
Description: Binary data