Hi,

I have a 3x replicated cluster running Gluster 4.1.7 on Ubuntu 16.04.5. All 3 replicas are also clients, hosting a Node.js/Nginx web server.

The current configuration is:

Volume Name: gvol1
Type: Replicate
Volume ID: XXXXXX
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: vm000000:/srv/brick1/gvol1
Brick2: vm000001:/srv/brick1/gvol1
Brick3: vm000002:/srv/brick1/gvol1
Options Reconfigured:
cluster.self-heal-readdir-size: 2KB
cluster.self-heal-window-size: 2
cluster.background-self-heal-count: 20
network.ping-timeout: 5
disperse.eager-lock: off
performance.parallel-readdir: on
performance.readdir-ahead: on
performance.rda-cache-limit: 128MB
performance.cache-refresh-timeout: 10
performance.nl-cache-timeout: 600
performance.nl-cache: on
cluster.nufa: on
performance.enable-least-priority: off
server.outstanding-rpc-limit: 128
performance.strict-o-direct: on
cluster.shd-max-threads: 12
client.event-threads: 4
cluster.lookup-optimize: on
network.inode-lru-limit: 90000
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.cache-samba-metadata: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: on
features.utime: on
storage.ctime: on
server.event-threads: 4
performance.cache-size: 256MB
performance.read-ahead: on
cluster.readdir-optimize: on
cluster.strict-readdir: on
performance.io-thread-count: 8
server.allow-insecure: on
cluster.read-hash-mode: 0
cluster.lookup-unhashed: auto
cluster.choose-local: on

I believe there is a memory leak somewhere: memory usage keeps climbing until one or more nodes hang, which sometimes takes the whole cluster down.

I have taken 2 statedumps on one of the nodes: one while memory usage was too high, and another just after a reboot, with the app running and the volume fully healed.

High memory: https://pmcdigital.sharepoint.com/:u:/g/EYDsNqTf1UdEuE6B0ZNVPfIBf_I-AbaqHotB1lJOnxLlTg?e=boYP09
After reboot: https://pmcdigital.sharepoint.com/:u:/g/EWZBsnET2xBHl6OxO52RCfIBvQ0uIDQ1GKJZ1GrnviyMhg?e=wI3yaY

Any help would be greatly appreciated.

Kindest Regards,
Pedro Maia Costa
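
P.S. For anyone who wants to compare the two dumps, here is a minimal sketch of how the memory sections could be diffed. It assumes the usual statedump layout, where each memory-accounting section has a header ending in `memusage]` followed by a `size=` line; the function names and the command-line usage are only illustrative, not part of any Gluster tooling.

```python
import re
import sys
from typing import Dict, List, Tuple

# Assumed header shape, e.g.:
# [mount/fuse.fuse - usage-type gf_common_mt_char memusage]
SECTION_RE = re.compile(r"^\[(?P<name>.+ memusage)\]\s*$")


def parse_memusage(text: str) -> Dict[str, int]:
    """Map each '... memusage' section header to its size= value (bytes)."""
    sizes: Dict[str, int] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = SECTION_RE.match(line)
        if match:
            current = match.group("name")
            continue
        if line.startswith("["):
            # Any other section header ends the current memusage section.
            current = None
            continue
        if current and line.startswith("size="):
            sizes[current] = int(line.split("=", 1)[1])
    return sizes


def growth(before: Dict[str, int], after: Dict[str, int],
           top: int = 20) -> List[Tuple[str, int]]:
    """Return the sections whose size grew the most, largest first."""
    deltas = [(name, after[name] - before.get(name, 0)) for name in after]
    deltas.sort(key=lambda item: item[1], reverse=True)
    return deltas[:top]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python diff_dumps.py after_reboot.dump high_memory.dump
    with open(sys.argv[1]) as f:
        baseline = parse_memusage(f.read())
    with open(sys.argv[2]) as f:
        high = parse_memusage(f.read())
    for name, delta in growth(baseline, high):
        print(f"{delta:>15d}  {name}")
```

The sections at the top of the output are the allocation types that grew the most between the two dumps, which is usually the first place to look for a leak.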