----- Original Message -----
> From: "Nithya Balachandran" <nbalacha@xxxxxxxxxx>
> To: "Ravishankar N" <ravishankar@xxxxxxxxxx>
> Cc: "Csaba Henk" <chenk@xxxxxxxxxx>, "gluster-users" <gluster-users@xxxxxxxxxxx>
> Sent: Monday, January 29, 2018 10:49:43 AM
> Subject: Re: Run away memory with gluster mount
>
> Csaba,
>
> Could this be the problem of the inodes not getting freed in the fuse
> process?

We can answer that question only after looking into statedumps. If we find
too many inodes in the fuse inode table's lru list (with refcount 0 and
lookup count > 0), it could be because of sub-optimal garbage collection
of inodes.

> Daniel,
> as Ravi requested, please provide access to the statedumps. You can strip
> out the filepath information.
> Does your data set include a lot of directories?
>
> Thanks,
> Nithya
>
> On 27 January 2018 at 10:23, Ravishankar N <ravishankar@xxxxxxxxxx> wrote:
> >
> > On 01/27/2018 02:29 AM, Dan Ragle wrote:
> > >
> > > On 1/25/2018 8:21 PM, Ravishankar N wrote:
> > > >
> > > > On 01/25/2018 11:04 PM, Dan Ragle wrote:
> > > > >
> > > > > *sigh* trying again to correct formatting ... apologies for the
> > > > > earlier mess.
> > > > >
> > > > > Having a memory issue with Gluster 3.12.4 and not sure how to
> > > > > troubleshoot. I don't *think* this is expected behavior.
> > > > >
> > > > > This is on an updated CentOS 7 box. The setup is a simple two-node
> > > > > replicated layout where the two nodes act as both server and client.
> > > > >
> > > > > The volume in question:
> > > > >
> > > > > Volume Name: GlusterWWW
> > > > > Type: Replicate
> > > > > Volume ID: 8e9b0e79-f309-4d9b-a5bb-45d065faaaa3
> > > > > Status: Started
> > > > > Snapshot Count: 0
> > > > > Number of Bricks: 1 x 2 = 2
> > > > > Transport-type: tcp
> > > > > Bricks:
> > > > > Brick1: vs1dlan.mydomain.com:/glusterfs_bricks/brick1/www
> > > > > Brick2: vs2dlan.mydomain.com:/glusterfs_bricks/brick1/www
> > > > > Options Reconfigured:
> > > > > nfs.disable: on
> > > > > cluster.favorite-child-policy: mtime
> > > > > transport.address-family: inet
> > > > >
> > > > > I had some other performance options in there (increased cache-size,
> > > > > md invalidation, etc.) but stripped them out in an attempt to isolate
> > > > > the issue. Still got the problem without them.
> > > > >
> > > > > The volume currently contains over 1M files.
> > > > >
> > > > > When mounting the volume, I get (among other things) a process as
> > > > > such:
> > > > >
> > > > > /usr/sbin/glusterfs --volfile-server=localhost
> > > > > --volfile-id=/GlusterWWW /var/www
> > > > >
> > > > > This process begins with little memory, but as files are accessed in
> > > > > the volume the memory increases. I set up a script that simply reads
> > > > > the files in the volume one at a time (no writes). It's been running
> > > > > on and off for about 12 hours now, and the resident memory of the
> > > > > above process is already at 7.5G and continues to grow slowly. If I
> > > > > stop the test script the memory stops growing, but it does not
> > > > > shrink. Restart the test script and the memory begins slowly growing
> > > > > again.
> > > > >
> > > > > This is obviously a contrived app environment. With my intended
> > > > > application load it takes about a week or so for the memory to get
> > > > > high enough to invoke the oom killer.
> > > >
> > > > Can you try debugging with a statedump
> > > > (https://gluster.readthedocs.io/en/latest/Troubleshooting/statedump/#read-a-statedump)
> > > > of the fuse mount process and see which member is leaking? Take the
> > > > statedumps in succession, maybe once initially during the I/O and once
> > > > when the memory gets high enough to hit the OOM mark.
> > > > Share the dumps here.
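> > > >
> > > > (A minimal sketch of how to trigger one, per the statedump doc linked
> > > > above; the pgrep pattern below is just an example, so match whatever
> > > > uniquely identifies your mount process. The dump is triggered by
> > > > sending SIGUSR1 to the fuse client.)
> > > >
> > > > # Find the glusterfs client process for the /var/www mount and ask it
> > > > # to write a statedump. The dump is written asynchronously, by default
> > > > # to /var/run/gluster/glusterdump.<pid>.dump.<timestamp>.
> > > > pid=$(pgrep -f 'volfile-id=/GlusterWWW')
> > > > kill -USR1 "$pid"
> > > > ls -lt /var/run/gluster/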
> > > >
> > > > Regards,
> > > > Ravi
> > >
> > > Thanks for the reply. I noticed yesterday that an update (3.12.5) had
> > > been posted, so I went ahead and updated and repeated the test
> > > overnight. The memory usage does not appear to be growing as quickly as
> > > it was with 3.12.4, but it does still appear to be growing.
> > >
> > > I should also mention that there is another process beyond my test app
> > > that is reading the files from the volume. Specifically, there is an
> > > rsync that runs from the second node 2-4 times an hour and reads from
> > > the GlusterWWW volume mounted on node 1. Since none of the files in
> > > that mount are changing it doesn't actually rsync anything, but it is
> > > nonetheless reading the files in addition to my test script. (It's part
> > > of my intended production setup that I forgot was still running.)
> > >
> > > The mount process appears to be gaining memory at a rate of about 1GB
> > > every 4 hours or so. At that rate it'll take several days before it
> > > runs the box out of memory. But I took your suggestion and made some
> > > statedumps today anyway, about 2 hours apart, 4 total so far. It looks
> > > like there may already be some actionable information. These are the
> > > only registers where the num_allocs have grown with each of the four
> > > samples:
> > >
> > > [mount/fuse.fuse - usage-type gf_fuse_mt_gids_t memusage]
> > > ---> num_allocs at Fri Jan 26 08:57:31 2018: 784
> > > ---> num_allocs at Fri Jan 26 10:55:50 2018: 831
> > > ---> num_allocs at Fri Jan 26 12:55:15 2018: 877
> > > ---> num_allocs at Fri Jan 26 14:58:27 2018: 908
> > >
> > > [mount/fuse.fuse - usage-type gf_common_mt_fd_lk_ctx_t memusage]
> > > ---> num_allocs at Fri Jan 26 08:57:31 2018: 5
> > > ---> num_allocs at Fri Jan 26 10:55:50 2018: 10
> > > ---> num_allocs at Fri Jan 26 12:55:15 2018: 15
> > > ---> num_allocs at Fri Jan 26 14:58:27 2018: 17
> > >
> > > [cluster/distribute.GlusterWWW-dht - usage-type gf_dht_mt_dht_layout_t memusage]
> > > ---> num_allocs at Fri Jan 26 08:57:31 2018: 24243596
> > > ---> num_allocs at Fri Jan 26 10:55:50 2018: 27902622
> > > ---> num_allocs at Fri Jan 26 12:55:15 2018: 30678066
> > > ---> num_allocs at Fri Jan 26 14:58:27 2018: 33801036
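> > >
> > > (For anyone wanting to run the same comparison, a rough sketch,
> > > assuming the stock statedump layout where a memusage section header
> > > like the ones above is followed a few lines later by a num_allocs=
> > > line:)
> > >
> > > # Print the num_allocs recorded for one usage-type in each statedump,
> > > # oldest dump first, so growth across the samples is easy to spot.
> > > for f in $(ls -tr glusterdump.*.dump.*); do
> > >     printf '%s: ' "$f"
> > >     grep -A5 'gf_dht_mt_dht_layout_t memusage' "$f" | grep -m1 '^num_allocs='
> > > done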
> > >
> > > Not sure the best way to get you the full dumps. They're pretty big,
> > > over 1G for all four. Also, I noticed some filepath information in
> > > there that I'd rather not share. What's the recommended next step?
> >
> > I've CC'd the fuse/dht devs to see if these data types have potential
> > leaks. Could you raise a bug with the volume info and a (dropbox?) link
> > from which we can download the dumps? You can remove/replace the
> > filepaths in them.
> >
> > Regards,
> > Ravi
> > >
> > > Cheers!
> > > Dan
> > > > >
> > > > > Is there potentially something misconfigured here?
> > > > >
> > > > > I did see a reference to a memory leak in another thread on this
> > > > > list, but that had to do with the setting of quotas; I don't have
> > > > > any quotas set on my system.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Dan Ragle
> > > > > daniel@xxxxxxxxxxxxxx

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users