I missed your reply :). Sorry about that.

----- Original Message -----
> From: "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>
> To: "Dan Ragle" <daniel@xxxxxxxxxxxxxx>
> Cc: "Csaba Henk" <chenk@xxxxxxxxxx>, "gluster-users" <gluster-users@xxxxxxxxxxx>
> Sent: Tuesday, February 6, 2018 1:14:10 AM
> Subject: Re: Run away memory with gluster mount
>
> Hi Dan,
>
> I had a suggestion and a question in my previous response. Let us know
> whether the suggestion helps and please let us know about your data-set
> (like how many directories/files and how these directories/files are
> organised) to understand the problem better.
>
> <snip>
>
> > In the meantime can you remount glusterfs with options
> > --entry-timeout=0 and --attribute-timeout=0? This will make sure
> > that kernel won't cache inodes/attributes of the file and should
> > bring down the memory usage.
> >
> > I am curious to know what is your data-set like? Is it the case
> > of too many directories and files present in deep directories? I
> > am wondering whether a significant number of inodes cached by
> > kernel are there to hold dentry structure in kernel.
>
> </snip>
>
> regards,
> Raghavendra
>
> ----- Original Message -----
> > From: "Dan Ragle" <daniel@xxxxxxxxxxxxxx>
> > To: "Nithya Balachandran" <nbalacha@xxxxxxxxxx>
> > Cc: "gluster-users" <gluster-users@xxxxxxxxxxx>, "Csaba Henk" <chenk@xxxxxxxxxx>
> > Sent: Saturday, February 3, 2018 7:28:15 PM
> > Subject: Re: Run away memory with gluster mount
> >
> > On 2/2/2018 2:13 AM, Nithya Balachandran wrote:
> > > Hi Dan,
> > >
> > > It sounds like you might be running into [1]. The patch has been posted
> > > upstream and the fix should be in the next release.
> > > In the meantime, I'm afraid there is no way to get around this without
> > > restarting the process.
> > >
> > > Regards,
> > > Nithya
> > >
> > > [1] https://bugzilla.redhat.com/show_bug.cgi?id=1541264
> >
> > Much appreciated.
> > Will watch for the next release and retest then.
> >
> > Cheers!
> >
> > Dan
> >
> > > On 2 February 2018 at 02:57, Dan Ragle <daniel@xxxxxxxxxxxxxx> wrote:
> > >
> > > On 1/30/2018 6:31 AM, Raghavendra Gowdappa wrote:
> > >
> > > ----- Original Message -----
> > >
> > > From: "Dan Ragle" <daniel@xxxxxxxxxxxxxx>
> > > To: "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>, "Ravishankar N" <ravishankar@xxxxxxxxxx>
> > > Cc: gluster-users@xxxxxxxxxxx, "Csaba Henk" <chenk@xxxxxxxxxx>, "Niels de Vos" <ndevos@xxxxxxxxxx>, "Nithya Balachandran" <nbalacha@xxxxxxxxxx>
> > > Sent: Monday, January 29, 2018 9:02:21 PM
> > > Subject: Re: Run away memory with gluster mount
> > >
> > > On 1/29/2018 2:36 AM, Raghavendra Gowdappa wrote:
> > >
> > > ----- Original Message -----
> > >
> > > From: "Ravishankar N" <ravishankar@xxxxxxxxxx>
> > > To: "Dan Ragle" <daniel@xxxxxxxxxxxxxx>, gluster-users@xxxxxxxxxxx
> > > Cc: "Csaba Henk" <chenk@xxxxxxxxxx>, "Niels de Vos" <ndevos@xxxxxxxxxx>, "Nithya Balachandran" <nbalacha@xxxxxxxxxx>, "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>
> > > Sent: Saturday, January 27, 2018 10:23:38 AM
> > > Subject: Re: Run away memory with gluster mount
> > >
> > > On 01/27/2018 02:29 AM, Dan Ragle wrote:
> > >
> > > On 1/25/2018 8:21 PM, Ravishankar N wrote:
> > >
> > > On 01/25/2018 11:04 PM, Dan Ragle wrote:
> > >
> > > *sigh* trying again to correct formatting ...
> > > apologize for the earlier mess.
> > >
> > > Having a memory issue with Gluster 3.12.4 and not sure how to
> > > troubleshoot. I don't *think* this is expected behavior.
> > >
> > > This is on an updated CentOS 7 box. The setup is a simple two node
> > > replicated layout where the two nodes act as both server and client.
> > >
> > > The volume in question:
> > >
> > > Volume Name: GlusterWWW
> > > Type: Replicate
> > > Volume ID: 8e9b0e79-f309-4d9b-a5bb-45d065faaaa3
> > > Status: Started
> > > Snapshot Count: 0
> > > Number of Bricks: 1 x 2 = 2
> > > Transport-type: tcp
> > > Bricks:
> > > Brick1: vs1dlan.mydomain.com:/glusterfs_bricks/brick1/www
> > > Brick2: vs2dlan.mydomain.com:/glusterfs_bricks/brick1/www
> > > Options Reconfigured:
> > > nfs.disable: on
> > > cluster.favorite-child-policy: mtime
> > > transport.address-family: inet
> > >
> > > I had some other performance options in there, (increased
> > > cache-size, md invalidation, etc) but stripped them out in an
> > > attempt to isolate the issue. Still got the problem without them.
> > >
> > > The volume currently contains over 1M files.
> > >
> > > When mounting the volume, I get (among other things) a process as such:
> > >
> > > /usr/sbin/glusterfs --volfile-server=localhost --volfile-id=/GlusterWWW /var/www
> > >
> > > This process begins with little memory, but then as files are
> > > accessed in the volume the memory increases. I set up a script that
> > > simply reads the files in the volume one at a time (no writes). It's
> > > been running on and off about 12 hours now and the resident
> > > memory of the above process is already at 7.5G and continues to grow
> > > slowly. If I stop the test script the memory stops growing,
> > > but does not reduce. Restart the test script and the memory begins
> > > slowly growing again.
> > >
> > > This is obviously a contrived app environment. With my intended
> > > application load it takes about a week or so for the memory to get
> > > high enough to invoke the oom killer.
> > >
> > > Can you try debugging with the statedump
> > > (https://gluster.readthedocs.io/en/latest/Troubleshooting/statedump/#read-a-statedump)
> > > of the fuse mount process and see what member is leaking? Take the
> > > statedumps in succession, maybe once initially during the I/O and
> > > once the memory gets high enough to hit the OOM mark.
> > > Share the dumps here.
> > >
> > > Regards,
> > > Ravi
> > >
> > > Thanks for the reply. I noticed yesterday that an update (3.12.5) had
> > > been posted so I went ahead and updated and repeated the test
> > > overnight. The memory usage does not appear to be growing as quickly
> > > as it was with 3.12.4, but does still appear to be growing.
> > >
> > > I should also mention that there is another process beyond my test app
> > > that is reading the files from the volume. Specifically, there is an
> > > rsync that runs from the second node 2-4 times an hour that reads from
> > > the GlusterWWW volume mounted on node 1. Since none of the files in
> > > that mount are changing it doesn't actually rsync anything, but
> > > nonetheless it is running and reading the files in addition to my test
> > > script. (It's a part of my intended production setup that I forgot was
> > > still running.)
> > >
> > > The mount process appears to be gaining memory at a rate of about 1GB
> > > every 4 hours or so. At that rate it'll take several days before it
> > > runs the box out of memory. But I took your suggestion and made some
> > > statedumps today anyway, about 2 hours apart, 4 total so far.
> > > It looks like there may already be some actionable information.
> > > These are the only registers where the num_allocs have grown with
> > > each of the four samples:
> > >
> > > [mount/fuse.fuse - usage-type gf_fuse_mt_gids_t memusage]
> > > ---> num_allocs at Fri Jan 26 08:57:31 2018: 784
> > > ---> num_allocs at Fri Jan 26 10:55:50 2018: 831
> > > ---> num_allocs at Fri Jan 26 12:55:15 2018: 877
> > > ---> num_allocs at Fri Jan 26 14:58:27 2018: 908
> > >
> > > [mount/fuse.fuse - usage-type gf_common_mt_fd_lk_ctx_t memusage]
> > > ---> num_allocs at Fri Jan 26 08:57:31 2018: 5
> > > ---> num_allocs at Fri Jan 26 10:55:50 2018: 10
> > > ---> num_allocs at Fri Jan 26 12:55:15 2018: 15
> > > ---> num_allocs at Fri Jan 26 14:58:27 2018: 17
> > >
> > > [cluster/distribute.GlusterWWW-dht - usage-type gf_dht_mt_dht_layout_t memusage]
> > > ---> num_allocs at Fri Jan 26 08:57:31 2018: 24243596
> > > ---> num_allocs at Fri Jan 26 10:55:50 2018: 27902622
> > > ---> num_allocs at Fri Jan 26 12:55:15 2018: 30678066
> > > ---> num_allocs at Fri Jan 26 14:58:27 2018: 33801036
> > >
> > > Not sure the best way to get you the full dumps. They're pretty big,
> > > over 1G for all four. Also, I noticed some filepath information in
> > > there that I'd rather not share. What's the recommended next step?
> > >
> > > Please run the following query on statedump files and report us the
> > > results:
> > > # grep itable <client-statedump> | grep active | wc -l
> > > # grep itable <client-statedump> | grep active_size
> > > # grep itable <client-statedump> | grep lru | wc -l
> > > # grep itable <client-statedump> | grep lru_size
> > > # grep itable <client-statedump> | grep purge | wc -l
> > > # grep itable <client-statedump> | grep purge_size
> > >
> > > Had to restart the test and have been running for 36 hours now.
> > > RSS is currently up to 23g.
> > >
> > > Working on getting a bug report with link to the dumps. In the
> > > meantime, I'm including the results of your above queries for the
> > > first dump, the 18 hour dump, and the 36 hour dump:
> > >
> > > # grep itable glusterdump.153904.dump.1517104561 | grep active | wc -l
> > > 53865
> > > # grep itable glusterdump.153904.dump.1517169361 | grep active | wc -l
> > > 53864
> > > # grep itable glusterdump.153904.dump.1517234161 | grep active | wc -l
> > > 53864
> > >
> > > # grep itable glusterdump.153904.dump.1517104561 | grep active_size
> > > xlator.mount.fuse.itable.active_size=53864
> > > # grep itable glusterdump.153904.dump.1517169361 | grep active_size
> > > xlator.mount.fuse.itable.active_size=53863
> > > # grep itable glusterdump.153904.dump.1517234161 | grep active_size
> > > xlator.mount.fuse.itable.active_size=53863
> > >
> > > # grep itable glusterdump.153904.dump.1517104561 | grep lru | wc -l
> > > 998510
> > > # grep itable glusterdump.153904.dump.1517169361 | grep lru | wc -l
> > > 998510
> > > # grep itable glusterdump.153904.dump.1517234161 | grep lru | wc -l
> > > 995992
> > >
> > > # grep itable glusterdump.153904.dump.1517104561 | grep lru_size
> > > xlator.mount.fuse.itable.lru_size=998508
> > > # grep itable glusterdump.153904.dump.1517169361 | grep lru_size
> > > xlator.mount.fuse.itable.lru_size=998508
> > > # grep itable glusterdump.153904.dump.1517234161 | grep lru_size
> > > xlator.mount.fuse.itable.lru_size=995990
> > >
> > > Around 1 million inodes in the lru table!! These are the inodes the
> > > kernel has just cached and no operation is currently in progress on
> > > these inodes. This could be the reason for high memory usage.
> > > We've a patch being worked on (merged on experimental branch
> > > currently) [1], that will help in these scenarios.
> > > In the meantime can you remount glusterfs with options
> > > --entry-timeout=0 and --attribute-timeout=0? This will make sure
> > > that kernel won't cache inodes/attributes of the file and should
> > > bring down the memory usage.
> > >
> > > I am curious to know what is your data-set like? Is it the case
> > > of too many directories and files present in deep directories? I
> > > am wondering whether a significant number of inodes cached by
> > > kernel are there to hold dentry structure in kernel.
> > >
> > > [1] https://review.gluster.org/#/c/18665/
> > >
> > > OK, remounted with your recommended attributes and repeated the
> > > test. Now the mount process looks like this:
> > >
> > > /usr/sbin/glusterfs --attribute-timeout=0 --entry-timeout=0 --volfile-server=localhost --volfile-id=/GlusterWWW /var/www
> > >
> > > However after running for 36 hours it's again at about 23g (about
> > > the same place it was on the first test).
> > >
> > > A few metrics from the 36 hour mark:
> > >
> > > num_allocs for [cluster/distribute.GlusterWWW-dht - usage-type
> > > gf_dht_mt_dht_layout_t memusage] is 109140094. Seems at least
> > > somewhat similar to the original test, which had 117901593 at the 36
> > > hour mark.
> > >
> > > The dump file at the 36 hour mark had nothing for lru or lru_size.
> > > However, at the dump two hours prior it had:
> > >
> > > # grep itable glusterdump.67299.dump.1517493361 | grep lru | wc -l
> > > 998510
> > > # grep itable glusterdump.67299.dump.1517493361 | grep lru_size
> > > xlator.mount.fuse.itable.lru_size=998508
> > >
> > > and the same thing for the dump four hours later. Are these values
> > > only relevant when the ls -R is actually running? I'm thinking the
> > > 36 hour dump may have caught the ls -R between runs there (?)
> > >
> > > The data set is multiple Web sites.
> > > I know there's some litter there we can clean up, but I'd guess
> > > not more than 200-300k files or so. The biggest culprit is a single
> > > directory that we use as a multi-purpose file store, with filenames
> > > stored as GUIDs and linked to a DB. That directory currently has
> > > 500k+ files. Another directory serves a similar purpose and has
> > > about 66k files in it. The rest is generally distributed more
> > > "normally", i.e., a mixed nesting of directories and files.
> > >
> > > Cheers!
> > >
> > > Dan
> > >
> > > # grep itable glusterdump.153904.dump.1517104561 | grep purge | wc -l
> > > 1
> > > # grep itable glusterdump.153904.dump.1517169361 | grep purge | wc -l
> > > 1
> > > # grep itable glusterdump.153904.dump.1517234161 | grep purge | wc -l
> > > 1
> > >
> > > # grep itable glusterdump.153904.dump.1517104561 | grep purge_size
> > > xlator.mount.fuse.itable.purge_size=0
> > > # grep itable glusterdump.153904.dump.1517169361 | grep purge_size
> > > xlator.mount.fuse.itable.purge_size=0
> > > # grep itable glusterdump.153904.dump.1517234161 | grep purge_size
> > > xlator.mount.fuse.itable.purge_size=0
> > >
> > > Cheers,
> > >
> > > Dan
> > >
> > > I've CC'd the fuse/dht devs to see if these data types have
> > > potential leaks. Could you raise a bug with the volume info and a
> > > (dropbox?) link from which we can download the dumps? You can
> > > remove/replace the filepaths from them.
> > >
> > > Regards.
> > > Ravi
> > >
> > > Cheers!
> > >
> > > Dan
> > >
> > > Is there potentially something misconfigured here?
> > >
> > > I did see a reference to a memory leak in another thread in this
> > > list, but that had to do with the setting of quotas, I don't have
> > > any quotas set on my system.
> > >
> > > Thanks,
> > >
> > > Dan Ragle
> > > daniel@xxxxxxxxxxxxxx

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users
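
For anyone repeating the num_allocs comparison done earlier in this thread, here is a minimal Python sketch of one way to automate it. It assumes each memusage section in a statedump starts with a header such as [mount/fuse.fuse - usage-type gf_fuse_mt_gids_t memusage] and contains a line of the form num_allocs=<n>; that line format is an assumption, so check it against your own dump files. The function names parse_num_allocs and strictly_growing are made up for illustration and are not part of Gluster.

```python
import re
import sys
from collections import defaultdict

# Header of a memusage section, e.g.
# [mount/fuse.fuse - usage-type gf_fuse_mt_gids_t memusage]
SECTION_RE = re.compile(r"^\[(?P<name>.+ - usage-type \S+) memusage\]\s*$")
# Assumed format of the allocation counter inside a section.
ALLOCS_RE = re.compile(r"^num_allocs=(?P<n>\d+)\s*$")


def parse_num_allocs(text):
    """Return {section name: num_allocs} for the text of one statedump."""
    counts = {}
    current = None
    for line in text.splitlines():
        header = SECTION_RE.match(line)
        if header:
            current = header.group("name")
            continue
        if line.startswith("["):
            # Some other kind of section: stop attributing lines to the last one.
            current = None
            continue
        allocs = ALLOCS_RE.match(line)
        if allocs and current is not None:
            counts[current] = int(allocs.group("n"))
    return counts


def strictly_growing(dumps):
    """Given dump texts in chronological order, return the sections whose
    num_allocs increased in every successive dump, with their values."""
    series = defaultdict(list)
    for text in dumps:
        for name, n in parse_num_allocs(text).items():
            series[name].append(n)
    return {
        name: values
        for name, values in series.items()
        if len(values) == len(dumps) and len(values) > 1
        and all(a < b for a, b in zip(values, values[1:]))
    }


if __name__ == "__main__":
    # Usage: python growing_allocs.py dump1 dump2 ... (oldest first)
    texts = []
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            texts.append(handle.read())
    for name, values in sorted(strictly_growing(texts).items()):
        print(name, values)
```

Pass the dump files oldest first. A section that grows in every sample is only a candidate for a leak, not proof of one, since a cache that is still filling looks the same over a short window.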