On 2/2/2018 2:13 AM, Nithya Balachandran wrote:
Hi Dan,
It sounds like you might be running into [1]. The patch has been posted
upstream and the fix should be in the next release.
In the meantime, I'm afraid there is no way to get around this without
restarting the process.
Regards,
Nithya
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1541264
Much appreciated. Will watch for the next release and retest then.
Cheers!
Dan
On 2 February 2018 at 02:57, Dan Ragle <daniel@xxxxxxxxxxxxxx> wrote:
On 1/30/2018 6:31 AM, Raghavendra Gowdappa wrote:
----- Original Message -----
From: "Dan Ragle" <daniel@xxxxxxxxxxxxxx>
To: "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>, "Ravishankar N" <ravishankar@xxxxxxxxxx>
Cc: gluster-users@xxxxxxxxxxx, "Csaba Henk" <chenk@xxxxxxxxxx>, "Niels de Vos" <ndevos@xxxxxxxxxx>, "Nithya Balachandran" <nbalacha@xxxxxxxxxx>
Sent: Monday, January 29, 2018 9:02:21 PM
Subject: Re: Run away memory with gluster mount
On 1/29/2018 2:36 AM, Raghavendra Gowdappa wrote:
----- Original Message -----
From: "Ravishankar N" <ravishankar@xxxxxxxxxx>
To: "Dan Ragle" <daniel@xxxxxxxxxxxxxx>, gluster-users@xxxxxxxxxxx
Cc: "Csaba Henk" <chenk@xxxxxxxxxx>, "Niels de Vos" <ndevos@xxxxxxxxxx>, "Nithya Balachandran" <nbalacha@xxxxxxxxxx>, "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>
Sent: Saturday, January 27, 2018 10:23:38 AM
Subject: Re: Run away memory with gluster mount
On 01/27/2018 02:29 AM, Dan Ragle wrote:
On 1/25/2018 8:21 PM, Ravishankar N wrote:
On 01/25/2018 11:04 PM, Dan Ragle wrote:
*sigh* trying again to correct
formatting ... apologize for the
earlier mess.
Having a memory issue with Gluster
3.12.4 and not sure how to
troubleshoot. I don't *think* this is
expected behavior.
This is on an updated CentOS 7 box. The
setup is a simple two node
replicated layout where the two nodes
act as both server and
client.
The volume in question:
Volume Name: GlusterWWW
Type: Replicate
Volume ID: 8e9b0e79-f309-4d9b-a5bb-45d065faaaa3
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: vs1dlan.mydomain.com:/glusterfs_bricks/brick1/www
Brick2: vs2dlan.mydomain.com:/glusterfs_bricks/brick1/www
Options Reconfigured:
nfs.disable: on
cluster.favorite-child-policy: mtime
transport.address-family: inet
I had some other performance options in
there, (increased
cache-size, md invalidation, etc) but
stripped them out in an
attempt to
isolate the issue. Still got the problem
without them.
The volume currently contains over 1M files.
When mounting the volume, I get (among
other things) a process as such:
/usr/sbin/glusterfs --volfile-server=localhost --volfile-id=/GlusterWWW /var/www
This process begins with little memory, but then as files are
accessed in the volume the memory increases. I set up a script that
simply reads the files in the volume one at a time (no writes). It's
been running on and off for about 12 hours now and the resident
memory of the above process is already at 7.5G and continues to grow
slowly. If I stop the test script the memory stops growing,
but does not reduce. Restart the test script and the memory begins
slowly growing again.
This is obviously a contrived app
environment. With my intended
application load it takes about a week
or so for the memory to get
high enough to invoke the oom killer.
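For anyone reproducing this, the resident memory of the mount process can be sampled from /proc instead of being read off top by hand. The following is only a minimal Python sketch, assuming a Linux host and the usual "VmRSS: <n> kB" line in /proc/<pid>/status; the function names are hypothetical and not part of Gluster.

```python
import re
from typing import Optional


def parse_vm_rss_kib(status_text: str) -> Optional[int]:
    """Return the VmRSS value (in kB as reported by the kernel) or None.

    `status_text` is the full text of /proc/<pid>/status.
    """
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB$", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kib(pid: int) -> Optional[int]:
    """Read /proc/<pid>/status (Linux only) and return the resident set size."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_vm_rss_kib(handle.read())
```

Calling read_rss_kib() with the PID of the glusterfs mount process at a fixed interval, and logging the result with a timestamp, would give a growth curve to set alongside the statedumps.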
Can you try debugging with the statedump
(https://gluster.readthedocs.io/en/latest/Troubleshooting/statedump/#read-a-statedump)
of the fuse mount process and see what member is leaking? Take the
statedumps in succession, maybe once initially during the I/O and
once the memory gets high enough to hit the OOM mark.
Share the dumps here.
Regards,
Ravi
Thanks for the reply. I noticed yesterday that
an update (3.12.5) had
been posted so I went ahead and updated and
repeated the test
overnight. The memory usage does not appear to be growing as quickly
as it was with 3.12.4, but does still appear to be growing.
I should also mention that there is another
process beyond my test app
that is reading the files from the volume.
Specifically, there is an
rsync that runs from the second node 2-4 times
an hour that reads from
the GlusterWWW volume mounted on node 1. Since
none of the files in
that mount are changing it doesn't actually
rsync anything, but
nonetheless it is running and reading the files
in addition to my test
script. (It's a part of my intended production
setup that I forgot was
still running.)
The mount process appears to be gaining memory
at a rate of about 1GB
every 4 hours or so. At that rate it'll take
several days before it
runs the box out of memory. But I took your
suggestion and made some
statedumps today anyway, about 2 hours apart, 4
total so far. It looks
like there may already be some actionable
information. These are the
only registers where the num_allocs have grown
with each of the four
samples:
[mount/fuse.fuse - usage-type gf_fuse_mt_gids_t memusage]
---> num_allocs at Fri Jan 26 08:57:31 2018: 784
---> num_allocs at Fri Jan 26 10:55:50 2018: 831
---> num_allocs at Fri Jan 26 12:55:15 2018: 877
---> num_allocs at Fri Jan 26 14:58:27 2018: 908
[mount/fuse.fuse - usage-type gf_common_mt_fd_lk_ctx_t memusage]
---> num_allocs at Fri Jan 26 08:57:31 2018: 5
---> num_allocs at Fri Jan 26 10:55:50 2018: 10
---> num_allocs at Fri Jan 26 12:55:15 2018: 15
---> num_allocs at Fri Jan 26 14:58:27 2018: 17
[cluster/distribute.GlusterWWW-dht - usage-type gf_dht_mt_dht_layout_t memusage]
---> num_allocs at Fri Jan 26 08:57:31 2018: 24243596
---> num_allocs at Fri Jan 26 10:55:50 2018: 27902622
---> num_allocs at Fri Jan 26 12:55:15 2018: 30678066
---> num_allocs at Fri Jan 26 14:58:27 2018: 33801036
Not sure the best way to get you the full dumps.
They're pretty big,
over 1G for all four. Also, I noticed some
filepath information in
there that I'd rather not share. What's the
recommended next step?
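The comparison described above (finding the usage types whose num_allocs rises in every sample) can be automated so that only the summary needs to be shared, not the full dumps. This is a rough Python sketch, not a Gluster tool: it assumes each "[... memusage]" section header is followed by a "num_allocs=<n>" line, and the function names are made up for illustration.

```python
import re
from typing import Dict, List

SECTION_RE = re.compile(r"^\[(.+ memusage)\]$")
NUM_ALLOCS_RE = re.compile(r"^num_allocs=(\d+)$")


def parse_num_allocs(dump_text: str) -> Dict[str, int]:
    """Map each '... memusage' section name to its num_allocs value."""
    counts: Dict[str, int] = {}
    section = None
    for raw in dump_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new section starts; remember it only if it is a memusage one.
            found = SECTION_RE.match(line)
            section = found.group(1) if found else None
        elif section is not None:
            found = NUM_ALLOCS_RE.match(line)
            if found:
                counts[section] = int(found.group(1))
    return counts


def steadily_growing(samples: List[Dict[str, int]]) -> Dict[str, List[int]]:
    """Return sections whose num_allocs strictly increases across all samples.

    `samples` must be ordered oldest first.
    """
    if len(samples) < 2:
        return {}
    growing: Dict[str, List[int]] = {}
    for section in samples[0]:
        if all(section in sample for sample in samples):
            series = [sample[section] for sample in samples]
            if all(a < b for a, b in zip(series, series[1:])):
                growing[section] = series
    return growing
```

Feeding it the text of each dump in chronological order, for example steadily_growing([parse_num_allocs(open(p).read()) for p in paths]), yields only the counters that never stop climbing, which avoids posting any file path information from the dumps.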
Please run the following queries on the statedump files and
report the results to us:
# grep itable <client-statedump> | grep active | wc -l
# grep itable <client-statedump> | grep active_size
# grep itable <client-statedump> | grep lru | wc -l
# grep itable <client-statedump> | grep lru_size
# grep itable <client-statedump> | grep purge | wc -l
# grep itable <client-statedump> | grep purge_size
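When there are many dumps to check, the six grep queries above can be collapsed into one pass per file. The sketch below is a hypothetical Python helper (not part of Gluster) that mirrors the substring matching the greps do, and it assumes the size lines have the form "xlator.mount.fuse.itable.<name>_size=<n>".

```python
from typing import Dict, Optional


def summarize_itable(dump_text: str) -> Dict[str, Dict[str, Optional[int]]]:
    """Summarize the inode table lists found in one statedump.

    For each of active/lru/purge this returns:
      "lines": the number of lines containing both "itable" and the list
               name (what `grep itable | grep <name> | wc -l` would count)
      "size":  the value of the `<name>_size=` line, or None if absent
    """
    itable_lines = [line for line in dump_text.splitlines() if "itable" in line]
    summary: Dict[str, Dict[str, Optional[int]]] = {}
    for name in ("active", "lru", "purge"):
        matching = [line for line in itable_lines if name in line]
        size: Optional[int] = None
        for line in matching:
            key, sep, value = line.partition("=")
            if sep and key.strip().endswith(f"itable.{name}_size"):
                size = int(value)
        summary[name] = {"lines": len(matching), "size": size}
    return summary
```

Note that the line count also includes the `_size` line itself, so it will normally be slightly higher than the reported size.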
Had to restart the test and have been running for 36 hours now. RSS is
currently up to 23g.
Working on getting a bug report with a link to the dumps. In the
meantime, I'm including the results of your above queries for the first
dump, the 18 hour dump, and the 36 hour dump:
# grep itable glusterdump.153904.dump.1517104561 | grep active | wc -l
53865
# grep itable glusterdump.153904.dump.1517169361 | grep active | wc -l
53864
# grep itable glusterdump.153904.dump.1517234161 | grep active | wc -l
53864
# grep itable glusterdump.153904.dump.1517104561 | grep active_size
xlator.mount.fuse.itable.active_size=53864
# grep itable glusterdump.153904.dump.1517169361 | grep active_size
xlator.mount.fuse.itable.active_size=53863
# grep itable glusterdump.153904.dump.1517234161 | grep active_size
xlator.mount.fuse.itable.active_size=53863
# grep itable glusterdump.153904.dump.1517104561 | grep lru | wc -l
998510
# grep itable glusterdump.153904.dump.1517169361 | grep lru | wc -l
998510
# grep itable glusterdump.153904.dump.1517234161 | grep lru | wc -l
995992
# grep itable glusterdump.153904.dump.1517104561 | grep lru_size
xlator.mount.fuse.itable.lru_size=998508
# grep itable glusterdump.153904.dump.1517169361 | grep lru_size
xlator.mount.fuse.itable.lru_size=998508
# grep itable glusterdump.153904.dump.1517234161 | grep lru_size
xlator.mount.fuse.itable.lru_size=995990
Around 1 million inodes in the lru table!! These are inodes the
kernel has merely cached; no operation is currently in progress on
them. This could be the reason for the high memory usage.
We have a patch being worked on (currently merged on the experimental
branch) [1] that will help in these scenarios. In the
meantime, can you remount glusterfs with the options
--entry-timeout=0 and --attribute-timeout=0? This will make sure
that the kernel won't cache inodes/attributes of the files and should
bring down the memory usage.
I am curious to know what your data set is like. Is it the case
of too many directories and files present in deep directories? I
am wondering whether a significant number of the inodes cached by
the kernel are there to hold the dentry structure in the kernel.
[1] https://review.gluster.org/#/c/18665/
OK, remounted with your recommended attributes and repeated the
test. Now the mount process looks like this:
/usr/sbin/glusterfs --attribute-timeout=0 --entry-timeout=0
--volfile-server=localhost --volfile-id=/GlusterWWW /var/www
However after running for 36 hours it's again at about 23g (about
the same place it was on the first test).
A few metrics from the 36 hour mark:
num_allocs for [cluster/distribute.GlusterWWW-dht - usage-type
gf_dht_mt_dht_layout_t memusage] is 109140094. Seems at least
somewhat similar to the original test, which had 117901593 at the 36
hour mark.
The dump file at the 36 hour mark had nothing for lru or lru_size.
However, at the dump two hours prior it had:
# grep itable glusterdump.67299.dump.1517493361 | grep lru | wc -l
998510
# grep itable glusterdump.67299.dump.1517493361 | grep lru_size
xlator.mount.fuse.itable.lru_size=998508
and the same thing for the dump four hours later. Are these values
only relevant when the ls -R is actually running? I'm thinking the
36 hour dump may have caught the ls -R between runs there (?)
The data set is multiple Web sites. I know there's some litter there
we can clean up, but I'd guess not more than 200-300k files or so.
The biggest culprit is a single directory that we use as a
multi-purpose file store, with filenames stored as GUIDs and linked
to a DB. That directory currently has 500k+ files. Another directory
serves a similar purpose and has about 66k files in it. The rest is
generally distributed more "normally", i.e., a mixed nesting of
directories and files.
Cheers!
Dan
# grep itable glusterdump.153904.dump.1517104561 | grep purge | wc -l
1
# grep itable glusterdump.153904.dump.1517169361 | grep purge | wc -l
1
# grep itable glusterdump.153904.dump.1517234161 | grep purge | wc -l
1
# grep itable glusterdump.153904.dump.1517104561 | grep purge_size
xlator.mount.fuse.itable.purge_size=0
# grep itable glusterdump.153904.dump.1517169361 | grep purge_size
xlator.mount.fuse.itable.purge_size=0
# grep itable glusterdump.153904.dump.1517234161 | grep purge_size
xlator.mount.fuse.itable.purge_size=0
Cheers,
Dan
I've CC'd the fuse/dht devs to see if these data types have potential
leaks. Could you raise a bug with the volume info and a (dropbox?) link
from which we can download the dumps? You can remove/replace the
filepaths from them.
Regards.
Ravi
Cheers!
Dan
Is there potentially something
misconfigured here?
I did see a reference to a memory leak in another thread on this
list, but that had to do with the setting of quotas; I don't have
any quotas set on my system.
Thanks,
Dan Ragle
daniel@xxxxxxxxxxxxxx
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users