Re: Run away memory with gluster mount

On 21 February 2018 at 21:11, Dan Ragle <daniel@xxxxxxxxxxxxxx> wrote:


On 2/3/2018 8:58 AM, Dan Ragle wrote:


On 2/2/2018 2:13 AM, Nithya Balachandran wrote:
Hi Dan,

It sounds like you might be running into [1]. The patch has been posted upstream and the fix should be in the next release.
In the meantime, I'm afraid there is no way to get around this without restarting the process.

Regards,
Nithya

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1541264


Much appreciated. Will watch for the next release and retest then.

Cheers!

Dan


FYI, this looks like it's fixed in 3.12.6. Ran the test setup with repeated ls listings for just shy of 48 hours with no increase in RAM usage. Next I'll try my production application load for a while to see if it holds steady.

The gf_dht_mt_dht_layout_t memusage num_allocs went quickly up to 105415 and then stayed there for the entire 48 hours.


Excellent. Thanks for letting us know.

Nithya
 
Thanks for the quick response,

Dan


On 2 February 2018 at 02:57, Dan Ragle <daniel@xxxxxxxxxxxxxx> wrote:



    On 1/30/2018 6:31 AM, Raghavendra Gowdappa wrote:



        ----- Original Message -----

            From: "Dan Ragle" <daniel@xxxxxxxxxxxxxx>
            To: "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx
            <mailto:rgowdapp@xxxxxxxxxx>>, "Ravishankar N"
            <ravishankar@xxxxxxxxxx <mailto:ravishankar@xxxxxxxxxx>>
            Cc: gluster-users@xxxxxxxxxxx
            <mailto:gluster-users@gluster.org>, "Csaba Henk"
            <chenk@xxxxxxxxxx <mailto:chenk@xxxxxxxxxx>>, "Niels de Vos"
            <ndevos@xxxxxxxxxx <mailto:ndevos@xxxxxxxxxx>>, "Nithya
            Balachandran" <nbalacha@xxxxxxxxxx <mailto:nbalacha@xxxxxxxxxx>>
            Sent: Monday, January 29, 2018 9:02:21 PM
            Subject: Re: [Gluster-users] Run away memory with gluster mount



            On 1/29/2018 2:36 AM, Raghavendra Gowdappa wrote:



                ----- Original Message -----

                    From: "Ravishankar N" <ravishankar@xxxxxxxxxx
                    <mailto:ravishankar@xxxxxxxxxx>>
                    To: "Dan Ragle" <daniel@xxxxxxxxxxxxxx>,
                    gluster-users@xxxxxxxxxxx
                    <mailto:gluster-users@gluster.org>
                    Cc: "Csaba Henk" <chenk@xxxxxxxxxx
                    <mailto:chenk@xxxxxxxxxx>>, "Niels de Vos"
                    <ndevos@xxxxxxxxxx <mailto:ndevos@xxxxxxxxxx>>,
                    "Nithya Balachandran" <nbalacha@xxxxxxxxxx
                    <mailto:nbalacha@xxxxxxxxxx>>,
                    "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx
                    <mailto:rgowdapp@xxxxxxxxxx>>
                    Sent: Saturday, January 27, 2018 10:23:38 AM
                    Subject: Re: [Gluster-users] Run away memory with
                    gluster mount



                    On 01/27/2018 02:29 AM, Dan Ragle wrote:


                        On 1/25/2018 8:21 PM, Ravishankar N wrote:



                            On 01/25/2018 11:04 PM, Dan Ragle wrote:

                                *sigh* trying again to correct
                                formatting ... apologize for the
                                earlier mess.

                                Having a memory issue with Gluster
                                3.12.4 and not sure how to
                                troubleshoot. I don't *think* this is
                                expected behavior.

                                This is on an updated CentOS 7 box. The
                                setup is a simple two node
                                replicated layout where the two nodes
                                act as both server and
                                client.

                                The volume in question:

                                Volume Name: GlusterWWW
                                Type: Replicate
                                Volume ID:
                                8e9b0e79-f309-4d9b-a5bb-45d065faaaa3
                                Status: Started
                                Snapshot Count: 0
                                Number of Bricks: 1 x 2 = 2
                                Transport-type: tcp
                                Bricks:
                                Brick1:
                                vs1dlan.mydomain.com:/glusterfs_bricks/brick1/www
                                Brick2:
                                vs2dlan.mydomain.com:/glusterfs_bricks/brick1/www
                                Options Reconfigured:
                                nfs.disable: on
                                cluster.favorite-child-policy: mtime
                                transport.address-family: inet

                                I had some other performance options in
                                there, (increased
                                cache-size, md invalidation, etc) but
                                stripped them out in an
                                attempt to
                                isolate the issue. Still got the problem
                                without them.

                                The volume currently contains over 1M files.

                                When mounting the volume, I get (among
                                other things) a process as such:

                                /usr/sbin/glusterfs
                                --volfile-server=localhost
                                --volfile-id=/GlusterWWW /var/www

                                This process begins with little memory,
                                but then as files are
                                accessed in the volume the memory
                                increases. I set up a script that
                                simply reads the files in the volume one
                                at a time (no writes). It's
                                been running on and off for about 12 hours
                                now and the resident
                                memory of the above process is already
                                at 7.5G and continues to grow
                                slowly. If I stop the test script the
                                memory stops growing,
                                but does not reduce. Restart the test
                                script and the memory begins
                                slowly growing again.

                                This is obviously a contrived app
                                environment. With my intended
                                application load it takes about a week
                                or so for the memory to get
                                high enough to invoke the oom killer.


                            Can you try debugging with the statedump
                            (https://gluster.readthedocs.io/en/latest/Troubleshooting/statedump/#read-a-statedump)
                            of the fuse mount process and see what member
                            is leaking? Take the statedumps in succession,
                            maybe once initially during the I/O and once
                            the memory gets high enough to hit the OOM
                            mark. Share the dumps here.

                            Regards,
                            Ravi
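
                            (For reference, a minimal sketch of taking such statedumps of the
                            fuse mount process, assuming the default dump directory
                            /var/run/gluster; the pgrep pattern is just illustrative for this
                            particular mount:)

                            # find the PID of the fuse mount process for /var/www
                            MOUNT_PID=$(pgrep -f 'glusterfs.*--volfile-id=/GlusterWWW')

                            # SIGUSR1 asks a gluster process to write a statedump, by default
                            # to /var/run/gluster/glusterdump.<pid>.dump.<timestamp>
                            kill -SIGUSR1 "$MOUNT_PID"

                            # repeat later (e.g. two hours apart) to get dumps to compare
                            sleep 7200
                            kill -SIGUSR1 "$MOUNT_PID"

                            ls -l /var/run/gluster/glusterdump."$MOUNT_PID".dump.*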


                        Thanks for the reply. I noticed yesterday that
                        an update (3.12.5) had
                        been posted so I went ahead and updated and
                        repeated the test
                        overnight. The memory usage does not appear to
                        be growing as quickly
                        as it was with 3.12.4, but does still appear to
                        be growing.

                        I should also mention that there is another
                        process beyond my test app
                        that is reading the files from the volume.
                        Specifically, there is an
                        rsync that runs from the second node 2-4 times
                        an hour that reads from
                        the GlusterWWW volume mounted on node 1. Since
                        none of the files in
                        that mount are changing it doesn't actually
                        rsync anything, but
                        nonetheless it is running and reading the files
                        in addition to my test
                        script. (It's a part of my intended production
                        setup that I forgot was
                        still running.)

                        The mount process appears to be gaining memory
                        at a rate of about 1GB
                        every 4 hours or so. At that rate it'll take
                        several days before it
                        runs the box out of memory. But I took your
                        suggestion and made some
                        statedumps today anyway, about 2 hours apart, 4
                        total so far. It looks
                        like there may already be some actionable
                        information. These are the
                        only registers where the num_allocs have grown
                        with each of the four
                        samples:

                        [mount/fuse.fuse - usage-type gf_fuse_mt_gids_t
                        memusage]
                            ---> num_allocs at Fri Jan 26 08:57:31 2018: 784
                            ---> num_allocs at Fri Jan 26 10:55:50 2018: 831
                            ---> num_allocs at Fri Jan 26 12:55:15 2018: 877
                            ---> num_allocs at Fri Jan 26 14:58:27 2018: 908

                        [mount/fuse.fuse - usage-type
                        gf_common_mt_fd_lk_ctx_t memusage]
                            ---> num_allocs at Fri Jan 26 08:57:31 2018: 5
                            ---> num_allocs at Fri Jan 26 10:55:50 2018: 10
                            ---> num_allocs at Fri Jan 26 12:55:15 2018: 15
                            ---> num_allocs at Fri Jan 26 14:58:27 2018: 17

                        [cluster/distribute.GlusterWWW-dht - usage-type
                        gf_dht_mt_dht_layout_t memusage]
                            ---> num_allocs at Fri Jan 26 08:57:31 2018: 24243596
                            ---> num_allocs at Fri Jan 26 10:55:50 2018: 27902622
                            ---> num_allocs at Fri Jan 26 12:55:15 2018: 30678066
                            ---> num_allocs at Fri Jan 26 14:58:27 2018: 33801036

                        Not sure the best way to get you the full dumps.
                        They're pretty big,
                        over 1G for all four. Also, I noticed some
                        filepath information in
                        there that I'd rather not share. What's the
                        recommended next step?
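
                        (As a sketch of how one of these counters can be pulled out of
                        successive dumps for comparison; the dump filename pattern is the
                        one from this thread and the num_allocs= field layout is assumed
                        from the statedump docs:)

                        for dump in glusterdump.*.dump.*; do
                            printf '%s: ' "$dump"
                            # num_allocs appears a few lines below each memusage header
                            grep -A5 'gf_dht_mt_dht_layout_t memusage' "$dump" | grep '^num_allocs='
                        done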


                Please run the following queries on the statedump files
                and report the results to us:
                # grep itable <client-statedump> | grep active | wc -l
                # grep itable <client-statedump> | grep active_size
                # grep itable <client-statedump> | grep lru | wc -l
                # grep itable <client-statedump> | grep lru_size
                # grep itable <client-statedump> | grep purge | wc -l
                # grep itable <client-statedump> | grep purge_size
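
                (A small loop can run all of these over every dump at once; a sketch,
                assuming the dumps are in the default /var/run/gluster directory:)

                for dump in /var/run/gluster/glusterdump.*.dump.*; do
                    echo "== $dump =="
                    printf 'active entries: '; grep itable "$dump" | grep active | wc -l
                    grep itable "$dump" | grep active_size
                    printf 'lru entries:    '; grep itable "$dump" | grep lru | wc -l
                    grep itable "$dump" | grep lru_size
                    printf 'purge entries:  '; grep itable "$dump" | grep purge | wc -l
                    grep itable "$dump" | grep purge_size
                done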


            Had to restart the test and have been running for 36 hours
            now. RSS is
            currently up to 23g.

            Working on getting a bug report with a link to the dumps. In
            the meantime, I'm including the results of your above queries for
            the first
            dump, the 18 hour dump, and the 36 hour dump:

            # grep itable glusterdump.153904.dump.1517104561 | grep active | wc -l
            53865
            # grep itable glusterdump.153904.dump.1517169361 | grep active | wc -l
            53864
            # grep itable glusterdump.153904.dump.1517234161 | grep active | wc -l
            53864

            # grep itable glusterdump.153904.dump.1517104561 | grep active_size
            xlator.mount.fuse.itable.active_size=53864
            # grep itable glusterdump.153904.dump.1517169361 | grep active_size
            xlator.mount.fuse.itable.active_size=53863
            # grep itable glusterdump.153904.dump.1517234161 | grep active_size
            xlator.mount.fuse.itable.active_size=53863

            # grep itable glusterdump.153904.dump.1517104561 | grep lru | wc -l
            998510
            # grep itable glusterdump.153904.dump.1517169361 | grep lru | wc -l
            998510
            # grep itable glusterdump.153904.dump.1517234161 | grep lru | wc -l
            995992

            # grep itable glusterdump.153904.dump.1517104561 | grep lru_size
            xlator.mount.fuse.itable.lru_size=998508
            # grep itable glusterdump.153904.dump.1517169361 | grep lru_size
            xlator.mount.fuse.itable.lru_size=998508
            # grep itable glusterdump.153904.dump.1517234161 | grep lru_size
            xlator.mount.fuse.itable.lru_size=995990


        Around 1 million inodes in the lru table!! These are inodes the
        kernel has cached and on which no operation is currently in
        progress. This could be the reason for the high memory usage.
        We have a patch being worked on (currently merged on the
        experimental branch) [1] that will help in these scenarios. In
        the meantime, can you remount glusterfs with the options
        --entry-timeout=0 and --attribute-timeout=0? This will make sure
        that the kernel won't cache inodes/attributes of the files and
        should bring down the memory usage.

        I am curious to know what your data set is like. Is it a case of
        too many directories and files present in deep directories? I am
        wondering whether a significant number of the inodes cached by
        the kernel are there to hold dentry structures in the kernel.

        [1] https://review.gluster.org/#/c/18665/
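
        (A minimal sketch of such a remount; the mount point and volume name
        are the ones from this setup:)

        # unmount the existing fuse mount, then remount with kernel caching
        # of entries and attributes disabled
        umount /var/www

        /usr/sbin/glusterfs --attribute-timeout=0 --entry-timeout=0 \
            --volfile-server=localhost --volfile-id=/GlusterWWW /var/www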


    OK, remounted with your recommended options and repeated the
    test. Now the mount process looks like this:

    /usr/sbin/glusterfs --attribute-timeout=0 --entry-timeout=0
    --volfile-server=localhost --volfile-id=/GlusterWWW /var/www

    However, after running for 36 hours it's again at about 23g (about
    the same place it was on the first test).

    A few metrics from the 36 hour mark:

    num_allocs for [cluster/distribute.GlusterWWW-dht - usage-type
    gf_dht_mt_dht_layout_t memusage] is 109140094. Seems at least
    somewhat similar to the original test, which had 117901593 at the 36
    hour mark.

    The dump file at the 36 hour mark had nothing for lru or lru_size.
    However, at the dump two hours prior it had:

    # grep itable glusterdump.67299.dump.1517493361 | grep lru | wc -l
    998510
    # grep itable glusterdump.67299.dump.1517493361 | grep lru_size
    xlator.mount.fuse.itable.lru_size=998508

    and the same thing for the dump four hours later. Are these values
    only relevant when the ls -R is actually running? I'm thinking the
    36 hour dump may have caught the ls -R between runs there (?)

    The data set is multiple Web sites. I know there's some litter there
    we can clean up, but I'd guess not more than 200-300k files or so.
    The biggest culprit is a single directory that we use as a
    multi-purpose file store, with filenames stored as GUIDs and linked
    to a DB. That directory currently has 500k+ files. Another directory
    serves a similar purpose and has about 66k files in it. The rest is
    generally distributed more "normally", i.e., a mixed nesting of
    directories and files.

    Cheers!

    Dan



            # grep itable glusterdump.153904.dump.1517104561 | grep purge | wc -l
            1
            # grep itable glusterdump.153904.dump.1517169361 | grep purge | wc -l
            1
            # grep itable glusterdump.153904.dump.1517234161 | grep purge | wc -l
            1

            # grep itable glusterdump.153904.dump.1517104561 | grep purge_size
            xlator.mount.fuse.itable.purge_size=0
            # grep itable glusterdump.153904.dump.1517169361 | grep purge_size
            xlator.mount.fuse.itable.purge_size=0
            # grep itable glusterdump.153904.dump.1517234161 | grep purge_size
            xlator.mount.fuse.itable.purge_size=0

            Cheers,

            Dan



                    I've CC'd the fuse/dht devs to see if these data
                    types have potential leaks. Could you raise a bug
                    with the volume info and a (dropbox?) link from
                    which we can download the dumps? You can
                    remove/replace the filepaths in them.

                    Regards,
                    Ravi


                        Cheers!

                        Dan


                                Is there potentially something
                                misconfigured here?

                                I did see a reference to a memory leak
                                in another thread in this
                                list, but that had to do with the
                                setting of quotas; I don't have
                                any quotas set on my system.

                                Thanks,

                                Dan Ragle
                                daniel@xxxxxxxxxxxxxx










_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users
