My test servers have been running for about 3 hours now (with the while loop constantly writing and deleting files), and the memory usage of the arbiter brick process has not increased in the past hour. Before the patch it was increasing constantly, so it looks like adding the "GF_FREE (ctx->iattbuf);" line in arbiter.c fixed the issue. If anything changes overnight I will post an update, but I believe the fix worked!
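In case anyone wants to repeat the check, this is roughly how I watched the brick. It's a sketch that assumes the arbiter brick path contains "arbiter" (adjust the pgrep pattern to your own layout); a VmRSS value that keeps climbing while the write/delete loop runs means the leak is still there:

#!/bin/bash
# Find the arbiter brick process; glusterfsd's command line includes
# the brick path, so match on it (assumes the path contains "arbiter").
PID=$(pgrep -f 'glusterfsd.*arbiter' | head -n 1)

# Log the brick's resident memory (VmRSS, in kB) once a minute.
while true ; do
    echo "$(date '+%F %T') $(grep VmRSS /proc/$PID/status)"
    sleep 60
done >> /var/tmp/arbiter-rss.log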
Once this patch makes it into the master branch, how long does it usually take for it to be backported and released on 3.8?
Thanks!
Ben
On Tue, Aug 23, 2016 at 2:18 PM, Benjamin Edgar <benedgar8@xxxxxxxxx> wrote:
Hi Ravi,

I saw that you updated the patch today (at http://review.gluster.org/#/c/15289/ ). I built an RPM of the first iteration of the patch (just the one-line change in arbiter.c adding "GF_FREE (ctx->iattbuf);") and am running it on some test servers now to see whether the memory of the arbiter brick still gets out of control.

Ben

On Tue, Aug 23, 2016 at 3:38 AM, Ravishankar N <ravishankar@xxxxxxxxxx> wrote:

Hi Benjamin,
On 08/23/2016 06:41 AM, Benjamin Edgar wrote:
I've attached a statedump of the problem brick process. Let me know if there are any other logs you need.
Thanks for the report! I've sent a fix at http://review.gluster.org/#/c/15289/ . It would be nice if you could verify that the patch fixes the issue for you.
Thanks,
Ravi
Thanks a lot,
Ben
On Mon, Aug 22, 2016 at 5:03 PM, Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> wrote:
Thanks! Could you collect a statedump of the brick process by following https://gluster.readthedocs.io/en/latest/Troubleshooting/statedump ? That should help us identify which datatype is causing the leaks so we can fix it.
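The commands boil down to something like this (a sketch, assuming a volume named "testvol" and the default statedump directory /var/run/gluster):

# Trigger statedumps for all brick processes of the volume.
gluster volume statedump testvol

# Equivalently, signal one brick process directly:
# kill -USR1 $BRICK_PID

# The dump files list per-datatype allocation counts, which is what
# points at the leaking structure.
ls /var/run/gluster/*.dump.*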
On Tue, Aug 23, 2016 at 2:22 AM, Benjamin Edgar <benedgar8@xxxxxxxxx> wrote:
Hi,
I appear to have a memory leak with a replica 3 arbiter 1 configuration of Gluster. I have a data brick and an arbiter brick on one server, and the last data brick on another server. The more files I write to Gluster in this configuration, the more memory the arbiter brick process consumes.
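For context, the volume was created along these lines (a sketch with hypothetical hostnames and brick paths, not my exact commands):

# server1 carries a data brick plus the arbiter brick; server2 carries
# the other data brick. "force" may be needed because two bricks of the
# replica set share server1.
gluster volume create testvol replica 3 arbiter 1 \
    server1:/bricks/data server2:/bricks/data server1:/bricks/arbiter force
gluster volume start testvol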
I am able to reproduce this issue by first setting up a replica 3 arbiter 1 configuration and then running the following bash script, which creates 10,000 200kB files, deletes them, and repeats forever:
while true ; do
    for i in {1..10000} ; do
        dd if=/dev/urandom bs=200K count=1 of=$TEST_FILES_DIR/file$i
    done
    rm -rf $TEST_FILES_DIR/*
done
$TEST_FILES_DIR is a location on my gluster mount.
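The client-side setup is roughly the following (again with hypothetical names):

# Mount the volume and point the script at a directory on it.
mkdir -p /mnt/gluster
mount -t glusterfs server1:/testvol /mnt/gluster
export TEST_FILES_DIR=/mnt/gluster/leak-test
mkdir -p $TEST_FILES_DIR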
After about 3 days of this script running on one of my clusters, this is what the output of "top" looks like:

  PID USER  PR  NI    VIRT    RES   SHR S  %CPU %MEM     TIME+ COMMAND
16039 root  20   0 1397220  77720  3948 S  20.6  1.0 860:01.53 glusterfsd
13174 root  20   0 1395824 112728  3692 S  19.6  1.5 806:07.17 glusterfs
19961 root  20   0 2967204 2.145g  3896 S  17.3 29.0 752:10.70 glusterfsd
As you can see, one of the brick processes is using over 2 gigabytes of resident memory.
One work-around for this is to kill the arbiter brick process and restart the gluster daemon. This respawns the arbiter brick process, and its memory usage goes back down to a reasonable level. However, I would rather not have to kill the arbiter brick every week in production environments.
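Concretely, the work-around is something like this (a sketch; it assumes the arbiter brick path contains "arbiter" and a systemd-based distro):

# Kill only the arbiter brick process (glusterfsd's command line
# includes the brick path).
kill $(pgrep -f 'glusterfsd.*arbiter')

# Restarting glusterd respawns any bricks that are down.
systemctl restart glusterd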
Has anyone seen this issue before and is there a known work-around/fix?
Thanks,
Ben
--
Pranith
--
Benjamin Edgar
Computer Science
University of Virginia 2015
(571) 338-0878
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users