Re: Memory leak with a replica 3 arbiter 1 configuration

Hi Ravi,

I saw that you updated the patch today (http://review.gluster.org/#/c/15289/). I built an RPM of the first iteration of the patch (the one-line change in arbiter.c, "GF_FREE (ctx->iattbuf);") and am now running it on some test servers to see whether the arbiter brick's memory still grows out of control.
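
Something like the following loop should be enough to track the brick's resident memory over time; this is only a minimal sketch, and the pgrep pattern and log path below are placeholders rather than my actual setup:

# Minimal sketch: log the arbiter brick's resident memory every 5 minutes.
# The pgrep pattern and log path are placeholders, not the real configuration.
while true ; do
  pid=$(pgrep -f 'glusterfsd.*arbiter' | head -n 1)
  echo "$(date '+%F %T') rss_kb=$(ps -o rss= -p "$pid")" >> /var/tmp/arbiter-rss.log
  sleep 300
done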

Ben

On Tue, Aug 23, 2016 at 3:38 AM, Ravishankar N <ravishankar@xxxxxxxxxx> wrote:
Hi Benjamin,

On 08/23/2016 06:41 AM, Benjamin Edgar wrote:
I've attached a statedump of the problem brick process.  Let me know if there are any other logs you need.

Thanks for the report! I've sent a fix @ http://review.gluster.org/#/c/15289/ . It would be nice if you could verify that the patch fixes the issue for you.

Thanks,
Ravi


Thanks a lot,
Ben

On Mon, Aug 22, 2016 at 5:03 PM, Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> wrote:
Could you collect a statedump of the brick process by following https://gluster.readthedocs.io/en/latest/Troubleshooting/statedump ?

That should help us identify which data type is causing the leak so we can fix it.
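
Roughly, a statedump can be generated like this (the volume name below is just an example; the dump files land under /var/run/gluster by default, or wherever server.statedump-path points):

# Trigger a statedump for all bricks of the volume ("testvol" is an example name).
gluster volume statedump testvol

# Alternatively, send SIGUSR1 directly to the suspect brick process.
kill -USR1 <arbiter-brick-pid>

# The dump files are written under /var/run/gluster by default.
ls /var/run/gluster/*.dump.*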

Thanks!

On Tue, Aug 23, 2016 at 2:22 AM, Benjamin Edgar <benedgar8@xxxxxxxxx> wrote:
Hi,

I appear to have a memory leak with a replica 3 arbiter 1 configuration of gluster. One server has a data brick and the arbiter brick, and a second server has the other data brick. The more files I write to gluster in this configuration, the more memory the arbiter brick process takes up.
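
For reference, such a volume is typically created with the replica 3 arbiter 1 syntax, along these lines (hostnames and brick paths below are placeholders, not my actual ones):

# server1 carries a data brick and the arbiter brick, server2 the second data brick.
# Hostnames and brick paths are placeholders; gluster may warn and require "force"
# (or a confirmation) because two bricks of the replica set share a server.
gluster volume create testvol replica 3 arbiter 1 \
    server1:/bricks/data1 \
    server2:/bricks/data2 \
    server1:/bricks/arbiter force
gluster volume start testvol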

I am able to reproduce this issue by first setting up a replica 3 arbiter 1 configuration and then running the following bash script, which creates 10,000 200 kB files, deletes them, and repeats forever:

# Create 10,000 200 kB files, delete them all, and repeat indefinitely.
while true ; do
  for i in {1..10000} ; do
    dd if=/dev/urandom bs=200K count=1 of="$TEST_FILES_DIR/file$i"
  done
  rm -rf "$TEST_FILES_DIR"/*
done

$TEST_FILES_DIR is a location on my gluster mount.

After about 3 days of this script running on one of my clusters, this is what the output of "top" looks like:
  PID USER PR NI    VIRT    RES  SHR S %CPU %MEM     TIME+ COMMAND
16039 root 20  0 1397220  77720 3948 S 20.6  1.0 860:01.53 glusterfsd
13174 root 20  0 1395824 112728 3692 S 19.6  1.5 806:07.17 glusterfs
19961 root 20  0 2967204 2.145g 3896 S 17.3 29.0 752:10.70 glusterfsd

As you can see, one of the brick processes (the arbiter brick) is using over 2 gigabytes of resident memory.

One workaround for this is to kill the arbiter brick process and restart the gluster daemon. This restarts the arbiter brick process, and its memory usage drops back down to a reasonable level. However, I would rather not have to kill the arbiter brick every week in production environments.
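
Concretely, the workaround looks something like this (the volume name is just an example):

# Kill the leaking arbiter brick process (PID taken from 'gluster volume status').
kill <arbiter-brick-pid>

# Restarting glusterd (or force-starting the volume) brings the brick back up
# with its memory usage back to normal.
systemctl restart glusterd
# or: gluster volume start testvol force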

Has anyone seen this issue before, and is there a known workaround or fix?

Thanks,
Ben




--
Pranith



--
Benjamin Edgar
Computer Science
University of Virginia 2015
(571) 338-0878







--
Benjamin Edgar
Computer Science
University of Virginia 2015
(571) 338-0878
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
