Fwd: [Gluster-devel] heal hanging

"Glomski, Patrick" <patrick.glomski@xxxxxxxxxxxxx> · Fri, 22 Jan 2016 12:48:03 -0500

---------- Forwarded message ----------
From: Glomski, Patrick <patrick.glomski@xxxxxxxxxxxxx>
Date: Fri, Jan 22, 2016 at 12:44 PM
Subject: Re:  [Gluster-devel] heal hanging
To: Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>

Hello, Pranith. I've attached the homegfs log from one of the clients. They were spitting out errors constantly until the mount was killed / remounted.

[2016-01-22 15:02:38.475946] E [iobuf.c:759:iobuf_unref] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x338de20580] (--> /usr/lib64/glusterfs/3.6.6/xlator/performance/quick-read.so(qr_readv_cached+0xb7)[0x7fef7dcbaea7] (--> /usr/lib64/glusterfs/3.6.6/xlator/performance/quick-read.so(qr_readv+0x62)[0x7fef7dcbb4c2] (--> /usr/lib64/libglusterfs.so.0(default_readv_resume+0x14d)[0x338de2a75d] (--> /usr/lib64/libglusterfs.so.0(call_resume+0x3d6)[0x338de424b6] ))))) 0-iobuf: invalid argument: iobuf

[2016-01-22 15:02:38.476161] E [iobuf.c:865:iobref_unref] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x338de20580] (--> /usr/lib64/glusterfs/3.6.6/xlator/performance/quick-read.so(qr_readv_cached+0xc1)[0x7fef7dcbaeb1] (--> /usr/lib64/glusterfs/3.6.6/xlator/performance/quick-read.so(qr_readv+0x62)[0x7fef7dcbb4c2] (--> /usr/lib64/libglusterfs.so.0(default_readv_resume+0x14d)[0x338de2a75d] (--> /usr/lib64/libglusterfs.so.0(call_resume+0x3d6)[0x338de424b6] ))))) 0-iobuf: invalid argument: iobref

[2016-01-22 15:02:57.705386] W [fuse-bridge.c:1261:fuse_err_cbk] 0-glusterfs-fuse: 340112471: FLUSH() ERR => -1 (No such file or directory)

Hopefully the client log can provide some indicator of the issue,

Patrick

On Thu, Jan 21, 2016 at 8:49 PM, Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> wrote:

    On 01/22/2016 07:13 AM, Glomski,
      Patrick wrote:

      We use the samba glusterfs virtual filesystem (the
        current version provided on download.gluster.org),
        but no windows clients connecting directly.

    Hmm.. Is there a way to disable using this and check if the CPU%
    still increases? What getxattr of "glusterfs.get_real_filename
    <filanme>" does is to scan the entire directory looking for
    strcasecmp(<filname>, <scanned-filename>). If anything
    matches then it will return the <scanned-filename>. But the
    problem is the scan is costly. So I wonder if this is the reason for
    the CPU spikes.

    Pranith

        On Thu, Jan 21, 2016 at 8:37 PM,
          Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> wrote:

             Do you have any
              windows clients? I see a lot of getxattr calls for
              "glusterfs.get_real_filename" which lead to full readdirs
              of the directories on the brick.

                  Pranith

                On 01/22/2016 12:51 AM, Glomski, Patrick wrote:

                      Pranith, could this kind of behavior be
                        self-inflicted by us deleting files directly
                        from the bricks? We have done that in the past
                        to clean up an issues where gluster wouldn't
                        allow us to delete from the mount.

                        If so, is it feasible to clean them up by
                        running a search on the .glusterfs directories
                        directly and removing files with a reference
                        count of 1 that are non-zero size (or directly
                        checking the xattrs to be sure that it's not a
                        DHT link). 

                        find /data/brick01a/homegfs/.glusterfs -type f
                        -not -empty -links -2 -exec rm -f "{}" \;

                      Is there anything I'm inherently missing with that
                      approach that will further corrupt the system?

                      On Thu, Jan 21, 2016 at
                        1:02 PM, Glomski, Patrick <patrick.glomski@xxxxxxxxxxxxx>
                        wrote:

                              Load spiked again: ~1200%cpu on
                                gfs02a for glusterfsd. Crawl has been
                                running on one of the bricks on gfs02b
                                for 25 min or so and users cannot access
                                the volume.

                                I re-listed the xattrop directories as
                                well as a 'top' entry and heal
                                statistics. Then I restarted the gluster
                                services on gfs02a. 

                                =================== top
                                ===================

                                PID USER      PR  NI  VIRT  RES  SHR S
                                %CPU %MEM    TIME+ 
                                COMMAND                                                

                                 8969 root      20   0 2815m 204m 3588 S
                                1181.0  0.6 591:06.93 glusterfsd        

                                =================== xattrop
                                ===================

/data/brick01a/homegfs/.glusterfs/indices/xattrop:

                                xattrop-41f19453-91e4-437c-afa9-3b25614de210 

xattrop-9b815879-2f4d-402b-867c-a6d65087788c

/data/brick02a/homegfs/.glusterfs/indices/xattrop:

                                xattrop-70131855-3cfb-49af-abce-9d23f57fb393 

xattrop-dfb77848-a39d-4417-a725-9beca75d78c6

/data/brick01b/homegfs/.glusterfs/indices/xattrop:

                                e6e47ed9-309b-42a7-8c44-28c29b9a20f8         

xattrop-5c797a64-bde7-4eac-b4fc-0befc632e125

                                xattrop-38ec65a1-00b5-4544-8a6c-bf0f531a1934 

xattrop-ef0980ad-f074-4163-979f-16d5ef85b0a0

/data/brick02b/homegfs/.glusterfs/indices/xattrop:

                                xattrop-7402438d-0ee7-4fcf-b9bb-b561236f99bc 

xattrop-8ffbf5f7-ace3-497d-944e-93ac85241413

/data/brick01a/homegfs/.glusterfs/indices/xattrop:

xattrop-0115acd0-caae-4dfd-b3b4-7cc42a0ff531

/data/brick02a/homegfs/.glusterfs/indices/xattrop:

xattrop-7e20fdb1-5224-4b9a-be06-568708526d70

/data/brick01b/homegfs/.glusterfs/indices/xattrop:

                                8034bc06-92cd-4fa5-8aaf-09039e79d2c8 
                                c9ce22ed-6d8b-471b-a111-b39e57f0b512

                                94fa1d60-45ad-4341-b69c-315936b51e8d 
                                xattrop-9c04623a-64ce-4f66-8b23-dbaba49119c7

/data/brick02b/homegfs/.glusterfs/indices/xattrop:

xattrop-b8c8f024-d038-49a2-9a53-c54ead09111d

                                =================== heal stats
                                ===================

                                homegfs [b0-gfsib01a] : Starting time of
                                crawl       : Thu Jan 21 12:36:45 2016

                                homegfs [b0-gfsib01a] : Ending time of
                                crawl         : Thu Jan 21 12:36:45 2016

                                homegfs [b0-gfsib01a] : Type of crawl:
                                INDEX

                                homegfs [b0-gfsib01a] : No. of entries
                                healed        : 0

                                homegfs [b0-gfsib01a] : No. of entries
                                in split-brain: 0

                                homegfs [b0-gfsib01a] : No. of heal
                                failed entries   : 0

                                homegfs [b1-gfsib01b] : Starting time of
                                crawl       : Thu Jan 21 12:36:19 2016

                                homegfs [b1-gfsib01b] : Ending time of
                                crawl         : Thu Jan 21 12:36:19 2016

                                homegfs [b1-gfsib01b] : Type of crawl:
                                INDEX

                                homegfs [b1-gfsib01b] : No. of entries
                                healed        : 0

                                homegfs [b1-gfsib01b] : No. of entries
                                in split-brain: 0

                                homegfs [b1-gfsib01b] : No. of heal
                                failed entries   : 1

                                homegfs [b2-gfsib01a] : Starting time of
                                crawl       : Thu Jan 21 12:36:48 2016

                                homegfs [b2-gfsib01a] : Ending time of
                                crawl         : Thu Jan 21 12:36:48 2016

                                homegfs [b2-gfsib01a] : Type of crawl:
                                INDEX

                                homegfs [b2-gfsib01a] : No. of entries
                                healed        : 0

                                homegfs [b2-gfsib01a] : No. of entries
                                in split-brain: 0

                                homegfs [b2-gfsib01a] : No. of heal
                                failed entries   : 0

                                homegfs [b3-gfsib01b] : Starting time of
                                crawl       : Thu Jan 21 12:36:47 2016

                                homegfs [b3-gfsib01b] : Ending time of
                                crawl         : Thu Jan 21 12:36:47 2016

                                homegfs [b3-gfsib01b] : Type of crawl:
                                INDEX

                                homegfs [b3-gfsib01b] : No. of entries
                                healed        : 0

                                homegfs [b3-gfsib01b] : No. of entries
                                in split-brain: 0

                                homegfs [b3-gfsib01b] : No. of heal
                                failed entries   : 0

                                homegfs [b4-gfsib02a] : Starting time of
                                crawl       : Thu Jan 21 12:36:06 2016

                                homegfs [b4-gfsib02a] : Ending time of
                                crawl         : Thu Jan 21 12:36:06 2016

                                homegfs [b4-gfsib02a] : Type of crawl:
                                INDEX

                                homegfs [b4-gfsib02a] : No. of entries
                                healed        : 0

                                homegfs [b4-gfsib02a] : No. of entries
                                in split-brain: 0

                                homegfs [b4-gfsib02a] : No. of heal
                                failed entries   : 0

                                homegfs [b5-gfsib02b] : Starting time of
                                crawl       : Thu Jan 21 12:13:40 2016

                                homegfs [b5-gfsib02b]
                                :                                ***
                                Crawl is in progress ***

                                homegfs [b5-gfsib02b] : Type of crawl:
                                INDEX

                                homegfs [b5-gfsib02b] : No. of entries
                                healed        : 0

                                homegfs [b5-gfsib02b] : No. of entries
                                in split-brain: 0

                                homegfs [b5-gfsib02b] : No. of heal
                                failed entries   : 0

                                homegfs [b6-gfsib02a] : Starting time of
                                crawl       : Thu Jan 21 12:36:58 2016

                                homegfs [b6-gfsib02a] : Ending time of
                                crawl         : Thu Jan 21 12:36:58 2016

                                homegfs [b6-gfsib02a] : Type of crawl:
                                INDEX

                                homegfs [b6-gfsib02a] : No. of entries
                                healed        : 0

                                homegfs [b6-gfsib02a] : No. of entries
                                in split-brain: 0

                                homegfs [b6-gfsib02a] : No. of heal
                                failed entries   : 0

                                homegfs [b7-gfsib02b] : Starting time of
                                crawl       : Thu Jan 21 12:36:50 2016

                                homegfs [b7-gfsib02b] : Ending time of
                                crawl         : Thu Jan 21 12:36:50 2016

                                homegfs [b7-gfsib02b] : Type of crawl:
                                INDEX

                                homegfs [b7-gfsib02b] : No. of entries
                                healed        : 0

                                homegfs [b7-gfsib02b] : No. of entries
                                in split-brain: 0

                                homegfs [b7-gfsib02b] : No. of heal
                                failed entries   : 0

========================================================================================

                              I waited a few minutes for the heals to
                              finish and ran the heal statistics and
                              info again. one file is in split-brain.
                              Aside from the split-brain, the load on
                              all systems is down now and they are
                              behaving normally. glustershd.log is
                              attached. What is going on??? 

                              Thu Jan 21 12:53:50 EST 2016

                              =================== homegfs
                              ===================

                              homegfs [b0-gfsib01a] : Starting time of
                              crawl       : Thu Jan 21 12:53:02 2016

                              homegfs [b0-gfsib01a] : Ending time of
                              crawl         : Thu Jan 21 12:53:02 2016

                              homegfs [b0-gfsib01a] : Type of crawl:
                              INDEX

                              homegfs [b0-gfsib01a] : No. of entries
                              healed        : 0

                              homegfs [b0-gfsib01a] : No. of entries in
                              split-brain: 0

                              homegfs [b0-gfsib01a] : No. of heal failed
                              entries   : 0

                              homegfs [b1-gfsib01b] : Starting time of
                              crawl       : Thu Jan 21 12:53:38 2016

                              homegfs [b1-gfsib01b] : Ending time of
                              crawl         : Thu Jan 21 12:53:38 2016

                              homegfs [b1-gfsib01b] : Type of crawl:
                              INDEX

                              homegfs [b1-gfsib01b] : No. of entries
                              healed        : 0

                              homegfs [b1-gfsib01b] : No. of entries in
                              split-brain: 0

                              homegfs [b1-gfsib01b] : No. of heal failed
                              entries   : 1

                              homegfs [b2-gfsib01a] : Starting time of
                              crawl       : Thu Jan 21 12:53:04 2016

                              homegfs [b2-gfsib01a] : Ending time of
                              crawl         : Thu Jan 21 12:53:04 2016

                              homegfs [b2-gfsib01a] : Type of crawl:
                              INDEX

                              homegfs [b2-gfsib01a] : No. of entries
                              healed        : 0

                              homegfs [b2-gfsib01a] : No. of entries in
                              split-brain: 0

                              homegfs [b2-gfsib01a] : No. of heal failed
                              entries   : 0

                              homegfs [b3-gfsib01b] : Starting time of
                              crawl       : Thu Jan 21 12:53:04 2016

                              homegfs [b3-gfsib01b] : Ending time of
                              crawl         : Thu Jan 21 12:53:04 2016

                              homegfs [b3-gfsib01b] : Type of crawl:
                              INDEX

                              homegfs [b3-gfsib01b] : No. of entries
                              healed        : 0

                              homegfs [b3-gfsib01b] : No. of entries in
                              split-brain: 0

                              homegfs [b3-gfsib01b] : No. of heal failed
                              entries   : 0

                              homegfs [b4-gfsib02a] : Starting time of
                              crawl       : Thu Jan 21 12:53:33 2016

                              homegfs [b4-gfsib02a] : Ending time of
                              crawl         : Thu Jan 21 12:53:33 2016

                              homegfs [b4-gfsib02a] : Type of crawl:
                              INDEX

                              homegfs [b4-gfsib02a] : No. of entries
                              healed        : 0

                              homegfs [b4-gfsib02a] : No. of entries in
                              split-brain: 0

                              homegfs [b4-gfsib02a] : No. of heal failed
                              entries   : 1

                              homegfs [b5-gfsib02b] : Starting time of
                              crawl       : Thu Jan 21 12:53:14 2016

                              homegfs [b5-gfsib02b] : Ending time of
                              crawl         : Thu Jan 21 12:53:15 2016

                              homegfs [b5-gfsib02b] : Type of crawl:
                              INDEX

                              homegfs [b5-gfsib02b] : No. of entries
                              healed        : 0

                              homegfs [b5-gfsib02b] : No. of entries in
                              split-brain: 0

                              homegfs [b5-gfsib02b] : No. of heal failed
                              entries   : 3

                              homegfs [b6-gfsib02a] : Starting time of
                              crawl       : Thu Jan 21 12:53:04 2016

                              homegfs [b6-gfsib02a] : Ending time of
                              crawl         : Thu Jan 21 12:53:04 2016

                              homegfs [b6-gfsib02a] : Type of crawl:
                              INDEX

                              homegfs [b6-gfsib02a] : No. of entries
                              healed        : 0

                              homegfs [b6-gfsib02a] : No. of entries in
                              split-brain: 0

                              homegfs [b6-gfsib02a] : No. of heal failed
                              entries   : 0

                              homegfs [b7-gfsib02b] : Starting time of
                              crawl       : Thu Jan 21 12:53:09 2016

                              homegfs [b7-gfsib02b] : Ending time of
                              crawl         : Thu Jan 21 12:53:09 2016

                              homegfs [b7-gfsib02b] : Type of crawl:
                              INDEX

                              homegfs [b7-gfsib02b] : No. of entries
                              healed        : 0

                              homegfs [b7-gfsib02b] : No. of entries in
                              split-brain: 0

                              homegfs [b7-gfsib02b] : No. of heal failed
                              entries   : 0

                              *** gluster bug in 'gluster volume heal
                              homegfs statistics'   ***

                              *** Use 'gluster volume heal homegfs info'
                              until bug is fixed ***

                                Brick
                                gfs01a.corvidtec.com:/data/brick01a/homegfs/

                                Number of entries: 0

                                Brick
                                gfs01b.corvidtec.com:/data/brick01b/homegfs/

                                Number of entries: 0

                                Brick
                                gfs01a.corvidtec.com:/data/brick02a/homegfs/

                                Number of entries: 0

                                Brick
                                gfs01b.corvidtec.com:/data/brick02b/homegfs/

                                Number of entries: 0

                                Brick
                                gfs02a.corvidtec.com:/data/brick01a/homegfs/

                              /users/bangell/.gconfd - Is in
                              split-brain

                              Number of entries: 1

                              Brick
                              gfs02b.corvidtec.com:/data/brick01b/homegfs/

                              /users/bangell/.gconfd - Is in split-brain

                              /users/bangell/.gconfd/saved_state 

                              Number of entries: 2

                                Brick
                                gfs02a.corvidtec.com:/data/brick02a/homegfs/

                                Number of entries: 0

                                Brick
                                gfs02b.corvidtec.com:/data/brick02b/homegfs/

                                Number of entries: 0

                                On Thu, Jan 21,
                                  2016 at 11:10 AM, Pranith Kumar
                                  Karampuri <pkarampu@xxxxxxxxxx>
                                  wrote:

                                        On 01/21/2016 09:26 PM,
                                          Glomski, Patrick wrote:

                                            I should mention that
                                              the problem is not
                                              currently occurring and
                                              there are no heals (output
                                              appended). By restarting
                                              the gluster services, we
                                              can stop the crawl, which
                                              lowers the load for a
                                              while. Subsequent crawls
                                              seem to finish properly.
                                              For what it's worth,
                                              files/folders that show up
                                              in the 'volume info'
                                              output during a hung crawl
                                              don't seem to be anything
                                              out of the ordinary. 

                                              Over the past four days,
                                              the typical time before
                                              the problem recurs after
                                              suppressing it in this
                                              manner is an hour. Last
                                              night when we reached out
                                              to you was the last time
                                              it happened and the load
                                              has been low since (a
                                              relief).  David believes
                                              that recursively listing
                                              the files (ls -alR or
                                              similar) from a client
                                              mount can force the issue
                                              to happen, but obviously
                                              I'd rather not unless we
                                              have some precise thing
                                              we're looking for. Let me
                                              know if you'd like me to
                                              attempt to drive the
                                              system unstable like that
                                              and what I should look
                                              for. As it's a production
                                              system, I'd rather not
                                              leave it in this state for
                                              long.

                                       Will it be possible to
                                      send glustershd, mount logs of the
                                      past 4 days? I would like to see
                                      if this is because of directory
                                      self-heal going wild (Ravi is
                                      working on throttling feature for
                                      3.8, which will allow to put
                                      breaks on self-heal traffic)

                                          Pranith

                                              [root@gfs01a
                                                xattrop]# gluster volume
                                                heal homegfs info

                                                Brick
                                                gfs01a.corvidtec.com:/data/brick01a/homegfs/

                                                Number of entries: 0

                                                Brick
                                                gfs01b.corvidtec.com:/data/brick01b/homegfs/

                                                Number of entries: 0

                                                Brick
                                                gfs01a.corvidtec.com:/data/brick02a/homegfs/

                                                Number of entries: 0

                                                Brick
                                                gfs01b.corvidtec.com:/data/brick02b/homegfs/

                                                Number of entries: 0

                                                Brick
                                                gfs02a.corvidtec.com:/data/brick01a/homegfs/

                                                Number of entries: 0

                                                Brick
                                                gfs02b.corvidtec.com:/data/brick01b/homegfs/

                                                Number of entries: 0

                                                Brick
                                                gfs02a.corvidtec.com:/data/brick02a/homegfs/

                                                Number of entries: 0

                                                Brick
                                                gfs02b.corvidtec.com:/data/brick02b/homegfs/

                                                Number of entries: 0

                                              On
                                                Thu, Jan 21, 2016 at
                                                10:40 AM, Pranith Kumar
                                                Karampuri <pkarampu@xxxxxxxxxx>
                                                wrote:

                                                      On 01/21/2016
                                                        08:25 PM,
                                                        Glomski, Patrick
                                                        wrote:

                                                          Hello,
                                                          Pranith. The
                                                          typical
                                                          behavior is
                                                          that the %cpu
                                                          on a
                                                          glusterfsd
                                                          process jumps
                                                          to number of
                                                          processor
                                                          cores
                                                          available
                                                          (800% or
                                                          1200%,
                                                          depending on
                                                          the pair of
                                                          nodes
                                                          involved) and
                                                          the load
                                                          average on the
                                                          machine goes
                                                          very high
                                                          (~20). The
                                                          volume's heal
                                                          statistics
                                                          output shows
                                                          that it is
                                                          crawling one
                                                          of the bricks
                                                          and trying to
                                                          heal, but this
                                                          crawl hangs
                                                          and never
                                                          seems to
                                                          finish.

                                                          The number of
                                                          files in the
                                                          xattrop
                                                          directory
                                                          varies over
                                                          time, so I ran
                                                          a wc -l as you
                                                          requested
                                                          periodically
                                                          for some time
                                                          and then
                                                          started
                                                          including a
                                                          datestamped
                                                          list of the
                                                          files that
                                                          were in the
                                                          xattrops
                                                          directory on
                                                          each brick to
                                                          see which were
                                                          persistent.
                                                          All bricks had
                                                          files in the
                                                          xattrop
                                                          folder, so all
                                                          results are
                                                          attached.

                                                     Thanks this
                                                    info is helpful. I
                                                    don't see a lot of
                                                    files. Could you
                                                    give output of
                                                    "gluster volume heal
                                                    <volname>
                                                    info"? Is there any
                                                    directory in there
                                                    which is LARGE?

                                                        Pranith

                                                          Please
                                                          let me know if
                                                          there is
                                                          anything else
                                                          I can provide.

                                                          Patrick

                                                          On

                                                          Thu, Jan 21,
                                                          2016 at 12:01
                                                          AM, Pranith
                                                          Kumar
                                                          Karampuri <pkarampu@xxxxxxxxxx>
                                                          wrote:

                                                          hey,

                                                                 Which
                                                          process is
                                                          consuming so
                                                          much cpu? I
                                                          went through
                                                          the logs you
                                                          gave me. I see
                                                          that the
                                                          following
                                                          files are in
                                                          gfid mismatch
                                                          state:

<066e4525-8f8b-43aa-b7a1-86bbcecc68b9/safebrowsing-backup>,

<1d48754b-b38c-403d-94e2-0f5c41d5f885/recovery.bak>,

<ddc92637-303a-4059-9c56-ab23b1bb6ae9/patch0008.cnvrg>,

                                                          Could you give
                                                          me the output
                                                          of "ls
                                                          <brick-path>/indices/xattrop
                                                          | wc -l"
                                                          output on all
                                                          the bricks
                                                          which are
                                                          acting this
                                                          way? This will
                                                          tell us the
                                                          number of
                                                          pending
                                                          self-heals on
                                                          the system.

                                                          Pranith

                                                          On
                                                          01/20/2016
                                                          09:26 PM,
                                                          David Robinson
                                                          wrote:

                                                          resending
                                                          with parsed
                                                          logs... 

                                                          I am
                                                          having issues
                                                          with 3.6.6
                                                          where the load
                                                          will spike up
                                                          to 800% for
                                                          one of the
                                                          glusterfsd
                                                          processes and
                                                          the users can
                                                          no longer
                                                          access the
                                                          system.  If I
                                                          reboot the
                                                          node, the heal
                                                          will finish
                                                          normally after
                                                          a few minutes
                                                          and the system
                                                          will be
                                                          responsive,
                                                          but a few
                                                          hours later
                                                          the issue will
                                                          start again. 
                                                          It look like
                                                          it is hanging
                                                          in a heal and
                                                          spinning up
                                                          the load on
                                                          one of the
                                                          bricks.  The
                                                          heal gets
                                                          stuck and says
                                                          it is crawling
                                                          and never
                                                          returns. 
                                                          After a few
                                                          minutes of the
                                                          heal saying it
                                                          is crawling,
                                                          the load
                                                          spikes up and
                                                          the mounts
                                                          become
                                                          unresponsive.

                                                          Any
                                                          suggestions on
                                                          how to fix
                                                          this?  It has
                                                          us stopped
                                                          cold as the
                                                          user can no
                                                          longer access
                                                          the systems
                                                          when the load
                                                          spikes... Logs
                                                          attached.

                                                          System
                                                          setup info is:

                                                          [root@gfs01a

                                                          ~]# gluster
                                                          volume info
                                                          homegfs

                                                          Volume Name:
                                                          homegfs

                                                          Type:
                                                          Distributed-Replicate

                                                          Volume ID:
                                                          1e32672a-f1b7-4b58-ba94-58c085e59071

                                                          Status:
                                                          Started

                                                          Number of
                                                          Bricks: 4 x 2
                                                          = 8

                                                          Transport-type:

                                                          tcp

                                                          Bricks:

                                                          Brick1:
                                                          gfsib01a.corvidtec.com:/data/brick01a/homegfs

                                                          Brick2:
                                                          gfsib01b.corvidtec.com:/data/brick01b/homegfs

                                                          Brick3:
                                                          gfsib01a.corvidtec.com:/data/brick02a/homegfs

                                                          Brick4:
                                                          gfsib01b.corvidtec.com:/data/brick02b/homegfs

                                                          Brick5:
                                                          gfsib02a.corvidtec.com:/data/brick01a/homegfs

                                                          Brick6:
                                                          gfsib02b.corvidtec.com:/data/brick01b/homegfs

                                                          Brick7:
                                                          gfsib02a.corvidtec.com:/data/brick02a/homegfs

                                                          Brick8:
                                                          gfsib02b.corvidtec.com:/data/brick02b/homegfs

                                                          Options
                                                          Reconfigured:

                                                          performance.io-thread-count:

                                                          32

                                                          performance.cache-size:

                                                          128MB

                                                          performance.write-behind-window-size:

                                                          128MB

                                                          server.allow-insecure:

                                                          on

                                                          network.ping-timeout:

                                                          42

                                                          storage.owner-gid:

                                                          100

                                                          geo-replication.indexing:

                                                          off

                                                          geo-replication.ignore-pid-check:

                                                          on

                                                          changelog.changelog:

                                                          off

                                                          changelog.fsync-interval:

                                                          3

                                                          changelog.rollover-time:

                                                          15

                                                          server.manage-gids:

                                                          on

                                                          diagnostics.client-log-level:

                                                          WARNING

                                                          [root@gfs01a

                                                          ~]# rpm -qa |
                                                          grep gluster

gluster-nagios-common-0.1.1-0.el6.noarch

glusterfs-fuse-3.6.6-1.el6.x86_64

glusterfs-debuginfo-3.6.6-1.el6.x86_64

glusterfs-libs-3.6.6-1.el6.x86_64

glusterfs-geo-replication-3.6.6-1.el6.x86_64

glusterfs-api-3.6.6-1.el6.x86_64

glusterfs-devel-3.6.6-1.el6.x86_64

glusterfs-api-devel-3.6.6-1.el6.x86_64

glusterfs-3.6.6-1.el6.x86_64

glusterfs-cli-3.6.6-1.el6.x86_64

glusterfs-rdma-3.6.6-1.el6.x86_64

samba-vfs-glusterfs-4.1.11-2.el6.x86_64

glusterfs-server-3.6.6-1.el6.x86_64

glusterfs-extra-xlators-3.6.6-1.el6.x86_64

                                                          _______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel

_______________________________________________

                                                          Gluster-users
                                                          mailing list

                                                          Gluster-users@xxxxxxxxxxx

                                                          http://www.gluster.org/mailman/listinfo/gluster-users

Attachment:
homegfs.log.sample.xz

Description: application/xz
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users