Re: 100% cpu on brick replication

Could you give the gluster volume info output?

Pranith

On 05/29/2015 01:18 PM, Pedro Oriani wrote:
I've set 

cluster.entry-self-heal: off

Maybe I had missed it before, but when I started the service on srv02 it seemed to do the job.
Then I restarted the service.
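
To double-check that the option actually took effect, it should show up under Options Reconfigured (a quick check with the standard CLI):

# the reconfigured options are listed at the end of the output
gluster volume info vol1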

On srv02:

11607 ?        Ssl    0:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/eb93ca526d4559069efc40da9c71b3a4.socket --xlator-option *replicate*.node-uuid=7207ea30-41e9-4344-8fc3-47743b83629e
11612 ?        Ssl    0:03 /usr/sbin/glusterfsd -s 172.16.0.2 --volfile-id vol1.172.16.0.2.data-glusterfs-vol1-brick1-brick -p /var/lib/glusterd/vols/vol1/run/172.16.0.2-data-glusterfs-vol1-brick1-brick.pid -S /var/run/gluster/09285d60c2c8c9aa546602147a99a347.socket --brick-name /data/glusterfs/vol1/brick1/brick -l /var/log/glusterfs/bricks/data-glusterfs-vol1-brick1-brick.log --xlator-option *-posix.glusterd-uuid=7207ea30-41e9-4344-8fc3-47743b83629e --brick-port 49154 --xlator-option vol1-server.listen-port=49154


It seems like self-healing starts and brings srv01 down, with 600% CPU load.
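
(To see how much healing is still pending while the load is high, a quick check, assuming the heal commands available in 3.7:)

# list entries still queued for self-heal, per brick
gluster volume heal vol1 info
# or just the counts
gluster volume heal vol1 statistics heal-count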

thanks,
Pedro


Date: Fri, 29 May 2015 12:37:19 +0530
From: pkarampu@xxxxxxxxxx
To: sgunfio@xxxxxxxxxxx
CC: Gluster-users@xxxxxxxxxxx
Subject: Re: 100% cpu on brick replication



On 05/29/2015 12:34 PM, Pedro Oriani wrote:
Hi Pranith,

It's for sure related to a replication/healing task, because it occurs when you create a new replica brick or when you bring an old one back online.
The problem is that the CPU load on the online brick is so high that I cannot perform normal operations.
In my case, when replication/healing occurs, the cluster cannot serve content.
I'm asking if there is a way to limit CPU usage in this case, or to set a less aggressive mode, because otherwise I have to rethink the image repository.
Disable self-heal. I see that you already did that for the self-heal daemon. Let's do that for mounts as well.
gluster volume set <volname> cluster.entry-self-heal off
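
If entry self-heal alone doesn't help, the data counterpart can be switched off the same way (a sketch with the usual option names; metadata self-heal is already off in your config):

# turn off client-side data self-heal as well, for the duration of the test
gluster volume set <volname> cluster.data-self-heal off

These only disable healing triggered from client mounts; remember to re-enable them once the load problem is solved.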

Let me know how that goes.

Pranith

thanks,
Pedro


Date: Fri, 29 May 2015 11:14:29 +0530
From: pkarampu@xxxxxxxxxx
To: sgunfio@xxxxxxxxxxx; gluster-users@xxxxxxxxxxx
Subject: Re: 100% cpu on brick replication



On 05/27/2015 08:48 PM, Pedro Oriani wrote:
Hi All,
I'm writing because I'm experiencing an issue with Gluster's replication feature.
I have a brick on srv1 with about 2TB of mixed-size files, ranging from 10k to 300k.
When I add a new replica brick on srv2, the glusterfs process takes all the CPU.
This is unsuitable because the volume stops responding to normal r/w queries.

GlusterFS version is 3.7.0
Is it because of self-heals? Was the brick offline until then?

Pranith

The underlying filesystem is XFS.


Volume Name: vol1
Type: Replicate
Volume ID: 
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 172.16.0.1:/data/glusterfs/vol1/brick1/brick
Brick2: 172.16.0.2:/data/glusterfs/vol1/brick1/brick
Options Reconfigured:
performance.cache-size: 1gb
cluster.self-heal-daemon: off
cluster.data-self-heal-algorithm: full
cluster.metadata-self-heal: off
performance.cache-max-file-size: 2MB
performance.cache-refresh-timeout: 1
performance.stat-prefetch: off
performance.read-ahead: on
performance.quick-read: off
performance.write-behind-window-size: 4MB
performance.flush-behind: on
performance.write-behind: on
performance.io-thread-count: 32
performance.io-cache: on
network.ping-timeout: 2
nfs.addr-namelookup: off
performance.strict-write-ordering: on


Is there any parameter or hint I can follow to limit CPU usage, so that replication runs with little lag on normal operations?
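
(One OS-level workaround that comes to mind, independent of gluster: cap the brick process with a cgroup. A sketch assuming cgroups v1 and the libcgroup tools, with the brick PID taken from ps:)

# create a cgroup limiting CPU to roughly 2 cores (200%)
cgcreate -g cpu:/glusterbrick
cgset -r cpu.cfs_quota_us=200000 glusterbrick
cgset -r cpu.cfs_period_us=100000 glusterbrick
# move the glusterfsd brick process into it
cgclassify -g cpu:/glusterbrick <brick-pid>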

thanks,
Pedro





_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
