Re: glusterfs and glusterfsd process utilization extremely high

Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> · Sat, 22 Nov 2014 23:54:42 +0530



    On 11/22/2014 11:50 PM, Kyle Harris wrote:

    
      Hi Pranith,
        

        Thank you very much for the quick reply and the
          information.  I am in the process now of recreating the
          cluster using XFS.  This all brings up a few questions:
        

        - I assume the change from EXT4 to XFS will correct the
          problem with readdir (in other words, the issue is not present
          in XFS)?
      
    
    Yes. This particular readdir issue exists because of the way
    gluster handles EXT4's 64-bit offsets in readdir.

    
        - Do you have any idea when the patch for this might be
          out?  My reason for asking is that I have another cluster that
          has been updated to 3.6 and is running on EXT4 but does not
          yet have an issue.  This concerns me so I am hoping the patch
          will be out soon?
      
    
    The patch is out, but we need to wait for the next release. Let me
    talk to Vijay and see if we can get it released quickly.

    
        - What exactly does cluster.entry-self-heal do?  I can't
          seem to find a description of it?
      
    
    It enables/disables directory self-heal.

    
        - I assume from your posts that the reason the cluster is
          fine until traffic hits it is because the self-heal is not
          happening until traffic causes the files to be read.  Is that
          how it works?
      
    
    Yes.

    
        Thank you again for the fast response and the great
          product!
        

        ----
        Kyle
      
      
        On Sat, Nov 22, 2014 at 11:36 AM, Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> wrote:

          
                On 11/22/2014 11:04 PM, Pranith Kumar Karampuri wrote:

                
                  On 11/22/2014 10:40 PM, Pranith Kumar Karampuri wrote:

                  
                    On 11/22/2014 10:29 PM, Kyle Harris wrote:

                    
                        Hello,
                        

                        I have an issue with a 3 node replicated
                          cluster.  My issue started after a reboot a
                          while back.  The top command would show the
                          glusterfs and glusterfsd processes eating up
                          almost all the resources on all three nodes
                          of the cluster, so much so that it could not
                          run the web sites that are hosted on it.  The
                          httpd processes would begin to hang.  I
                          finally decided to tear down the cluster and
                          rebuild it from the ground up.  I did so and
                          then copied all the data back which took all
                          night due to the amount of data.  All was well
                          during that entire copy process back to the
                          cluster with no resource spikes.

                        
                  Assuming you go back to 3.5.2, execute the following
                  command:

                  # gluster volume set <volname> cluster.entry-self-heal off

                  
                  This should prevent httpd hangs.
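
One way to confirm that the setting took effect is to look for it in the output of "gluster volume info". The sketch below is only an illustration and not part of the original advice: the helper name option_value and the volume name myvol are hypothetical, and it assumes that changed options are listed as "key: value" lines under "Options Reconfigured".

```shell
#!/bin/sh
# Print the value of one reconfigured option, reading the output of
# `gluster volume info <volname>` on stdin. Prints nothing if the
# option is not listed (i.e. it was never changed from its default).
# Assumption: changed options appear as "key: value" lines.
option_value() {
    awk -v key="$1" -F': *' '
        { k = $1; sub(/^[ \t]+/, "", k) }   # tolerate leading whitespace
        k == key { print $2; exit }
    '
}

# Hypothetical usage against a volume named "myvol":
#   gluster volume info myvol | option_value cluster.entry-self-heal
```

If the helper prints "off", the option is set as suggested above; empty output would mean the option does not appear among the reconfigured options.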

                  
                  If you still find that the CPU usage is very high,
                  execute the following command:

                  # gluster volume set <volname> cluster.self-heal-daemon off

                  
                  This disables the self-heal daemon. You should still
                  heal the data periodically by re-enabling the daemon
                  with the following command:

                  # gluster volume set <volname> cluster.self-heal-daemon on

                  
                  Once "gluster volume heal <volname> info" shows
                  zero entries, then healing is complete.
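
The "zero entries" check can be scripted. The following is a minimal sketch, assuming the heal info output contains one "Number of entries: N" line per brick. The function names and the 60-second default polling interval are illustrative choices, not something taken from this thread.

```shell
#!/bin/sh
# Sum the "Number of entries: N" lines found in the output of
# `gluster volume heal <volname> info` (read on stdin) and print
# the total. Prints 0 when no such line is present.
count_pending_entries() {
    awk -F': *' '/^Number of entries:/ { total += $2 } END { print total + 0 }'
}

# Poll until no entries are pending on the given volume.
# $1 = volume name, $2 = seconds between polls (default 60).
wait_for_heal() {
    volname=$1
    interval=${2:-60}
    while :; do
        # Stop with an error if the CLI call itself fails, so that a
        # failure is not mistaken for "nothing left to heal".
        out=$(gluster volume heal "$volname" info) || return 1
        pending=$(printf '%s\n' "$out" | count_pending_entries)
        [ "$pending" -eq 0 ] && break
        echo "$pending entries still pending on $volname"
        sleep "$interval"
    done
    echo "heal complete on $volname"
}
```

Calling wait_for_heal with the volume name after turning cluster.self-heal-daemon back on would block until the heal finishes, at which point the daemon could be switched off again if CPU usage is still a concern.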

                  
                  We took some steps to improve this in 3.6. But readdir
                  on EXT4 is not working correctly, which is probably
                  causing the problems here. Let's wait for Vijay to merge
                  the patch I mentioned, then things should be fine.

                
               Sorry for the inconvenience caused. We found the
              issue after the release was made :-(.

                  
                  Pranith
              
                
                    Pranith

                    
                          I should note that this cluster is home
                            to many Apache/PHP based web sites.  The
                            problem starts again, however, the minute I
                            point traffic back to the sites on the
                            cluster.  Before pointing traffic to it, all
                            is fine but as soon as the traffic begins to
                            hit it, the utilization again begins to
                            spike.  Note that all the sites run just
                            fine when hosted from a standard EXT4
                            partition.  I noticed another thread labeled
                            "glusterfsd process thrashing CPU" where
                            Pranith asks if the user has directories
                            with lots of files and I do.
                          

                          Here are some other details of my
                            cluster:
                          - OS:  CentOS 6.6 with all updates on all
                            3 nodes as of 11-22-2014
                          - All 3 nodes have 8 cores with 16 GB of
                            RAM
                          - Nodes are all formatted with EXT4
                          - All three nodes also have the file
                            systems mounted on them for use with
                            Apache.  I have experimented with both NFS
                            and Fuse mounts and it doesn't seem to make
                            a difference which I use for this particular
                            problem.  I am currently using Fuse.
                          - Approximately 135 GB of data.  Some
                            deep directories with many small files.
                          - No optimization or changes have been
                            made to the cluster . . . it is running with
                            default options
                          - Gluster version 3.6.1-1 installed from
                            RPMs
                          - Note the issue originally occurred on
                            version 3.5.2 but I updated before
                            rebuilding it in hopes that would fix it (it
                            didn't)
                          

                          Can anyone give me guidance on how to
                            tackle this problem?  I am hoping perhaps
                            Pranith can give some details as to why the
                            question about many files and how to proceed
                            given my situation.  I know others have
                            commented about having many small files with
                            regard to performance but when the
                            processors are not spiked, performance has
                            been acceptable.  Any help would be greatly
                            appreciated.
                          

                      Kyle,

                            3.6.1 with EXT4 has a problem because of
                      64-bit offsets. The afr-v2 implementation introduced
                      this problem. We thought the following patch had been
                      merged, but it was not :-( http://review.gluster.com/8201.
                      Please don't use 3.6.1 with EXT4.
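
To see whether an existing brick is affected by this advice, the filesystem under the brick directory can be checked. This is a rough sketch, assuming GNU df (for the -T option). The helper names are hypothetical and the brick path is whatever the caller passes in.

```shell
#!/bin/sh
# Extract the filesystem type from `df -PT <path>` output on stdin.
# With -P each filesystem is reported on a single line, so the type
# is the second column of the first line after the header.
fstype_from_df() {
    awk 'NR == 2 { print $2 }'
}

# Succeed (exit status 0) if the given brick directory is on EXT4.
brick_on_ext4() {
    fstype=$(df -PT "$1" | fstype_from_df)
    [ "$fstype" = "ext4" ]
}

# Hypothetical usage:
#   brick_on_ext4 /path/to/brick && echo "brick is on EXT4"
```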

                      
                      Vijay,

                            Please merge http://review.gluster.com/8201

                      
                      Pranith

                      
                        -- 

                          
                            Kyle 
                              
                                
                        _______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users
                      
                      
                  
                  
        -- 

        
          Kyle A. Harris
            Kyle@xxxxxxxxxxxxxxxxx

              615-364-6752

              