I can't combine -Tt with -c; with -c it only shows the final summary report, not the time consumed on each call. Also, the -c flag shows system CPU time for the ls process, not wall-clock time. The values obtained with -c seem quite normal.
Using -Tt, strace reports the wall-clock time of each call. I've summarized the results in a table attached to this email. I've also included a detailed list of the system calls made by ls, sorted by time.
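As a rough sketch of how the two kinds of output can be obtained (they have to be separate runs, since -Tt and -c don't combine; <vol> and the trace file name are placeholders):

  # per-call wall-clock timings (-T) with timestamps (-tt), written to a file
  strace -f -T -tt -o ls-run.trace ls -l <vol>/dirs > /dev/null
  # slowest individual calls first: the trailing <...> field is the time spent in each call
  grep -o '<[0-9.]*>$' ls-run.trace | tr -d '<>' | sort -rn | head
  # separate run for the summary report (-c), which accounts CPU time, not wall-clock time
  strace -c -f ls -l <vol>/dirs > /dev/null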
Xavi
On 26/03/13 19:49, Anand Avati wrote:
Can you run ls as 'strace -Ttc ls' in each of the three runs and compare the output of the first and third runs to see where most of the time is being spent?
Avati
On Tue, Mar 26, 2013 at 11:01 AM,
Xavier Hernandez <xhernandez@xxxxxxxxxx>
wrote:
Hi,
Since one of the suggested improvements seemed to be reducing the number of directories inside .glusterfs, I've made a modification to storage/posix so that, instead of creating 2 levels of 256 directories each, it creates 4 levels of 16 directories (sketched below).
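Roughly, this is the idea (the exact split of the gfid is just an illustration here, and <brick> is a placeholder for a brick path):

  # standard storage/posix layout (2 levels of 256 directories), e.g. for a gfid
  # starting with "0123...":
  #   <brick>/.glusterfs/01/23/0123...
  # with 4 levels of 16 directories (assuming one hex nibble per level) the same
  # gfid would land somewhere like:
  #   <brick>/.glusterfs/0/1/2/3/0123...
  # the resulting tree on a brick can be inspected with, for example:
  find <brick>/.glusterfs -maxdepth 4 -type d | wc -l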
With this change, the first and second ls take 0.9 seconds; the third takes 9. I don't know what causes such slowness on the third ls; however, the second ls has improved a lot.
Does anyone have any advice? Is there any way to improve this? Some tweak of the kernel/xfs/gluster?
Thanks,
Xavi
On 26/03/13 11:02, Xavier Hernandez wrote:
Hi,
I've reproduced a problem I've seen when listing directories that have not been accessed for a long time (some hours). The Gluster version is 3.3.1. I've run the tests on different hardware and the behavior is quite similar.
The problem can be clearly seen by doing this (a rough script for these steps follows the list):
1. Format the bricks with XFS, inode size 512, and mount them
2. Create a gluster volume (I've tried several combinations, see later)
3. Start and mount it
4. Create a directory <vol>/dirs and fill it with 300 subdirectories
5. Unmount the volume, stop it and flush the kernel caches of all servers (sync ; echo 3 > /proc/sys/vm/drop_caches)
6. Start the volume, mount it, and execute "time ls -l <vol>/dirs | wc -l"
7. Create 80,000 directories at <vol>/ (notice that these directories are not created inside <vol>/dirs)
8. Unmount the volume, stop it and flush the kernel caches of all servers (sync ; echo 3 > /proc/sys/vm/drop_caches)
9. Start the volume, mount it, and execute "time ls -l <vol>/dirs | wc -l"
10. Delete the directory <vol>/dirs and recreate it with 300 subdirectories
11. Unmount the volume, stop it and flush the kernel caches of all servers (sync ; echo 3 > /proc/sys/vm/drop_caches)
12. Start the volume, mount it, and execute "time ls -l <vol>/dirs | wc -l"
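A rough, untested sketch of those twelve steps (device, server, volume name and mount points are placeholders; the drop_caches line must be run on every server):

  # steps 1-3: format and mount the brick, create/start/mount the volume
  mkfs.xfs -f -i size=512 /dev/sdb1
  mount /dev/sdb1 /bricks/brick1
  gluster volume create testvol server1:/bricks/brick1    # single-brick case
  gluster volume start testvol
  mount -t glusterfs server1:/testvol /mnt/testvol
  # step 4: 300 subdirectories inside <vol>/dirs
  mkdir /mnt/testvol/dirs
  for i in $(seq 1 300); do mkdir /mnt/testvol/dirs/$i; done
  # step 5: unmount, stop the volume and drop caches (the last line on ALL servers)
  umount /mnt/testvol
  gluster --mode=script volume stop testvol
  sync ; echo 3 > /proc/sys/vm/drop_caches
  # step 6: first ls
  gluster volume start testvol
  mount -t glusterfs server1:/testvol /mnt/testvol
  time ls -l /mnt/testvol/dirs | wc -l
  # step 7: 80,000 directories at the volume root (not inside dirs/)
  for i in $(seq 1 80000); do mkdir /mnt/testvol/d$i; done
  # steps 8-9: repeat the unmount/stop/drop_caches/start/mount cycle, then the second ls
  time ls -l /mnt/testvol/dirs | wc -l
  # step 10: recreate <vol>/dirs with 300 subdirectories
  rm -rf /mnt/testvol/dirs
  mkdir /mnt/testvol/dirs
  for i in $(seq 1 300); do mkdir /mnt/testvol/dirs/$i; done
  # steps 11-12: repeat the cycle once more, then the third ls
  time ls -l /mnt/testvol/dirs | wc -l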
With this test, I get the following times:
first ls: 1 second
second ls: 3.5 seconds
third ls: 10 seconds
I don't understand the second ls, because the <vol>/dirs directory still has the same 300 subdirectories. But the third one is even worse.
I've tried with different kinds of volumes
(distributed-replicated, distributed, and even a
single brick), and the behavior is the same (though
the times are smaller when fewer bricks are involved).
After reaching this situation, I've tried to get back the previous ls times by deleting directories; however, the times do not seem to improve. Only after doing some "dirty" tests and removing empty gfid directories from <vol>/.glusterfs on all bricks do I get better times, though still not as good as the first ls (3-4 seconds better than the third ls).
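For reference, the "dirty" cleanup amounts to something like the following (a sketch only; <brick> is a placeholder, and it assumes the standard two-level .glusterfs layout):

  # remove empty gfid hash directories (two hex chars / two hex chars) on one brick
  find <brick>/.glusterfs -mindepth 2 -maxdepth 2 -type d -empty \
       -regex '.*/[0-9a-f][0-9a-f]/[0-9a-f][0-9a-f]' -delete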
This is always reproducible if the volume is stopped
and the caches are emptied before each ls. With more
files and/or directories, it can take up to 20 or more
seconds to list a directory with 100-200
subdirectories.
Without stopping anything, a second ls responds in
about 0.2 seconds.
I've also tested this with ext4 and BTRFS (I know it
is not supported, but tested anyway). These are the
results:
ext4 first ls: 0.5 seconds
ext4 second ls: 0.8 seconds
ext4 third ls: 7 seconds
btrfs first ls: 0.5 seconds
btrfs second ls: 0.5 seconds
btrfs third ls: 0.5 seconds
It seems clear that it depends on the file system, but if I access the bricks directly, every ls takes at most 0.1 seconds to complete.
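(For reference, a direct measurement on one brick, bypassing gluster, would look something like this; the brick path is a placeholder.)

  sync ; echo 3 > /proc/sys/vm/drop_caches   # drop caches first for a comparable cold-cache run
  time ls -l /bricks/brick1/dirs | wc -l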
Repairing and defragmenting the bricks does not help.
strace'ing the glusterfs process of the bricks, I see that for each directory a lot of entries from <vol>/.glusterfs are lstat'ed and a lot of lgetxattr calls are made. For 300 directories I've counted more than 4500 lstat's and more than 5300 lgetxattr's, many of them repeated. I've also noticed that some lstat's take from 10 to 60 ms to complete (with XFS).
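These numbers can be gathered with something along these lines (a sketch; the brick process PID is a placeholder, and with -f the counts include all of the brick's threads):

  # trace only lstat/lgetxattr on the brick process while the ls runs on the client
  strace -f -T -e trace=lstat,lgetxattr -o brick.trace -p <glusterfsd-pid>
  grep -c 'lstat("' brick.trace
  grep -c 'lgetxattr("' brick.trace
  # slowest calls first (the trailing <...> field is the time spent in each call)
  grep -o '<[0-9.]*>$' brick.trace | tr -d '<>' | sort -rn | head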
Is there any way to minimize these effects? Am I doing something wrong?
Thanks in advance for your help,
Xavi
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxx
https://lists.nongnu.org/mailman/listinfo/gluster-devel