Hi guys,

(I'm new to this, so pardon me if my shenanigans turn out to be a waste of your time.)

I have been experimenting with Gluster by copying and deleting large numbers of files of all sizes. What I found was that deleting a large number of small files takes a good chunk of time -- in some cases a significant percentage of the time it took to copy the files to the cluster in the first place. My guess is that find and rm -fr process files serially, and each operation has to wait for packets to make the round trip over the network. But with a clustered filesystem there is no reason to process files serially and wait on the network one file at a time.

So I decided to try an experiment. Instead of using /bin/rm to delete files serially, I wrote my own quick-and-dirty recursive rm (and recursive ls) that uses pthreads (listed as "cluster-rm" and "cluster-ls" in the table below; a stripped-down sketch of the idea appears after the Methods section).

Methods:

1) This was done on a Linux system. I suspect that Linux (or any modern OS) caches filesystem information. For example, after setting up a directory, rm -fr on that directory completes faster if I first run find on it. To avoid this caching effect, each command was run on its own test directory (i.e. find was never run on the same directory as rm -fr or cluster-rm). This kept caching from skewing the results and made the run times much more consistent.

2) Each test directory contained exactly the same data for each of the four commands tested (find, cluster-ls, rm, cluster-rm) in each test run.

3) All commands were run on a client machine, not on one of the cluster nodes.
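For anyone curious, here is a stripped-down sketch of the general idea. This is NOT the code that produced the numbers below -- names are illustrative, error handling is minimal, and the ls variant is omitted -- but it shows the spawn-a-thread-per-subdirectory-up-to-a-cap scheme:

/* cluster-rm sketch -- a minimal illustration of the approach described
 * in this post: spawn a worker thread per subdirectory until a hard cap
 * is hit, then fall back to plain recursion.
 * Build: cc -O2 -pthread -o cluster-rm cluster-rm.c  */
#include <dirent.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define MAX_THREADS 100   /* hard-coded thread cap, as in the tests */
#define MAX_KIDS    256   /* max child threads tracked per directory */

static int active_threads = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void remove_tree(const char *path);

/* Worker: delete one subtree, free its path, release the thread slot. */
static void *worker(void *arg)
{
    char *path = arg;
    remove_tree(path);
    free(path);
    pthread_mutex_lock(&lock);
    active_threads--;
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Start a worker for 'child' if under the cap.  Returns 1 if a thread
 * was started (it owns 'child'), 0 if the caller should recurse. */
static int try_spawn(char *child, pthread_t *tid)
{
    pthread_mutex_lock(&lock);
    int ok = (active_threads < MAX_THREADS);
    if (ok)
        active_threads++;
    pthread_mutex_unlock(&lock);
    if (ok && pthread_create(tid, NULL, worker, child) != 0) {
        pthread_mutex_lock(&lock);
        active_threads--;              /* creation failed: release the slot */
        pthread_mutex_unlock(&lock);
        ok = 0;
    }
    return ok;
}

static void remove_tree(const char *path)
{
    DIR *dir = opendir(path);
    if (!dir) { perror(path); return; }

    pthread_t kids[MAX_KIDS];          /* threads handling our subdirectories */
    int nkids = 0;
    struct dirent *de;

    while ((de = readdir(dir)) != NULL) {
        if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
            continue;
        size_t len = strlen(path) + strlen(de->d_name) + 2;
        char *child = malloc(len);
        if (!child) continue;
        snprintf(child, len, "%s/%s", path, de->d_name);

        struct stat st;
        if (lstat(child, &st) != 0) { perror(child); free(child); continue; }

        if (S_ISDIR(st.st_mode)) {
            if (nkids < MAX_KIDS && try_spawn(child, &kids[nkids]))
                nkids++;               /* worker() will free 'child' */
            else {                     /* cap reached: plain old recursion */
                remove_tree(child);
                free(child);
            }
        } else {
            unlink(child);             /* regular file, symlink, etc. */
            free(child);
        }
    }
    closedir(dir);

    /* Join the threads deleting our subdirectories, then remove the
     * now-empty directory itself. */
    for (int i = 0; i < nkids; i++)
        pthread_join(kids[i], NULL);
    rmdir(path);
}

int main(int argc, char **argv)
{
    for (int i = 1; i < argc; i++)
        remove_tree(argv[i]);
    return 0;
}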
Results:

Conclusions:

1) Once I had a threaded version of rm, a threaded version of ls was easy to make, so I included it in the tests (listed above as cluster-ls). Performance looked spiffy up until the 1.1TB range, when cluster-ls started taking more time than find. Right now I can't explain why. 1.1TB takes a long time to set up and process (about a day for each set of four commands), so it could be that the regular nightly backups are interfering with performance. If that's the case, then it calls into question how useful my threaded approach really is. Also, the output from cluster-ls naturally comes out in no particular order, so it would most likely be used together with something like grep and sed, and I haven't yet timed 'cluster-ls | some-other-command' against plain old find by itself.

2) Results from cluster-rm look pretty good to me across the board. Again, performance seems to fall off in the 1.1TB tests, and the reasons are not clear to me at this time, but the run time is still about half that of rm -fr. Run times fluctuate more than in the smaller tests, but I suppose that's to be expected. Still, since performance does drop, it makes me wonder how well this approach scales to larger sets of data.

3) My threaded cluster-rm/ls commands are not clever. While traversing directories, each subdirectory found gets a new thread to process it, up until some hard-coded limit is reached (100 threads for the results above). After the thread-count limit is reached, directories are processed using plain old recursion until a thread exits, freeing up a slot to process another subdirectory. (This is the scheme shown in the sketch above.)

Further Research:

A) I would like to test further with larger data sets.

B) I would like to implement a smarter algorithm for determining how many threads to use to maximize performance. Rather than a hard-coded maximum, a better approach might be to measure the number of inodes processed per second and use that to judge the effectiveness of adding more threads, until a local maximum is reached. (See the rough sketch in the P.S. below.)

C) How do these numbers change if the commands are run on one of the cluster nodes instead of a client?

I have some ideas of smarter things to try, but I am at best an inexperienced (if enthusiastic) dabbler in the programming arts. A professional would likely do a much better job. But if this data looks at all interesting or useful, then maybe there would be a call for a handful of cluster-specific filesystem tools?

Michael Peek
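P.S. For (B), here is the sort of feedback loop I have in mind -- an untested fragment, not working code. It is meant to be grafted onto the sketch above: the workers would bump inodes_done for every unlink()/rmdir() and consult thread_cap instead of the fixed MAX_THREADS, and the tuner keeps raising the cap as long as throughput improves:

/* Tuner thread sketch: sample inodes processed per second and adjust the
 * thread cap toward a local maximum.  Thresholds (1.05, 0.80, steps of 4)
 * are made-up starting points, not measured values. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static unsigned long inodes_done = 0;   /* bumped by workers (under tlock)   */
static int thread_cap = 8;              /* starting point; tuner adjusts it  */
static pthread_mutex_t tlock = PTHREAD_MUTEX_INITIALIZER;

static void *tuner(void *arg)
{
    (void)arg;
    unsigned long prev_total = 0;
    double best_rate = 0.0;

    for (;;) {
        sleep(1);                       /* sample once per second */

        pthread_mutex_lock(&tlock);
        unsigned long total = inodes_done;
        pthread_mutex_unlock(&tlock);

        double rate = (double)(total - prev_total);   /* inodes per second */
        prev_total = total;

        pthread_mutex_lock(&tlock);
        if (rate > best_rate * 1.05) {  /* clearly improving: add threads */
            best_rate = rate;
            thread_cap += 4;
        } else if (rate < best_rate * 0.80 && thread_cap > 4) {
            thread_cap -= 4;            /* clearly worse: back off */
        }
        printf("rate %.0f inodes/s, cap %d\n", rate, thread_cap);
        pthread_mutex_unlock(&tlock);
    }
    return NULL;
}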