small files and cluster/stripe

jonah at eecs.berkeley.edu (Jeff Anderson-Lee) · Fri, 14 May 2010 16:37:28 -0700



On 5/14/2010 4:20 PM, Craig Carl wrote:
> Jeff -
>    I've paraphrased Tejas's response here -
>        1. There is no way to know how big a file will be until the 
> fclose() is received.
>        2. What would we do about files that change sizes across the 
> cutoff line?
>        3. We could perhaps add a size parameter to the 
> rebalance/defrag scripts we have.
>
> Would a process that redistributed the file on some sort of a schedule 
> work?
These are all reasons that would lead me *not* to try a 
big-file/small-file distribution scheme.  Combining a distributed 
(hash-based) offset with file striping makes much more sense to me.  It 
doesn't work well for hard links or simple renames (the name hash 
changes while the data stays where it was), but it makes the rest 
simpler.
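
As a rough sketch of the hash-offset idea (this is not Gluster code: the 
function name, the CRC32 stand-in for the cluster/distribute name hash, 
and the 128 KB block size in the note below are all assumptions made for 
illustration), block placement could look something like this:

```python
import zlib


def stripe_subvolume(name: str, offset: int, n_subvols: int, block_size: int) -> int:
    """Return the index of the subvolume holding the byte at `offset`.

    Hypothetical helper, for illustration only. CRC32 stands in for
    whatever name hash cluster/distribute actually uses.
    """
    # Starting subvolume is chosen from the file name hash, as in
    # cluster/distribute, so small files spread across all subvolumes.
    start = zlib.crc32(name.encode("utf-8")) % n_subvols
    # Later stripe blocks go round-robin from that starting place.
    block = offset // block_size
    return (start + block) % n_subvols
```

With a 128 KB block size, every byte of a file smaller than 128 KB maps 
to the starting subvolume for its name, and block k of a larger file 
lands on subvolume (start + k) mod N.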

Jeff
> Craig
>
> --
> Craig Carl
> Gluster, Inc.
> Cell - (408) 829-9953 (California, USA)
> Gtalk - craig.carl at gmail.com
>
>
> ----- Original Message -----
> From: "Jeff Anderson-Lee" <jonah at eecs.berkeley.edu>
> To: "Craig Carl" <craig at gluster.com>
> Cc: gluster-users at gluster.org
> Sent: Thursday, May 13, 2010 6:39:31 PM GMT -08:00 US/Canada Pacific
> Subject: Re: small files and cluster/stripe
>
> On 5/13/2010 6:24 PM, Craig Carl wrote:
>
>     Jeff -
>         Thanks for your email, I think I've got a grasp of your
>     environment now and I understand the problem. If we create a
>     "/gluster/small_files" and a "/gluster/large_files" your users are
>     unlikely to respect the distinction, plus it is a management
>     nightmare, right?
>     If you have time I'd like your help writing a feature request that
>     would implement what you need.  Something like -
>
>     Gluster should provide the option of distributing files based on
>     size to different volumes.
>     This distribution should be transparent to users.
>     This distribution only needs to happen the first time a file is
>     written.
>     The Gluster administrator should have the ability to provide a
>     file size range for each volume.
>     The different volumes could be different types; mirror, stripe,
>     mirror & distribute, etc.
>
>     What have I missed?
>
>     Craig
>
>
> That would be one solution.  I would target another that I suspect is 
> probably simpler:
>
> Gluster should provide the option of pseudo-randomizing the 
> distribution of file stripes across volumes, so that all small files 
> do not end up on the same subvolume of a cluster/stripe.
> This distribution should be transparent to users.
> This distribution only needs to happen the first time a file is 
> written and may be based on the file name hash (a la cluster/distribute).
>
> The net behavior could be such that small files (less than the 
> block-size) would have the same data distribution pattern as they 
> would have with cluster/distribute, while larger files (greater than 
> the stripe block-size) would have their upper blocks distributed 
> round-robin from that starting place.
>
> Given that the code already exists for distributing files based on 
> namehash in cluster/distribute I think this could be an easier feature 
> to add.
>
> Jeff
>