On 5/13/2010 6:24 PM, Craig Carl wrote:
> Jeff -
> Thanks for your email, I think I've got a grasp of your
> environment now and I understand the problem. If we create a
> "/gluster/small_files" and a "/gluster/large_files" your users are
> unlikely to respect distinction, plus it is a management nightmare,
> right?
> If you have time I'd like your help writing a feature request that
> would implement what you need. Something like -
>
> Gluster should provide the option of distributing files based on size
> to different volumes.
> This distribution should be transparent to users.
> This distribution only needs to happen the first time a file is written.
> The Gluster administrator should have the ability to provide a file
> size range for each volume.
> The different volumes could be different types; mirror, stripe, mirror
> & distribute, etc.
>
> What have I missed?
>
> Craig

That would be one solution. I would target another that I suspect is
probably simpler:

Gluster should provide the option of pseudo-randomizing the
distribution of file stripes across volumes, so that all small files
do not end up on the same subvolume of a cluster/stripe.
This distribution should be transparent to users.
This distribution only needs to happen the first time a file is
written and may be based on the file name hash (a la
cluster/distribute).

The net behavior could be such that small files (less than the
block-size) would have the same data distribution pattern as they
would have with cluster/distribute, while larger files (greater than
the stripe block-size) would have their upper blocks distributed in a
round-robin from that starting place.

Given that the code already exists for distributing files based on
namehash in cluster/distribute, I think this could be an easier
feature to add.

Jeff
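
P.S. A rough sketch of the placement rule proposed above, written in
Python for brevity rather than as translator code. It assumes the
subvolume for block 0 is chosen from a hash of the file name, and that
block i then lands on (start + i) mod N. The function names and the
use of zlib.crc32 are hypothetical stand-ins for illustration only;
they are not the hash or the API that cluster/distribute uses.

```python
import zlib
from typing import List


def start_subvolume(filename: str, n_subvols: int) -> int:
    """Pick the subvolume that holds block 0, from a hash of the name.

    zlib.crc32 is only a stand-in hash chosen for this illustration.
    """
    return zlib.crc32(filename.encode("utf-8")) % n_subvols


def placement(filename: str, size: int, block_size: int,
              n_subvols: int) -> List[int]:
    """Return the subvolume index for each stripe block of the file."""
    start = start_subvolume(filename, n_subvols)
    # Ceiling division; an empty file still counts as one (empty) block.
    n_blocks = max(1, -(-size // block_size))
    # Round-robin across subvolumes, beginning at the hashed start.
    return [(start + i) % n_subvols for i in range(n_blocks)]
```

Under these assumptions a file smaller than one block sits entirely
on its hashed subvolume, which gives the cluster/distribute-like
spread for small files, while a larger file wraps around the
subvolumes from that same starting point.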