small files and cluster/stripe

Jeff - 
Two comments/ideas. 

1. If you are limited to four pieces of hardware, the minimum for stripe, and you want to stripe some of the data and just distribute the other files, there is a way to do that. Ideally you would use your hardware RAID controllers to create two LUNs on each host, one for distribute and the other for stripe. If you don't have hardware RAID, you could use LVM2 or ZFS to achieve the same thing (or you could simply use separate folders). 
1a. Once you have two file systems created, use glusterfs-volgen to create the vol files for the distribute export just like you normally would. 
1b. Move the files you just created to the storage servers and clients. 
1c. Re-run glusterfs-volgen, this time for the stripe, adding the -p option and specifying a port (something above 1024, and not 6996, the default port). 
1d. Move the files you just created to the storage servers and clients. 
1e. Start Gluster twice on all the servers, once with each set of vol files. 
1f. You now have two GlusterFS exports, one distribute, the other stripe. 
1g. You can mount one inside the other on the client if that makes management easier. 
There are advantages to this model: having two separate Gluster instances significantly improves parallelism on the storage servers, and you can manage the two instances as if they were on different iron. 
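
As a rough sketch of steps 1a and 1c, the two glusterfs-volgen invocations could be assembled like this. Only the -p option comes from the steps above; the --name and --raid options (with --raid 0 meaning stripe) are how I recall the 3.0-era tool and should be checked against glusterfs-volgen --help. The host names, export paths, volume names and port 7001 are made up for illustration. The script only builds the command lines, it does not run them.

```python
DEFAULT_PORT = 6996  # the port the steps above say to avoid for the second instance


def volgen_command(name, bricks, raid=None, port=None):
    """Build (but do not run) an argv list for glusterfs-volgen.

    --name and --raid are assumptions about the CLI; -p is from the thread.
    """
    argv = ["glusterfs-volgen", "--name", name]
    if raid is not None:
        argv += ["--raid", str(raid)]  # assumed: 0 selects stripe
    if port is not None:
        # Step 1c: something above 1024, and not the default port.
        if port <= 1024 or port == DEFAULT_PORT:
            raise ValueError("pick a port above 1024 other than %d" % DEFAULT_PORT)
        argv += ["-p", str(port)]
    return argv + list(bricks)


# Hypothetical hosts and export paths: one file system per export on each host.
hosts = ["server1", "server2", "server3", "server4"]
dist_cmd = volgen_command("dist", [h + ":/export/dist" for h in hosts])
stripe_cmd = volgen_command(
    "stripe", [h + ":/export/stripe" for h in hosts], raid=0, port=7001
)

print(" ".join(dist_cmd))
print(" ".join(stripe_cmd))
```

Each command produces its own set of vol files, which is what lets the two instances in step 1e run side by side on the same servers.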




2. The use case for stripe is vanishingly small. If you have very large files (at least 2x the amount of memory in your storage servers, and a minimum of 50GB) with very limited writes and simultaneous access from hundreds of clients, then stripe might be appropriate. Stripe was designed for a specific type of HPC problem solving, not general file serving. Our video streaming users don't use stripe, even though that seems like an obvious use; there are better ways to configure Gluster for that. If you could share the type of content, the access methods and the IOPS you need, we could make some specific suggestions. 
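
To make that rule of thumb concrete, here is a small sketch of it as a check. It is only an illustration of the criteria above, not anything built into Gluster. The function name, the reading of "GB" as 10**9 bytes, and the cutoff of 100 clients for "hundreds of clients" are my assumptions.

```python
GB = 10 ** 9  # assumption: decimal gigabytes


def stripe_might_fit(file_size, server_memory, clients, write_heavy):
    """Return True only if every criterion from the rule of thumb holds."""
    # At least 2x the storage server's memory, and never less than 50GB.
    big_enough = file_size >= max(2 * server_memory, 50 * GB)
    # "Hundreds of clients": 100 is an arbitrary cutoff chosen for this sketch.
    many_readers = clients >= 100
    return big_enough and many_readers and not write_heavy


# A 200GB read-mostly file, 64GB servers, 300 simultaneous clients.
print(stripe_might_fit(200 * GB, 64 * GB, 300, write_heavy=False))
# A 60GB file on the same servers is below 2x memory, so it fails the check.
print(stripe_might_fit(60 * GB, 64 * GB, 300, write_heavy=False))
```

Anything that fails the check is probably better served by distribute.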








Thanks, 
Craig 

-- 
Craig Carl 



Gluster, Inc. 
Cell - (408) 829-9953 (California, USA) 
Gtalk - craig.carl at gmail.com 

----- Original Message ----- 
From: "Jeff Anderson-Lee" <jonah at eecs.berkeley.edu> 
To: gluster-users at gluster.org 
Sent: Thursday, May 13, 2010 12:36:58 PM GMT -08:00 US/Canada Pacific 
Subject: small files and cluster/stripe 

cluster/stripe will split large files across multiple volumes, but it 
seems to always put the first part of the file on the first volume; if 
you have a bunch of small files they all end up there, and one volume 
gets heavily used by small files while the others are empty. 

cluster/distribute spreads files across multiple volumes, but it puts 
the whole file on a single volume. 

Some marriage of the two would be helpful for workloads which contain 
both large and small files, like adding an "option block-size ..." to 
cluster/distribute or "option distribute" to cluster/stripe; it would 
use the filename hash modulo nSubvolumes to determine which volume to 
start in for the first block, then rotate around the stripe for the rest. 
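
A minimal sketch of that placement rule, to show what is meant. This is not how cluster/stripe or cluster/distribute are implemented; the function names are made up, and CRC32 merely stands in for whatever filename hash would really be used.

```python
import zlib


def start_subvolume(filename, n_subvolumes):
    """Pick the subvolume for the first block: filename hash modulo nSubvolumes."""
    # CRC32 is a placeholder hash chosen for this sketch.
    return zlib.crc32(filename.encode("utf-8")) % n_subvolumes


def block_placement(filename, file_size, block_size, n_subvolumes):
    """Return the subvolume index of each block, rotating around the stripe."""
    # Ceiling division; an empty file still occupies one (empty) block.
    n_blocks = max(1, -(-file_size // block_size))
    start = start_subvolume(filename, n_subvolumes)
    return [(start + i) % n_subvolumes for i in range(n_blocks)]


# A small file stays whole on one hash-selected subvolume (distribute-like).
print(block_placement("notes.txt", 4096, 128 * 1024, 4))
# A large file is spread over all subvolumes, starting at its hashed one.
print(block_placement("bigfile.dat", 1024 * 1024, 128 * 1024, 4))
```

With this rule, files smaller than one block behave as they do under cluster/distribute, while larger files are still striped, so small files no longer pile up on the first volume.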

I suppose I can work around this by creating multiple volumes as 
sub-directories of the same partition, then striping across those in 
rotation, and distributing across the stripes. 

Is there some other way? Am I missing something? 

Jeff Anderson-Lee 

_______________________________________________ 
Gluster-users mailing list 
Gluster-users at gluster.org 
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users 

