gluster-users-bounces at gluster.org wrote on 01/24/2012 10:10:15 AM:

> On Tue, Jan 24, 2012 at 09:11:01AM -0600, Greg_Swift at aotx.uscourts.gov wrote:
> > We have to have large numbers of volumes (~200). Quick rundown to give
> > context.
> >
> > Our nodes would have around 128TB of local storage from several 32TB raid
> > sets. We started with ext4, so had a 16TB maximum.
>
> Aside: http://blog.ronnyegner-consulting.de/2011/08/18/ext4-and-the-16-tb-limit-now-solved/

kewl

> > So we broke it down
> > into nice even chunks of 16TB, thus 8 file systems. Our first attempt was
> > ~200 volumes all using the 8 bricks per node (thus 1600 processes/ports).
> ...
> > We had issues, and Gluster recommended
> > reducing our process/port count.
>
> So just checking I understand, the original configuration was:
>
> /data1/vol1 .. /data1/vol200
> ...
> /data8/vol1 .. /data8/vol200

correct

> Terminology issue: isn't each serverN:/dirM considered a separate 'brick' to
> Gluster? I would have thought that configuration would count as 1600 bricks
> per node (but with groups of 200 bricks sharing 1 underlying filesystem)

You are right... I should have said 1600 bricks/processes/ports.

For the sake of the conversation, I think the expanded brick definition is:

Brick: a unique export path on a single storage node, served by a single
glusterfsd process listening on a single TCP port (1 brick = 1 process =
1 port). A combination of one or more bricks comprises a volume.

> > First we dropped down to only using 1 brick per volume per node, but this
> > left us in a scenario of managing growth
>
> Like this?
>
> /data1/vol1 .. /data1/vol25
> /data2/vol26 .. /data2/vol50
> ...
> /data8/vol175 .. /data8/vol200

yes

> I see, so you have to assign the right subset of volumes to each filesystem.
> I guess you could shuffle them around using replace-brick, but it would be a
> pain.

very much so

> > So we determined to move to XFS to reduce from 8 partitions
> > down to 2 LVs of 64TB each.
>
> /data1/vol1 .. /data1/vol200
> /data2/vol1 .. /data2/vol200
>
> i.e. 400 ports/processes/(bricks?) per server.

That was the plan.

> > We then ran into some performance
> > issues and found we had not tuned the XFS enough, which also deterred us
> > from pushing forward with the move.
>
> I don't have any experience with XFS, but the Gluster docs do recommend it
> as the one most heavily tested.
>
> I saw an old note here about tuning XFS to include extended attributes in
> the inode:
> http://www.gluster.org/community/documentation/index.php/Guide_to_Optimizing_GlusterFS
> (although the values shown seem to be defaults to mkfs.xfs nowadays)
>
> Did you find any other tuning was required?

We didn't do much tuning up front, which seems to have potentially been a
problem. After the fact we did add some tuning, such as disabling barriers,
atime, and diratime.

> This is all extremely helpful - many thanks for sharing your experiences.
>
> BTW I am just in the process of setting up two test systems here. Somewhat
> smaller than yours, but they are based on this chassis:
> http://www.xcase.co.uk/24-bay-Hotswap-rackmount-chassis-norco-RPC-4224-p/case-xcase-rm424.htm
> with Hitachi low-power 3TB drives.

That's pretty kewl... too bad it doesn't do SFF hard drives; that would be
awesome.

-greg
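
P.S. For anyone trying to picture the layouts above, here is a minimal sketch
of creating one volume in the "1 brick per volume per node" arrangement. The
server names are made up and I'm showing a plain distribute volume with no
replication, so treat it as an illustration rather than our exact config:

    # one brick per node for this volume, both living on the /data1 filesystem
    gluster volume create vol1 server1:/data1/vol1 server2:/data1/vol1
    gluster volume start vol1

    # repeat for vol2 .. vol200; each volume adds one glusterfsd process and
    # one listening TCP port per node, which is where the process/port counts
    # discussed in this thread come from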
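
Similarly, the after-the-fact XFS tuning boils down to something like the
following. Device names and mount points are placeholders, and the -i size=512
comes from the old Gluster optimization guide linked above (to keep extended
attributes inline in the inode) rather than from our own benchmarking. Also
note that nobarrier is only a reasonable trade-off when the array has a
battery- or flash-backed write cache:

    # larger inodes so Gluster's xattrs can stay inline (per the Gluster guide)
    mkfs.xfs -i size=512 /dev/vg_data/lv_data1

    # skip atime/diratime updates and write barriers on the brick filesystem
    mount -o noatime,nodiratime,nobarrier /dev/vg_data/lv_data1 /data1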