On 07/07/2010 02:35 PM, sandeep dude wrote:
> I am Sandeep from an Animation and Visual effects studio called as
> GoldenEye. We have a very unstructured data which needs some data
> replication but in reality the requirement is scaling up and at the
> same time replication is somewhat tougher so we are using the "
> Allway sync" software to sync the folders from time to time worrying
> about the data server failure.. now when I saw Gluster I just got an
> idea about the Isilon storage. I have few doubts which will make me
> to use gluster at my studio.
>
> 1. If I got a storage cluster with 6 storage servers with mirror
> option enabled and now what is the performance? 6x? or 2x?

Probably something a bit less than 3x for write and close to 6x for
read. Unless you're very paranoid about data loss, you'll probably want
simple two-way replication. Each replica pair will offer a bit less
than 1x performance for write because of replication overhead, and
close to 2x for read because reads can be split across the two nodes.
Then you take those three replica pairs and distribute across them to
get 3x/6x.

Of course, the natural question is: 2x/3x/6x of what? You can't just
multiply the local-disk performance, because there is communication
overhead to consider. Others can probably provide more concrete
performance data for this type of configuration. For planning purposes,
you should generally count on no more than half of the "obvious"
numbers.

> 2. and what if a node fail? does the data resides on the other one?

If a node fails, its replication partner should still have the data, so
you're protected against single failure.

> 3. using stripe on 6 servers means usable space is just 1 server
> space? and performance is 6x ?

No, stripe doesn't store data redundantly so you'll still have the
entire 6x space.
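
To make the 3x/6x reasoning for question 1 concrete, here is a rough
Python sketch of the arithmetic. It is only a back-of-the-envelope
model, not a benchmark: the function name and the 0.5 planning factor
(my "no more than half" rule of thumb) are illustrative assumptions,
and the ideal figures ignore replication and network overhead.

```python
def estimate_scaling(servers, replicas=2, planning_factor=0.5):
    """Estimate throughput multipliers for distribute-over-replicate.

    Multipliers are relative to a single server. All numbers are
    rough planning figures, not measurements.
    """
    if servers % replicas != 0:
        raise ValueError("server count must be a multiple of the replica count")

    replica_sets = servers // replicas

    # Every write goes to all replicas, so a replica set writes at
    # roughly 1x at best (a bit less in practice).
    ideal_write = float(replica_sets)

    # Reads can be split across the replicas in a set, so each set
    # reads at roughly `replicas`x, i.e. one multiple per server.
    ideal_read = float(replica_sets * replicas)

    return {
        "replica_sets": replica_sets,
        "ideal_write": ideal_write,
        "ideal_read": ideal_read,
        # Hypothetical planning numbers: half of the "obvious" ones.
        "planning_write": ideal_write * planning_factor,
        "planning_read": ideal_read * planning_factor,
    }
```

With six servers and two-way replication this gives three replica
pairs, ideal multipliers of 3x write and 6x read, and planning figures
of about 1.5x and 3x.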

> I have too much of data which will be sucking by the renderfarm and
> my desktop users always complain that the servers are slower but
> still we have Hitachi 2TB deskstar 7200rpm which sends 130MB/sec per
> disk but now I have such a requirement where I want 400MB/sec for one
> editing machine and 600MB/sec for the renderfarm and another
> 400MB/sec for the deskstop users all simultaneously and under the
> same name space...huh...

1.4GB/s is a pretty tall order. I've done 30x that, but that was on
very large and specialized systems. Are all of those needs truly
concurrent, or is it more like 400MB/s then 600MB/s then 400MB/s again
for three separate work phases? Also, are those peak or sustained
numbers, and for what kind of workload in terms of thread counts and
I/O sizes?

> I believe gluster can help me in doing this...
>
> I have setup gluster with onboard Gigabit Lan for Asus M2n68am
> motherboard but its unable to connect... I was very very impressed
> with gluster technology but unable to test it...
>
> But I have few questions...
>
> what to do in order to get that speed on my servers?
>
> 1. I want to use 4 machines with 4 1Gbps Intel dual port two cards
> per system and 4 Hitachi 130MB/sec hard disk 2TB each drive ( 8TB in
> total per system )

You can't practically get more than 100MB/s per NIC, even with good
equipment and lots of tuning. If you want 1.4GB/s you'll need at least
14 NICs and therefore 7 machines . . . *at least*. With the
configuration you describe, the four disks per machine will far outrun
the two NICs, and cramming more NICs per machine might not help due to
PCIe contention or network-stack limitations.

> my question is do i get 16Gbps of throughput for one file ( atleast
> according to the calculation ) do I need to use stripe?

If you want very high throughput for large I/O requests (at least 128KB
* stripe width) then stripe might help. In other situations it can
actually hurt.
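
The NIC and stripe arithmetic above can be sketched in a few lines of
Python. This is only a sizing aid under the assumptions already stated
in this reply: roughly 100MB/s usable per gigabit NIC, two NICs per
machine, 130MB/s per disk, and a 128KB stripe unit. The function names
are made up for illustration and are not part of any Gluster tool.

```python
import math

def nics_needed(target_mb_s, per_nic_mb_s=100):
    """Minimum NIC count to carry the target, at ~100MB/s per NIC."""
    return math.ceil(target_mb_s / per_nic_mb_s)

def machines_needed(target_mb_s, nics_per_machine=2, per_nic_mb_s=100):
    """Minimum machine count given a fixed number of NICs per machine."""
    return math.ceil(nics_needed(target_mb_s, per_nic_mb_s) / nics_per_machine)

def per_machine_ceiling(disks, per_disk_mb_s, nics, per_nic_mb_s=100):
    """A machine can serve no more than its slower side: disks or NICs."""
    return min(disks * per_disk_mb_s, nics * per_nic_mb_s)

def min_request_for_stripe_kb(stripe_width, stripe_unit_kb=128):
    """Smallest request size (KB) that touches every server in the stripe."""
    return stripe_width * stripe_unit_kb

# Combined target from the original question: editing + renderfarm + desktops.
target_mb_s = 400 + 600 + 400
```

For the 1400MB/s target this works out to 14 NICs and 7 machines. A
machine with four 130MB/s disks and two NICs tops out at the 200MB/s the
network can carry, well below the 520MB/s the disks could deliver, and a
stripe across 4 servers only starts to pay off for requests of 512KB or
larger.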