Best practice for using Gluster as virtual machine storage

matthewa at ihostsolutions.com.au (Matthew Anderson) · Wed, 23 Mar 2011 04:44:59 +0000

Hi All,

Just writing to pick everyone's brains on designing HA storage for virtual machines using Gluster. What I'm hoping to achieve is a single namespace where IOPS increase linearly as servers are added, while still being able to survive a single storage node going down. Basically I want a distributed RAID 10 where I can widen the stripe as I add storage servers, to increase the IOPS.

What I've come up with so far is:
All Gluster communications done over 40Gbit InfiniBand using the RDMA transport (all of our servers are already InfiniBand enabled)
Virtual hosts running KVM and storing VM images on a Gluster namespace using the native client
Storage nodes contain 24x 7200rpm SATA disks with SSD read and write caches
	- Disks configured as 2x RAID 6 arrays
	- SSD cache to make up for the RAID 6 arrays' lack of write speed (highly experimental on Linux; I haven't tested this yet, so RAID 10 may be a better option)

Initially I am starting with two storage servers, but I'd like to retain the ability to scale out one server at a time, so each server would need to contain two bricks to maintain replicas. The config would look something like:
Replicate sets
   Server1:/brick1
   Server2:/brick1
And
   Server1:/brick2
   Server2:/brick2

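Then stripe the replica sets to improve read and write performance. The namespace should have the read speed of 4x striped RAID 6 arrays and the write speed of 2x striped RAID 6 arrays, on the assumption that reads can come from either side of a replica while writes have to go to both.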
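
As a rough sketch of the arithmetic behind those figures: the Python below is an idealised upper bound only, assuming every RAID 6 array performs identically, reads are spread evenly over both copies of a replica, and every write lands on both copies. The function name and the per-array speeds of 1.0 are made up for illustration; none of this accounts for network, cache or Gluster overhead.

```python
def namespace_throughput(replica_sets, copies_per_set, array_read, array_write):
    """Idealised throughput of a stripe laid over replica sets.

    Assumption: reads can be served by any copy, so every array
    contributes to reads; a write must hit every copy in a set, so
    each replica set only contributes one array's worth of writes.
    """
    read = replica_sets * copies_per_set * array_read
    write = replica_sets * array_write
    return read, write


# Two servers, two bricks each -> 2 replica sets of 2 copies.
# Speeds are in "multiples of one RAID 6 array".
read, write = namespace_throughput(
    replica_sets=2, copies_per_set=2, array_read=1.0, array_write=1.0
)
print(f"read = {read}x one array, write = {write}x one array")
```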

The above seems to work in theory until you add another server. To do so you would need to break one of the existing replica pairs so its bricks can re-replicate to bricks on the new server, and also increase the stripe count to 3. Is this possible to do on the fly, or at all?
The config would look like:
Replicate sets
   Server1:/brick1
   Server2:/brick1
And
   Server2:/brick2
   Server3:/brick2
And
   Server3:/brick1
   Server1:/brick2

If I have to add servers in multiples of two it probably won't be the end of the world. It just means that IOPS won't scale for a single VM (given that the stripe is always set at 2) but will scale for many VMs spread over the cluster.
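
To put numbers on that fallback, here is a small Python sketch. It assumes two bricks per server, replica 2, a fixed stripe of 2, and that each VM image lives in exactly one stripe group with the groups distributed across the cluster; the function and key names are hypothetical.

```python
def scaling_in_pairs(num_servers, stripe=2):
    """Count replica sets and stripe groups when servers are added in pairs.

    Assumption: two bricks per server and replica 2, so the number of
    replica sets equals the number of servers.
    """
    if num_servers < 2 or num_servers % 2:
        raise ValueError("servers must be added in multiples of two")
    replica_sets = num_servers            # (2 bricks * servers) / 2 copies
    return {
        "replica_sets": replica_sets,
        "stripe_groups": replica_sets // stripe,  # grows with the cluster
        "replica_sets_per_vm": stripe,            # stays fixed per VM image
    }


for n in (2, 4, 6):
    print(n, scaling_in_pairs(n))
```

The per-VM figure never moves, while the number of stripe groups available to spread VMs over grows with every pair of servers added.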

Can anyone see any problems or improvements that could be made to the design? 

Thanks
-Matt