Just as a small warning, Sheepdog is still a long way from being production ready - the big problem I can see is that it won't de-allocate blocks once an image is deleted! It definitely has promise for the future though.

We set up a 4-node cluster using Gluster with KVM late last year, and it has been running along quite nicely for us. To quote myself from a forum post I made a few weeks ago (note, prices are in Australian dollars):

In short, we built a VM cluster without a traditional SAN, based on free Linux software, for a comparatively small amount of money.

The company has had some reasonably explosive growth over the past 18 months, so the pressure has been on to deliver more power to the various business units without breaking the bank. We've been running VMware reasonably successfully, so the obvious option was to just do that again. We called our friendly Dell rep for a quote on the traditional approach (SAN, some servers, dual switches etc.), which came back at around $40,000. That wasn't going to fly, so we needed to get creative.

The requirements for this particular cluster were fairly modest: an immediate need for a mix of ~20 moderate-use Windows and Linux VMs, the ability to scale 3-4x in the short term, cheaper is better, and full redundancy is a must.

Gluster lets you create a virtual "SAN" across multiple nodes, either replicating everything to every node (RAID 1), distributing each file to a separate node (JBOD / RAID 0, I guess you could call it at a stretch), or a combination of the two (called distribute/replicate in Gluster terms). After running it on a few old boxes as a test, we decided to take the plunge and build the new cluster on it. The tools of choice are KVM for the actual virtualisation and GlusterFS for the storage, all built on top of Debian.

The hardware

We're very much a Dell house, so this all lives on Dell R710 servers, each with:

* 6-core Xeon
* 24GB RAM
* 6x 300GB 15k SAS drives
* Intel X520-DA2 dual-port 10 GigE NIC (SFP+ twinax cables)
* the usual DRAC module, ProSupport etc.

Everything is connected together using a pair of Dell 8024F 10 Gig Ethernet switches (active/backup config). All up, the price was in the region of $20,000. Much better.

Gluster - storage

Gluster is still a bit rough around the edges, but it does what we needed reasonably well. The main problem is that if a node has to resync (self-heal in Gluster terms), it locks the WHOLE file for reading across all nodes until the sync is finished. If you have a lot of big VM images, this can mean that their storage "disappears" as far as KVM is concerned while the sync happens, leading the VM to hard crash. Even with 10 Gig Ethernet and fast disks, moving several-hundred-GB images takes a while. Currently, if a node goes offline, we leave it dead until we can shut down all of the VMs and bring it back up gracefully. That has only happened once so far (hardware problem), but it is certainly a worry. This is apparently being fixed in the next major release (3.3), due out mid this year.

A point to note is that Gluster was recently acquired by Red Hat, which is very positive. We did have a few stability problems with earlier versions, but 3.2.5 has run smoothly with everything we've thrown at it so far.
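For anyone wondering what the Gluster side looks like in practice, here is a rough sketch of creating and mounting a distribute/replicate volume across four nodes. The node names, brick paths and volume name are made-up examples rather than our real config, and the syntax is from the 3.2-era CLI, so check "gluster help" against whatever version you're running:

  # (node1-4, /data/brick1 and "vmstore" are example names only)
  # from node1, pull the other nodes into the trusted pool
  gluster peer probe node2
  gluster peer probe node3
  gluster peer probe node4

  # replica 2 over four bricks = two mirrored pairs, with files
  # distributed between the pairs (distribute/replicate)
  gluster volume create vmstore replica 2 transport tcp \
      node1:/data/brick1 node2:/data/brick1 \
      node3:/data/brick1 node4:/data/brick1
  gluster volume start vmstore

  # each KVM host mounts the volume with the native FUSE client and
  # points its image directory at it
  mount -t glusterfs localhost:/vmstore /var/lib/libvirt/images

Note that brick order matters - adjacent bricks in the list become a replica pair, so you want each pair split across two physical nodes.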
KVM - hypervisor

Very powerful, performs better than VMware in our testing, and totally free. We use libvirt with some custom apps to manage everything, but 99% of it is set-and-forget. For those who want a nice GUI in Windows, sadly there is nothing that is 100% there yet. For those of us who prefer the command line, virsh from libvirt works very well.

The only real problem is that live snapshots don't currently exist - the VM has to be paused or shut down to take a snapshot. We've tested wedging LVM under the storage bricks, which lets us snapshot the base storage and grab our backups from there. It seems to work OK on the test boxes, but it isn't in production yet. Until we can get a long enough window to set that up, we're doing ad-hoc manual snapshots of the VMs, usually before/after a major change when we already have an outage window. Day-to-day data is backed up the traditional way.
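To make the LVM idea a bit more concrete, here is roughly what a backup pass could look like. This is a sketch of what we've been testing rather than a production procedure, and the guest name, volume group and paths (web01, vg0, a brick living on /dev/vg0/brick1, etc.) are invented for the example:

  # (web01, vg0/brick1 and the paths below are example names only)
  # pause the guest briefly so its image is consistent on disk
  virsh suspend web01

  # snapshot the logical volume that the gluster brick sits on
  lvcreate --snapshot --size 20G --name brick1-snap /dev/vg0/brick1

  # the guest only needs to stay paused for the second or two the
  # snapshot takes to create
  virsh resume web01

  # mount the snapshot read-only and copy the image off at leisure
  mkdir -p /mnt/brick1-snap
  mount -o ro /dev/vg0/brick1-snap /mnt/brick1-snap
  cp /mnt/brick1-snap/web01.img /backup/web01-$(date +%F).img
  umount /mnt/brick1-snap
  lvremove -f /dev/vg0/brick1-snap

Because the snapshot is per-brick rather than per-volume, you'd presumably repeat this on one brick out of each replica pair to cover all of the images.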
What else did we look at?

There are a number of other shared/cluster storage options out there (Ceph, Lustre, DRBD etc.), all of which have their own little problems. There are also plenty of other hypervisors (Xen being the main other player), but KVM fit our needs perfectly.

We looked at doing 4x bonded gigabit Ethernet for the storage, but Dell offered us a deal that made it not much more expensive to step up to 10 Gig Ethernet, and it meant we could avoid the fun of dealing with bonded links. Realistically, bonded gigabit would have done the job for us, but the deal we were offered made it silly to consider. If memory serves, the 8024F units were brand new at the time, so I think Dell were keen to get some into the wild.

Conclusion

There are certainly a few rough edges (primarily to do with storage), but nothing show-stopping for what we need at the moment. In the end, the business is happy, so we're happy. I'd definitely say this approach is at least worth a look for anyone building a small VM cluster. Gluster is still a bit too immature for me to be 100% happy with it, but I think it's going to be perfect in 6 months' time. At the moment, it is fine to use in production with a bit of care and knowledge of the limitations.

-----Original Message-----
From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On Behalf Of Miles Fidelman
Sent: Friday, 17 February 2012 5:47 AM
Cc: gluster-users at gluster.org
Subject: Re: question re. current state of art/practice

Brian Candler wrote:
> On Wed, Feb 15, 2012 at 08:22:18PM -0500, Miles Fidelman wrote:
>> We've been running a 2-node, high-availability cluster - basically
>> xen w/ pacemaker and DRBD for replicating disks. We recently
>> purchased 2 additional servers, and I'm thinking about combining
>> all 4 machines into a 4-node cluster - which takes us out of DRBD
>> space and requires some other kind of filesystem replication.
>>
>> Gluster, Ceph, Sheepdog, and XtreemFS seem to keep coming up as
>> things that might work, but... Sheepdog is too tied to KVM
>
> ... although if you're considering changing DRBD->Gluster, then
> changing Xen->KVM is perhaps worth considering too?

Considering it, but... Sheepdog doesn't seem to have the support that Gluster does, and my older servers don't have the processor extensions necessary to run KVM. Sigh....

>> i. Is it now reasonable to consider running Gluster and Xen on the
>> same boxes, without hitting too much of a performance penalty?
>
> I have been testing Gluster on 24-disk nodes:
> - 2 HBAs per node (one 16-port and one 8-port)
> - single CPU chip (one node is dual-core i3, one is quad-core Xeon)
> - 8GB RAM
> - 10G ethernet
> and however I hit it, the CPU is mostly idle. I think the issue for
> you is more likely to be one of latency rather than throughput or CPU
> utilisation, and if you have multiple VMs accessing the disk
> concurrently then latency becomes less important.
>
> However, I should add that I'm not running VMs on top of this, just
> doing filesystem tests (and mostly reads at this stage).
>
> For what gluster 3.3 will bring to the table, see this:
> http://community.gluster.org/q/can-i-use-glusterfs-as-an-alternative-network-storage-backing-for-vm-hosting/

Thanks! That gives me some hard info. I'm starting to think waiting for 3.3 is a very good idea. Might start playing with the beta.

Miles

--
In theory, there is no difference between theory and practice.
In practice, there is. .... Yogi Berra

_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users