Thomas,

That's exactly the kind of info I was looking for. Thanks! Think I'll go
experiment a bit with the Gluster 3.3 beta and see what kind of results I can
get.

Miles

Thomas Jackson wrote:
> Just as a small warning, Sheepdog is well away from being production ready -
> the big problem that I can see is that it won't de-allocate blocks once an
> image is deleted! It definitely has promise for the future though.
>
> We set up a 4 node cluster using Gluster with KVM late last year, which has
> been running along quite nicely for us.
>
> To quote myself from a forum post I made a few weeks ago (note, prices are
> in Australian dollars):
>
> In short, we built a VM cluster without a traditional SAN, based on free
> Linux-based software, for a comparatively small amount of money. The company
> has had some reasonably explosive growth over the past 18 months, so the
> pressure has been on to deliver more power to the various business units
> without breaking the bank.
>
> We've been running VMware reasonably successfully, so the obvious option was
> to just do that again. We called our friendly Dell rep for a quote using the
> traditional SAN, some servers, dual switches etc. - which came back at around
> $40,000. That wasn't going to fly, so we needed to get creative.
>
> The requirements for this particular cluster were fairly modest: an immediate
> need for a mix of ~20 moderate-use Windows and Linux VMs, the ability to
> scale 3-4x in the short term, cheaper is better, and full redundancy is a
> must.
>
> Gluster lets you create a virtual "SAN" across multiple nodes, either
> replicating everything to every node (RAID1), distributing each file to a
> separate node (JBOD / RAID0 I guess you could call it at a stretch), or a
> combination of the two (called distribute/replicate in Gluster terms). After
> running it on a few old boxes as a test, we decided to take the plunge and
> build up the new cluster using it.
>
> The tools of choice are KVM for the actual virtualisation and GlusterFS to
> run the storage, all built on top of Debian.
>
> The hardware
> We're very much a Dell house, so this all lives on Dell R710 servers, each
> with:
> * 6 core Xeon
> * 24GB RAM
> * 6x 300GB 15k SAS drives
> * Intel X520-DA2 dual port 10 GigE NIC (SFP+ twinax cables)
> * Usual DRAC module, ProSupport etc.
>
> All connected together using a pair of Dell 8024F 10 Gig eth switches
> (active / backup config). All up, the price was in the region of $20,000.
> Much better.
>
> Gluster - storage
> Gluster is still a bit rough around the edges, but it does do what we needed
> reasonably well. The main problem is that if a node has to resync (self-heal
> in Gluster terms), it locks the WHOLE file for reading across all nodes
> until the sync is finished. If you have a lot of big VM images, this can
> mean that the storage for them "disappears" as far as KVM is concerned while
> the sync happens, leading the VM to hard crash. Even with 10 Gig Eth and
> fast disks, moving several-hundred-GB images takes a while. Currently, if a
> node goes offline, we leave it dead until such time as we can shut down all
> of the VMs and bring it back up gracefully. That has only happened once so
> far (a hardware problem), but it is certainly something that worries us.
>
> This is apparently being fixed in the next major release (3.3), due out
> mid-year. A point to note is that Gluster was recently acquired by Red Hat,
> which is very positive.
>
> We did have a few stability problems with earlier versions, but 3.2.5 has
> run smoothly with everything we've thrown at it so far.
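>
> For anyone wanting to try something similar, a distribute/replicate volume
> across four nodes is created roughly like this. The hostnames, brick paths
> and volume name here are made up, and the exact syntax may vary a little
> between Gluster versions, so treat it as a sketch rather than a recipe:
>
>     # From one node, add the other three to the trusted pool
>     gluster peer probe node2
>     gluster peer probe node3
>     gluster peer probe node4
>
>     # Create a 2x2 distribute/replicate volume across the four bricks,
>     # then start it
>     gluster volume create vmstore replica 2 transport tcp \
>         node1:/data/brick1 node2:/data/brick1 \
>         node3:/data/brick1 node4:/data/brick1
>     gluster volume start vmstore
>
>     # On each KVM host, mount the volume wherever the VM images live
>     mount -t glusterfs node1:/vmstore /var/lib/libvirt/images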
>
> KVM - hypervisor
> Very powerful, performs better than VMware in our testing, and totally free.
> We use libvirt with some custom apps to manage everything, but 99% of it is
> set-and-forget. For those who want a nice GUI in Windows, sadly there is
> nothing that is 100% there yet. For those of us who prefer to use command
> lines, virsh from libvirt works very well.
>
> The only problem is that live snapshots don't currently exist - the VM has
> to be paused or shut down to take a snapshot. We've tested wedging LVM under
> the storage bricks, which lets us snapshot the base storage and grab our
> backups from there (there's a rough outline of the process at the end of
> this post). It seems to work OK on the test boxes, but it isn't in
> production yet. Until we can get a long enough window to set that up, we're
> doing ad-hoc manual snapshots of the VMs, usually before/after a major
> change when we have an outage window already. Day-to-day data is backed up
> the traditional way.
>
> What else did we look at?
> There are a number of other shared storage / cluster storage apps out there
> (Ceph, Lustre, DRBD etc.), all of which have their own little problems.
>
> There are also a lot of different hypervisors out there (Xen being the main
> other player), but KVM fit our needs perfectly.
>
> We looked at doing 4x bonded gigabit ethernet for the storage, but Dell
> offered us a deal where it wasn't much more to step up to 10 Gig Eth, and
> it meant that we could avoid the fun of dealing with bonded links.
> Realistically, bonded gigabit would have done the job for us, but the deal
> we were offered made it silly to go that way. If memory serves, the 8024F
> units were brand new at the time, so I think they were trying to get some
> into the wild.
>
> Conclusion
> There are certainly a few rough edges (primarily to do with storage), but
> nothing show-stopping for what we need at the moment. In the end, the
> business is happy, so we're happy.
>
> I'd definitely say this approach is at least worth a look for anyone
> building a small VM cluster. Gluster is still a bit too immature for me to
> be 100% happy with it, but I think it's going to be perfect in 6 months'
> time. At the moment, it is fine to use in production with a bit of care and
> knowledge of the limitations.
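>
> The backup approach we're testing looks something like the following. The
> domain name, volume group, mount point and paths are placeholders, and you'd
> want to sanity-check the details (snapshot size, filesystem mount options,
> where the images actually sit on the brick) against your own setup before
> relying on it:
>
>     # Pause the guest so its disk image stops changing
>     virsh suspend vm01
>
>     # Snapshot the logical volume that holds the Gluster brick
>     lvcreate --snapshot --size 20G --name brick1-snap /dev/vg_bricks/brick1
>
>     # Let the guest carry on while we copy from the snapshot
>     virsh resume vm01
>
>     # Mount the snapshot read-only, copy the image off, then clean up
>     mkdir -p /mnt/brick1-snap
>     mount -o ro /dev/vg_bricks/brick1-snap /mnt/brick1-snap
>     cp /mnt/brick1-snap/vm01.img /backup/
>     umount /mnt/brick1-snap
>     lvremove -f /dev/vg_bricks/brick1-snap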
>
> -----Original Message-----
> From: gluster-users-bounces at gluster.org
> [mailto:gluster-users-bounces at gluster.org] On Behalf Of Miles Fidelman
> Sent: Friday, 17 February 2012 5:47 AM
> Cc: gluster-users at gluster.org
> Subject: Re: question re. current state of art/practice
>
> Brian Candler wrote:
>> On Wed, Feb 15, 2012 at 08:22:18PM -0500, Miles Fidelman wrote:
>>> We've been running a 2-node, high-availability cluster - basically
>>> Xen w/ Pacemaker and DRBD for replicating disks. We recently
>>> purchased 2 additional servers, and I'm thinking about combining
>>> all 4 machines into a 4-node cluster - which takes us out of DRBD
>>> space and requires some other kind of filesystem replication.
>>>
>>> Gluster, Ceph, Sheepdog, and XtreemFS seem to keep coming up as
>>> things that might work, but... Sheepdog is too tied to KVM.
>> ... although if you're considering changing DRBD->Gluster, then
>> changing Xen->KVM is perhaps worth considering too?
> Considering it, but... Sheepdog doesn't seem to have the support that
> Gluster does, and my older servers don't have the processor extensions
> necessary to run KVM. Sigh....
>
>>> i. Is it now reasonable to consider running Gluster and Xen on the
>>> same boxes, without hitting too much of a performance penalty?
>> I have been testing Gluster on 24-disk nodes:
>> - 2 HBAs per node (one 16-port and one 8-port)
>> - single CPU chip (one node is dual-core i3, one is quad-core Xeon)
>> - 8GB RAM
>> - 10G ethernet
>> and however I hit it, the CPU is mostly idle. I think the issue for
>> you is more likely to be one of latency rather than throughput or CPU
>> utilisation, and if you have multiple VMs accessing the disk
>> concurrently then latency becomes less important.
>>
>> However, I should add that I'm not running VMs on top of this, just
>> doing filesystem tests (and mostly reads at this stage).
>>
>> For what Gluster 3.3 will bring to the table, see this:
>> http://community.gluster.org/q/can-i-use-glusterfs-as-an-alternative-network-storage-backing-for-vm-hosting/
> Thanks! That gives me some hard info. I'm starting to think waiting for
> 3.3 is a very good idea. Might start playing with the beta.
>
> Miles
>
> --
> In theory, there is no difference between theory and practice.
> In practice, there is. .... Yogi Berra
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users

--
In theory, there is no difference between theory and practice.
In practice, there is. .... Yogi Berra