I think you are referring to this statement: "After adding a gluster layer (fuse mount) write speeds per process are at ~150MB/sec." The raw filesystem write speed for 1 or more processes against an XFS mountpoint is ~300MB/sec. After the FUSE layer is added (with replica=2 as shown in my config) I get 150MB/sec for 1 process (still not inside a VM). If I add a second process, that process also gets roughly 150MB/sec. With a third process, each one only pushes ~100MB/sec, so 3 processes writing as fast as they can collectively get ~300MB/sec. In practice (outside the lab) I have more than 2 machines, so it's hard to go back and figure out exactly what the total bandwidth or write speed ends up being; replica=2 with 6 bricks, plus where each file hashes to, all make benchmarking difficult.

Inside a VM I can typically push 50-70MB/sec when the qcow2 file lives on a glusterfs mount. If I create a VM that lives on a local mountpoint (non-gluster, non-shared filesystem) I get 70-90MB/sec (same hardware/RAID as my gluster bricks). So a mountpoint where I can easily push 300MB/sec with dd sucks once qcow2 adds its layer.

Interestingly, if I run dd if=/dev/zero of=/tmp/blah inside the VM, let the disk fill up, then delete the temp file, the speed on a gluster mountpoint vs. local shifts proportionally for the better. Basically, once a qcow2 file is fully grown (no longer expanding) I get 70-110MB/sec on a gluster mount and 90-130MB/sec on a locally mounted qcow2 filesystem. I'm assuming the write speeds stay lame because the qcow2 file probably grew in a badly fragmented way: ext4 inside the VM allocated non-contiguous blocks while the qcow2 file grew contiguously, so the qcow2 file ends up as a scrambled mess of pointers to small, out-of-order blocks.

I can do the same 'per process' idea where I spin up 2, 3, 4, 5, 6, ... N VMs running on the glusterfs mountpoint (as opposed to simple dd processes) and have them all go crazy writing. I end up maxing out the underlying block devices: roughly after 6 VMs are dd-ing like crazy (inside the VMs) the IO is maxed out. 50MB/sec * 6 VMs = 300MB/sec (well, 600MB/sec because replica=2). There are 2 machines in the test cluster, each with a brick that can do 300MB/sec, so with replica=2, 6 VMs going full speed have used 100% of my available IO resources. So to be honest, I think that means everything is a-ok. It would be nice if 1 VM could push a full 300MB/sec, but that isn't even remotely happening even when using a native mount with qcow2. The nice part of many processes seeking for data non-contiguously in a qcow2 file is that it is happening N times in parallel, so overall I still get pretty decent speeds. Read performance seems to be pretty good (I say this non-empirically, since my disk-related issues have been write-speed related; read benchmark blurb below).

Growing/resizing qcow2 files sucks. It sucks more on top of glusterfs (with replica=2 anyway): growing a file across 2 filesystems, on 2 different bricks, on 2 different boxes, is just going to cost some kind of time somewhere. Either brick A or brick B is going to finish before the other, and one will wait for the other. Knowing that worst case I'm going to get 50-70MB/sec inside a VM, I focus on making sure scaling out horizontally is *really* working.

My biggest regret on all these benchmarks I did: I pretty much skipped out on doing a full round of read benchmarks.
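For reference, the 'per process' write tests were basically just parallel dd runs, something along these lines (the mount paths, sizes, and the fdatasync flag below are placeholders/assumptions, not my exact commands):

    # single writer against the raw XFS brick mount (~300MB/sec in my case)
    # conv=fdatasync so we're timing the disk, not the page cache
    dd if=/dev/zero of=/mnt/xfs-brick/test.0 bs=1M count=4096 conv=fdatasync

    # N writers in parallel against the gluster FUSE mount (replica=2)
    for i in 1 2 3; do
        dd if=/dev/zero of=/mnt/gluster/test.$i bs=1M count=4096 conv=fdatasync &
    done
    wait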
Also, I no longer have matching hardware in my lab, so I can't avoid redoing all the benchmarks by just adding some read benchmarks: the hardware I used is now in production. :( Once native gluster support in qemu hits the mainstream, the qcow2-over-FUSE setup will go away and things should be ... better.
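For anyone curious, the native (libgfapi) integration lets qemu talk to the volume directly instead of going through the FUSE mount. A rough sketch, assuming a qemu build with gluster support; the host, volume, and image names are made up for illustration:

    # create an image directly on the gluster volume (no FUSE mount involved)
    qemu-img create -f qcow2 gluster://gluster1.example.com/vmvol/test.qcow2 20G

    # boot a VM whose disk is accessed via libgfapi
    qemu-system-x86_64 -enable-kvm -m 2048 \
        -drive file=gluster://gluster1.example.com/vmvol/test.qcow2,if=virtio,cache=none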