I think you are referring to this statement: "After adding a gluster layer (fuse mount) write speeds per process are at ~150MB/sec." The raw filesystem write speed for 1 or more processes against an XFS mountpoint is ~300MB/sec. After the FUSE layer is added (with replica=2 as shown in my config) I get 150MB/sec for 1 process (still not inside a VM). If I add a second process, that process also gets roughly 150MB/sec. With a third process, each one only pushes ~100MB/sec, so 3 processes writing as fast as they can collectively get ~300MB/sec. In practice (outside the lab) I have more than 2 machines, so it's hard to go back and figure out exactly what the total bandwidth or write speed ends up being; replica=2 with 6 bricks, plus where each file hashes to, all make benchmarking difficult.

Inside a VM I can typically push 50-70MB/sec when the qcow2 file lives on a glusterfs mount. If I create a VM that lives on a local mountpoint (non-gluster, non-shared filesystem) I get 70-90MB/sec (same hardware/RAID as my gluster bricks). So a mountpoint where I can easily push 300MB/sec with dd sucks once qcow2 adds its layer.

Interestingly, if I run dd if=/dev/zero of=/tmp/blah inside the VM, let the disk fill up, then delete the temp file, the speed on a gluster mountpoint vs. local shifts proportionally for the better. Basically, once a qcow2 file is fully grown (no longer expanding) I get 70-110MB/sec on a gluster mount and 90-130MB/sec on a locally mounted qcow2 filesystem. I'm assuming the write speeds stay lame because the qcow2 file probably grew in a badly fragmented way: ext4 inside the VM allocated non-contiguous blocks while the qcow2 file grew contiguously, so the qcow2 file ends up as a scrambled mess of pointers to small, out-of-order blocks.

I can do the same 'per process' idea where I spin up 2, 3, 4, 5, 6, ... N VMs running on the glusterfs mountpoint (as opposed to simple dd processes) and have them all go crazy writing. I end up maxing out the underlying block devices: roughly after 6 VMs are dd-ing like crazy (inside the VMs) the IO is maxed out. 50MB/sec * 6 VMs = 300MB/sec (well, 600MB/sec because replica=2). There are 2 machines in the test cluster, each with a brick that can do 300MB/sec, so with replica=2, 6 VMs going full speed have used 100% of my available IO resources. So to be honest, I think that means everything is a-ok. It would be nice if 1 VM could push a full 300MB/sec, but that isn't even remotely happening even when using a native mount with qcow2. The nice part of many processes seeking for data non-contiguously in a qcow2 file is that it is happening N times in parallel, so overall I still get pretty decent speeds. Read performance seems to be pretty good (I say this non-empirically, since my disk-related issues have been write-speed related; read benchmark blurb below).

Growing/resizing qcow2 files sucks. It sucks more on top of glusterfs (with replica=2 anyway): growing a file across 2 filesystems, on 2 different bricks, on 2 different boxes, is just going to cost some kind of time somewhere. Either brick A or brick B is going to finish before the other, and one will wait for the other. Knowing that worst case I'm going to get 50-70MB/sec inside a VM, I focus on making sure scaling out horizontally is *really* working.

My biggest regret on all these benchmarks I did: I pretty much skipped out on doing a full round of read benchmarks.
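For reference, the 'per process' write tests were basically just parallel dd runs, something along these lines (the mount paths, sizes, and the fdatasync flag below are placeholders/assumptions, not my exact commands):

    # single writer against the raw XFS brick mount (~300MB/sec in my case)
    # conv=fdatasync so we're timing the disk, not the page cache
    dd if=/dev/zero of=/mnt/xfs-brick/test.0 bs=1M count=4096 conv=fdatasync

    # N writers in parallel against the gluster FUSE mount (replica=2)
    for i in 1 2 3; do
        dd if=/dev/zero of=/mnt/gluster/test.$i bs=1M count=4096 conv=fdatasync &
    done
    wait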
Also, I no longer have matching hardware in my lab, so I can't avoid redoing all the benchmarks by just adding some read benchmarks: the hardware I used is now in production. :( Once native gluster support in qemu hits the mainstream, the qcow2-over-FUSE setup will go away and things should be ... better.
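For anyone curious, the native (libgfapi) integration lets qemu talk to the volume directly instead of going through the FUSE mount. A rough sketch, assuming a qemu build with gluster support; the host, volume, and image names are made up for illustration:

    # create an image directly on the gluster volume (no FUSE mount involved)
    qemu-img create -f qcow2 gluster://gluster1.example.com/vmvol/test.qcow2 20G

    # boot a VM whose disk is accessed via libgfapi
    qemu-system-x86_64 -enable-kvm -m 2048 \
        -drive file=gluster://gluster1.example.com/vmvol/test.qcow2,if=virtio,cache=none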