On Thu, Jul 19, 2012 at 2:14 AM, Jake Grimmett <jog at mrc-lmb.cam.ac.uk> wrote:

> Dear Pranith / Anand,
>
> Update on our progress with using KVM & Gluster:
>
> We built a two-server (Dell R710) cluster; each box has...
> a 5 x 500 GB SATA RAID5 array (software RAID)
> an Intel 10GbE HBA
> One box has 8GB RAM, the other 48GB
> both have 2 x E5520 Xeons
> CentOS 6.3 installed
> Gluster 3.3 installed from the rpm files on the gluster site
>
> 1) create a replicated gluster volume (on top of xfs)
> 2) set up qemu/kvm with a gluster volume (mounts localhost:/gluster-vol)
> 3) sanlock configured (this is evil!)
> 4) build a virtual machine with a 30GB qcow2 image, 1GB RAM
> 5) clone this VM into 4 machines
> 6) check that live migration works (OK)
>
> Start basic test cycle:
> a) migrate all machines to host #1, then reboot host #2
> b) watch logs for self-heal to complete
> c) migrate VMs to host #2, reboot host #1
> d) check logs for self-heal
>
> The above cycle can be repeated numerous times, and completes without
> error, provided that no (or little) load is on the VMs.
>
> If I give the VMs a workload, such as running "bonnie++" on each VM,
> things start to break:
> 1) it becomes almost impossible to log in to each VM
> 2) the kernel on each VM starts giving timeout errors,
>    i.e. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> 3) top / uptime on the hosts shows a load average of up to 24
> 4) dd write speed (block size 1K) to gluster is around 3MB/s on the host
>
> While I agree that running bonnie++ on four VMs is possibly unfair, there
> are load spikes on quiet machines (yum updates etc). I suspect that the
> I/O of one VM starts blocking that of another VM, and the pressure builds
> up rapidly on gluster - which does not seem to cope well under pressure.
> Possibly this is down to the access pattern / block size of qcow2 disks?
>
> I'm (slightly) disappointed.
>
> Though it doesn't corrupt data, the I/O performance is < 1% of my
> hardware's capability. Hopefully work on buffering and other tuning will
> fix this? Or maybe the work mentioned on getting qemu talking directly to
> gluster will fix this?

Do you mean that the I/O is bad when you are performing the migration? Or
bad in general?

If it is bad in general, the qemu driver should help. Also try presenting
each VM with a FUSE mount point of its own - we have seen that improve the
overall system IOPS (a rough sketch of this, and of the volume setup above,
is appended below).

If performance is only slow during failover/failback, we probably need to
do some more internal QoS tuning to de-prioritize self-heal traffic so that
it does not preempt VM traffic for resources.

Avati
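
For reference, a minimal sketch of the replicated volume and localhost FUSE
mount described in the setup above, plus the self-heal check used between
reboots. Host names (server1/server2), the brick path /export/brick1 and the
image directory are illustrative, not taken from the original mail:

  # On both servers: create the brick directory on the xfs filesystem.
  mkdir -p /export/brick1

  # On one server: peer the two boxes and create the 2-way replicated volume.
  gluster peer probe server2
  gluster volume create gluster-vol replica 2 \
      server1:/export/brick1 server2:/export/brick1
  gluster volume start gluster-vol

  # On each KVM host: FUSE-mount the volume locally for the qcow2 images.
  mkdir -p /var/lib/libvirt/images/gluster
  mount -t glusterfs localhost:/gluster-vol /var/lib/libvirt/images/gluster

  # After rebooting a host, check that self-heal has caught up before
  # migrating the VMs back.
  gluster volume heal gluster-vol info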
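And a rough sketch of the per-VM FUSE mount suggestion, assuming the same
gluster-vol volume and four guests named vm1..vm4 (names are illustrative).
The same volume is mounted once per guest, so each VM's I/O goes through its
own glusterfs client process instead of all sharing one mount:

  # One FUSE mount (and hence one glusterfs client process) per guest.
  for vm in vm1 vm2 vm3 vm4; do
      mkdir -p /var/lib/libvirt/images/$vm
      mount -t glusterfs localhost:/gluster-vol /var/lib/libvirt/images/$vm
  done

  # Then point each guest's disk at its own mount in the libvirt domain XML,
  # e.g. /var/lib/libvirt/images/vm1/vm1.qcow2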