On 10/22/2013 02:42 AM, José A. Lausuch Sales wrote:
> Hi,
>
> we are currently evaluating GlusterFS for a production environment.
> Our focus is on the high-availability features of GlusterFS. However,
> our tests have not worked out well. Hence I am seeking feedback from you.
>
> In our planned production environment, Gluster should provide shared
> storage for VM disk images. So, our very basic initial test setup is
> as follows:
>
> We are using two servers, each providing a single brick of a
> replicated gluster volume (Gluster 3.4.1). A third server runs a
> test VM (Ubuntu 13.04 on QEMU 1.3.0 and libvirt 1.0.3) which uses a
> disk image file stored on the gluster volume as a block device
> (/dev/vdb). For testing purposes, the root file system of this VM
> (/dev/vda) is a disk image NOT stored on the gluster volume.
>
> To test the high-availability features of gluster under load, we run
> FIO inside the VM directly on the vdb block device (see configuration
> below). Up to now, we have tested reading only. The test procedure is
> as follows:
>
> 1. We start FIO inside the VM and observe by means of "top" which of
> the two servers receives the read requests (i.e., increased CPU load
> of the glusterfsd process). Let's say that Server1 shows the CPU load
> from glusterfsd.
>
> 2. While FIO is running, we take down the network of Server1 and
> observe whether Server2 takes over.

You're bringing server1 down by taking down the NIC (assuming from #5).
This does take down the connection, but it does so without closing the
TCP connection. This does represent a worst-case scenario, though; see
http://joejulian.name/blog/keeping-your-vms-from-going-read-only-when-encountering-a-ping-timeout-in-glusterfs/

> 3. This "fail over" works (almost 100% of the time): we see the CPU
> load from glusterfsd on Server2. As expected, Server1 does not have
> any load because it is "offline".
>
> 4. After a while we bring up the NIC on Server1 again. In this step we
> realized that the expected behavior is that when bringing up this NIC,
> this server should take over again (something like active-passive
> behavior), but this happens only 5-10% of the time. The CPU load is
> still on Server2.

I'm not sure I would have that expectation. The second server will have
taken over the open FD, and the reads should keep coming from there.
The reads for a given fd come from whichever server is first to respond
to the lookup().

> 5. After some time, we bring down the NIC on Server2, expecting that
> Server1 takes over. This second "fail over" crashes. The VM complains
> about I/O errors which can only be resolved by restarting the VM and
> sometimes even by removing and creating the volume again.
>
> After some tests, we realized that if we restart the glusterd daemon
> (/etc/init.d/glusterd restart) on Server1 after step 3 or before step
> 4, Server1 takes over automatically without bringing down Server2 or
> anything like that.

Check the logs for glusterd
(/var/log/glusterfs/etc-glusterfs-glusterd.vol.log) for clues. Perhaps
the /way/ you're taking down the NIC is exposing some bug. Perhaps
instead of taking it down, use iptables or just killall glusterfsd.
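For example, something along these lines (a rough sketch only; "yourvol" is a
placeholder for your volume name, and double-check the brick ports with
"gluster volume status" before blocking them):

# simulate the unplugged cable without touching the NIC: silently drop
# gluster traffic on Server1 (24007 is glusterd, 49152+ are the 3.4 brick ports)
iptables -I INPUT -p tcp --dport 24007 -j DROP
iptables -I INPUT -p tcp --dport 49152:49251 -j DROP

# undo it when you want the server back:
iptables -D INPUT -p tcp --dport 24007 -j DROP
iptables -D INPUT -p tcp --dport 49152:49251 -j DROP

# or simulate a graceful failure instead; killing the brick closes its
# TCP connections and the client fails over right away:
killall glusterfsd

# optional: shorten the hang a client sees after a hard failure
# (network.ping-timeout defaults to 42 seconds):
gluster volume set yourvol network.ping-timeout 10

The iptables variant should behave just like the dead NIC (the client hangs
for ping-timeout before reads resume on the other brick), while the killall
variant fails over almost immediately.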
> We tested this using the normal FUSE mount and libgfapi. If using
> FUSE, the local mount sometimes becomes unavailable (ls shows no more
> files) if the failover fails.
>
> We have a few fundamental questions in this regard:
>
> i) Is Gluster supposed to handle such a scenario or are we making
> wrong assumptions? Because the only solution we found is to restart
> the daemon when a network outage occurs, but this is not acceptable in
> a real scenario with VMs running real applications.

I host my (raw and qcow2) VM images on a gluster volume. Since my
servers are not expected to hard-crash a lot, I take them down for
maintenance (kernel updates and such) gracefully, killing the processes
first (roughly the sequence sketched in the P.S. below). This closes
the TCP connections and everything just keeps humming along.

> ii) What is the recommended configuration in terms of caching (QEMU:
> cache=none/writethrough/writeback) and direct I/O (FIO and Gluster) to
> maximize the reliability of the failover process? We varied the
> parameters but could not find a working configuration. Do these
> parameters have an impact at all?

To the best of my knowledge, none of those should affect reliability.

> FIO test specification:
>
> [global]
> direct=1
> ioengine=libaio
> iodepth=4
> filename=/dev/vdb
> runtime=300
> numjobs=1
>
> [maxthroughput]
> rw=read
> bs=16k
>
> VM configuration:
>
> <domain type='kvm' id='6'>
>   <name>testvm</name>
>   <uuid>93877c03-605b-ed67-1ab2-2ba16b5fb6b5</uuid>
>   <memory unit='KiB'>2097152</memory>
>   <currentMemory unit='KiB'>2097152</currentMemory>
>   <vcpu placement='static'>1</vcpu>
>   <os>
>     <type arch='x86_64' machine='pc-1.1'>hvm</type>
>     <boot dev='hd'/>
>   </os>
>   <features>
>     <acpi/>
>     <apic/>
>     <pae/>
>   </features>
>   <clock offset='utc'/>
>   <on_poweroff>destroy</on_poweroff>
>   <on_reboot>restart</on_reboot>
>   <on_crash>restart</on_crash>
>   <devices>
>     <emulator>/usr/bin/kvm</emulator>
>     <disk type='block' device='disk'>
>       <driver name='qemu' type='raw' cache='writethrough'/>
>       <source dev='/mnt/local/io-perf.img'/>
>       <target dev='vda' bus='virtio'/>
>       <alias name='virtio-disk0'/>
>       <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
>     </disk>
>     <disk type='block' device='disk'>
>       <driver name='qemu' type='raw' cache='writethrough'/>
>       <source dev='/mnt/shared/io-perf-testdisk.img'/>
>       <target dev='vdb' bus='virtio'/>
>       <alias name='virtio-disk1'/>
>       <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
>     </disk>
>     <controller type='usb' index='0'>
>       <alias name='usb0'/>
>       <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
>     </controller>
>     <interface type='network'>
>       <mac address='52:54:00:36:5f:dd'/>
>       <source network='default'/>
>       <target dev='vnet0'/>
>       <model type='virtio'/>
>       <alias name='net0'/>
>       <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
>     </interface>
>     <input type='mouse' bus='ps2'/>
>     <graphics type='vnc' port='5900' autoport='yes' listen='127.0.0.1'>
>       <listen type='address' address='127.0.0.1'/>
>     </graphics>
>     <video>
>       <model type='cirrus' vram='9216' heads='1'/>
>       <alias name='video0'/>
>       <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
>     </video>
>     <memballoon model='virtio'>
>       <alias name='balloon0'/>
>       <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
>     </memballoon>
>   </devices>
>   <seclabel type='none'/>
> </domain>
>
> Thank you very much in advance,
> Jose Lausuch
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
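P.S. For what it's worth, my maintenance routine is roughly the following
sketch (service names and "yourvol" are placeholders; adjust for your distro
and volume):

# on the server going down: kill the brick first so its TCP connections are
# closed cleanly and clients simply keep running against the other replica
killall glusterfsd
service glusterd stop

# ...kernel update, reboot, whatever...

# when it's back up, glusterd restarts the brick; then watch self-heal
# catch the returning brick up with the changes it missed
service glusterd start
gluster volume heal yourvol info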