Actually, 10Gb NICs. Shouldn't it be enough?

On Tue, Oct 22, 2013 at 8:04 PM, Bryan Whitehead <driver at megahappy.net> wrote:

> So gluster is just running on 10Mbit nic cards or 56Gbit Infiniband?
>
> With 1G nic cards, assuming only replica=2, you are looking at pretty
> limited IO for gluster to work with. That can cause long pauses and
> other timeouts in my experience.
>
> On Tue, Oct 22, 2013 at 2:42 AM, José A. Lausuch Sales
> <jlausuch at gmail.com> wrote:
>
>> Hi,
>>
>> We are currently evaluating GlusterFS for a production environment. Our
>> focus is on the high-availability features of GlusterFS. However, our
>> tests have not worked out well, so I am seeking feedback from you.
>>
>> In our planned production environment, Gluster should provide shared
>> storage for VM disk images. Our very basic initial test setup is as
>> follows:
>>
>> We are using two servers, each providing a single brick of a replicated
>> gluster volume (Gluster 3.4.1). A third server runs a test VM (Ubuntu
>> 13.04 on QEMU 1.3.0 and libvirt 1.0.3) which uses a disk image file
>> stored on the gluster volume as a block device (/dev/vdb). For testing
>> purposes, the root file system of this VM (/dev/vda) is a disk image NOT
>> stored on the gluster volume.
>>
>> To test the high-availability features of gluster under load, we run FIO
>> inside the VM directly against the vdb block device (see configuration
>> below). So far we have tested reading only. The test procedure is as
>> follows:
>>
>> 1. We start FIO inside the VM and observe by means of "top" which of the
>> two servers is serving the read requests (i.e., shows increased CPU load
>> from the glusterfsd brick process). Let's say that Server1 shows the
>> glusterfsd CPU load.
>>
>> 2. While FIO is running, we take down the network of Server1 and observe
>> whether Server2 takes over.
>>
>> 3. This failover works almost 100% of the time: we see the CPU load from
>> glusterfsd on Server2 and, as expected, Server1 has no load because it
>> is "offline".
>>
>> 4. After a while we bring the NIC on Server1 up again. We expected that
>> Server1 would take over again (something like active-passive behavior),
>> but this happens only 5-10% of the time; usually the CPU load stays on
>> Server2.
>>
>> 5. After some time, we bring down the NIC on Server2, expecting Server1
>> to take over. This second failover does not work: the VM complains about
>> I/O errors, which can only be resolved by restarting the VM and sometimes
>> even by removing and re-creating the volume.
>>
>> After some tests, we realized that if we restart the glusterd daemon
>> (/etc/init.d/glusterd restart) on Server1 after step 3 or before step 4,
>> Server1 takes over automatically, without bringing down Server2 or
>> anything like that.
>>
>> We tested this with both the normal FUSE mount and libgfapi. When using
>> FUSE, the local mount sometimes becomes unavailable (ls shows no more
>> files) if the failover fails.
>>
>> We have a few fundamental questions in this regard:
>>
>> i) Is Gluster supposed to handle such a scenario, or are we making wrong
>> assumptions? The only solution we have found is to restart the daemon
>> after a network outage, but that is not acceptable in a real scenario
>> with VMs running real applications.
>>
>> ii) What is the recommended configuration in terms of caching (QEMU:
>> cache=none/writethrough/writeback) and direct I/O (FIO and Gluster) to
>> maximize the reliability of the failover process? We varied these
>> parameters but could not find a working configuration. Do these
>> parameters have an impact at all?
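>>
>> For completeness, the volume and the FUSE mount on the VM host were set
>> up roughly like this (reproduced from memory, so the hostnames, volume
>> name and brick paths are only placeholders and the exact options may
>> differ):
>>
>> # on Server1: form the trusted pool and create the replica-2 volume
>> gluster peer probe server2
>> gluster volume create vmstore replica 2 server1:/export/brick1 server2:/export/brick1
>> gluster volume start vmstore
>>
>> # on the VM host: FUSE mount backing the test image
>> # (/mnt/shared/io-perf-testdisk.img in the VM configuration below)
>> mount -t glusterfs server1:/vmstore /mnt/shared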
>>
>> FIO test specification:
>>
>> [global]
>> direct=1
>> ioengine=libaio
>> iodepth=4
>> filename=/dev/vdb
>> runtime=300
>> numjobs=1
>>
>> [maxthroughput]
>> rw=read
>> bs=16k
>>
>>
>> VM configuration:
>>
>> <domain type='kvm' id='6'>
>>   <name>testvm</name>
>>   <uuid>93877c03-605b-ed67-1ab2-2ba16b5fb6b5</uuid>
>>   <memory unit='KiB'>2097152</memory>
>>   <currentMemory unit='KiB'>2097152</currentMemory>
>>   <vcpu placement='static'>1</vcpu>
>>   <os>
>>     <type arch='x86_64' machine='pc-1.1'>hvm</type>
>>     <boot dev='hd'/>
>>   </os>
>>   <features>
>>     <acpi/>
>>     <apic/>
>>     <pae/>
>>   </features>
>>   <clock offset='utc'/>
>>   <on_poweroff>destroy</on_poweroff>
>>   <on_reboot>restart</on_reboot>
>>   <on_crash>restart</on_crash>
>>   <devices>
>>     <emulator>/usr/bin/kvm</emulator>
>>     <disk type='block' device='disk'>
>>       <driver name='qemu' type='raw' cache='writethrough'/>
>>       <source dev='/mnt/local/io-perf.img'/>
>>       <target dev='vda' bus='virtio'/>
>>       <alias name='virtio-disk0'/>
>>       <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
>>     </disk>
>>     <disk type='block' device='disk'>
>>       <driver name='qemu' type='raw' cache='writethrough'/>
>>       <source dev='/mnt/shared/io-perf-testdisk.img'/>
>>       <target dev='vdb' bus='virtio'/>
>>       <alias name='virtio-disk1'/>
>>       <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
>>     </disk>
>>     <controller type='usb' index='0'>
>>       <alias name='usb0'/>
>>       <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
>>     </controller>
>>     <interface type='network'>
>>       <mac address='52:54:00:36:5f:dd'/>
>>       <source network='default'/>
>>       <target dev='vnet0'/>
>>       <model type='virtio'/>
>>       <alias name='net0'/>
>>       <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
>>     </interface>
>>     <input type='mouse' bus='ps2'/>
>>     <graphics type='vnc' port='5900' autoport='yes' listen='127.0.0.1'>
>>       <listen type='address' address='127.0.0.1'/>
>>     </graphics>
>>     <video>
>>       <model type='cirrus' vram='9216' heads='1'/>
>>       <alias name='video0'/>
>>       <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
>>     </video>
>>     <memballoon model='virtio'>
>>       <alias name='balloon0'/>
>>       <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
>>     </memballoon>
>>   </devices>
>>   <seclabel type='none'/>
>> </domain>
>>
>> Thank you very much in advance,
>> Jose Lausuch
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
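
One more detail in case it is useful: for the libgfapi runs, the test disk was
attached through QEMU's gluster block driver instead of the FUSE mount, along
these lines (the hostname "server1" and volume name "vmstore" below are
placeholders, cache=none is just one of the cache modes we varied, and the
libvirt XML above shows the FUSE-mounted variant of the same disk):

# sanity check that QEMU can reach the image over libgfapi
qemu-img info gluster://server1/vmstore/io-perf-testdisk.img

# minimal stand-alone sketch of attaching the same image via gluster://
qemu-system-x86_64 -enable-kvm -m 2048 \
  -drive file=gluster://server1/vmstore/io-perf-testdisk.img,if=virtio,format=raw,cache=none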