On 8/23/2017 10:44 PM, Pavel Szalbot wrote:
> Hi,
> On Thu, Aug 24, 2017 at 2:13 AM, WK <wkmail@xxxxxxxxx> wrote:
>> The default timeout for most OS versions is 30 seconds and the Gluster
>> timeout is 42, so yes you can trigger an RO event.
> I get read-only mount within approximately 2 seconds after failed IO.
Hmm, we don't see that, even on busy VMs.
We ARE using QCOW2 disk images though.
Also, though we no longer use oVirt, I am still on the list. They are
heavy Gluster users and they would be howling if they all had your
experience.
>> Though it is easy enough to raise as Pavel mentioned
>> # echo 90 > /sys/block/sda/device/timeout
> AFAIK this is applicable only for directly attached block devices
> (non-virtualized).
No, if you use SATA/IDE emulation (NOT virtio), it is there WITHIN the VM.
We have a lot of legacy VMs from older projects/workloads that have that,
and we haven't bothered changing them because "they are working fine now".
It is NOT there on virtio.
Likewise virtio "disks" don't even have a timeout value that I am aware of,
and I don't recall them being extremely sensitive to disk issues on
Gluster, NFS, or DAS.
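For anyone who wants to check, inside one of those guests it looks roughly
like this (sda here is an emulated disk and vda a virtio one; example
device names only):

# cat /sys/block/sda/device/timeout
30
# echo 90 > /sys/block/sda/device/timeout
# cat /sys/block/vda/device/timeout
cat: /sys/block/vda/device/timeout: No such file or directory

i.e. the emulated disk exposes the usual SCSI timeout knob, while the
virtio-blk disk simply doesn't have one to raise.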
> We use only virtio and these problems are persistent - temporarily
> suspending a node (e.g. HW or Gluster upgrade, reboot) is very scary,
> because we often end up with read-only filesystems on all VMs.
> However we use ext4, so I cannot comment on XFS.
We use the fuse mount, because we are lazy and haven't upgraded to
libgfapi. I hope to start a new cluster with libgfapi shortly
because of the better performance.
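For context, the difference is only in how qemu opens the image; a rough
sketch with made-up names (volume gv0, image vm01.qcow2), assuming a qemu
built with gluster support:

fuse mount:  qemu-img info /mnt/gv0/vm01.qcow2
libgfapi:    qemu-img info gluster://localhost/gv0/vm01.qcow2

Same image either way, libgfapi just bypasses the fuse layer, which is
where the performance gain comes from.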
Also we use a localhost mount for the gluster driveset on each compute
node (i.e. so-called hyperconverged). So the only Gluster-only kit is
the lightweight arbiter box.
So those VMs in the gluster 'pool' have a local write and then only one
off-server write (to the other gluster-enabled compute host), which
means pretty good performance.
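The localhost mount itself is nothing fancy, roughly the following (gv0
and the target directory are example names; backup-volfile-servers lets
the client fetch the volume info from another node if the local glusterd
happens to be down at mount time):

mount -t glusterfs -o backup-volfile-servers=node2:node3 localhost:/gv0 /var/lib/libvirt/images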
We use the Gluster-included 'virt' tuning set of:
performance.quick-read=off
performance.read-ahead=off
performance.io-cache=off
performance.stat-prefetch=off
performance.low-prio-threads=32
network.remote-dio=enable
cluster.eager-lock=enable
cluster.quorum-type=auto
cluster.server-quorum-type=server
cluster.data-self-heal-algorithm=full
cluster.locking-scheme=granular
cluster.shd-max-threads=8
cluster.shd-wait-qlength=10000
features.shard=on
user.cifs=off
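For anyone new to that set: it is the 'virt' group file shipped with
glusterfs (under /var/lib/glusterd/groups/ if I remember right), so it
can be applied in one go rather than option by option, roughly (gv0 being
an example volume name):

gluster volume set gv0 group virt

and then tweak individual options afterwards if needed.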
We do play with shard size and have settled on 64M, though I've
seen recommendations of 128M and 512M for VMs.
We didn't really notice much of a difference with any of those as long
as they were at least 64M.
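If you want to experiment with it, shard size is a single volume option,
roughly (gv0 again is an example name); as far as I know it only applies
to files created after the change, existing images keep their old shard
size:

gluster volume set gv0 features.shard-block-size 64MB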
> This discussion will probably end before I migrate VMs from Gluster to
> local storage on our OpenStack nodes, but I might run some tests
> afterwards and keep you posted.
I would be interested in your results. You may also look into Ceph. It
is more complicated than Gluster (well, more complicated than our
simple little Gluster arrangement), but the OpenStack people swear by it.
It wasn't suited to our needs, but it tested well when we looked into
it last year.
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users