FYI: I set up replica 3 (no arbiter this time) and repeated the test - rebooted one node during heavy file I/O in the VM, and I/O stopped. As I mentioned either here or in another thread, this behavior is caused by the high default of network.ping-timeout. My main problem used to be that setting it to low values like 3s or even 2s did not prevent the FS from being remounted read-only (at least with an arbiter), and the docs describe a reconnect as very costly. With ping-timeout set to 1s, the read-only remount disaster is now prevented. However, I find this very strange, because in the past I did end up with a read-only filesystem despite the low ping-timeout. With replica 3, after a node reboot, iftop shows data flowing to only one of the two remaining nodes, and there is no entry in heal info for the volume. An explanation would be very much appreciated ;-)

A few minutes later I reverted to replica 3 with arbiter (group virt, ping-timeout 1). All nodes are up. During the first fio run, the VM disconnected my ssh session, so I reconnected and saw ext4 errors in dmesg. I deleted the VM and started a new one. glustershd.log fills with metadata heals shortly after the fio job starts, but this time the system is stable. Rebooting one of the nodes does not cause any problem (watching the heal log and the I/O on the VM).

So I decided to put more stress on the VM's disk: I added a second fio job with direct=1 and started it (so both jobs were running) while one gluster node was still booting. What happened? One fio job reported "Bus error", and the VM segfaulted when trying to run dmesg... Is this gfapi related? Is this a bug in arbiter?

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users
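P.S. For anyone wanting to reproduce, this is roughly how the volume was tuned and stressed. This is a sketch, not my exact commands: the volume name `gv0` and the fio job parameters other than direct=1 are placeholders, and the commands need a working gluster cluster.

```shell
# Apply the virt profile (preset options for VM image workloads)
gluster volume set gv0 group virt

# Lower ping-timeout from the 42s default to 1s
gluster volume set gv0 network.ping-timeout 1

# Verify the setting took effect
gluster volume get gv0 network.ping-timeout

# Watch for pending heals after rebooting a node
gluster volume heal gv0 info

# Inside the VM: the second stress job with direct (O_DIRECT) I/O
# (job name/size/pattern are illustrative, only direct=1 is from my test)
fio --name=stress2 --direct=1 --rw=randwrite --bs=4k --size=1G --runtime=60 --time_based
```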