Re: GlusterFS as virtual machine storage

Pavel,

Is there a difference between the native client (FUSE) and libgfapi with regard to the crashing/read-only behaviour?

We use Rep2 + Arb and can shut down a node cleanly without issue on our VMs. We do it all the time for upgrades and maintenance.

However, we are still on the native client, as we haven't had time to work on libgfapi yet. Maybe that is more tolerant.
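For reference, the two access paths look roughly like this. This is a sketch using the hostnames and volume name from our setup; the image path and qemu invocation are illustrative, not something we have tested with libgfapi ourselves:

```
# Native client: FUSE mount of the volume (what we use today)
mount -t glusterfs c2g.gluster:/brick1 /var/lib/libvirt/images

# libgfapi: QEMU opens the volume directly, bypassing FUSE
# (illustrative invocation only; vm1.img is a made-up path)
qemu-system-x86_64 -drive file=gluster://c2g.gluster/brick1/vm1.img,format=raw,if=virtio
```

The interesting difference is that with FUSE every VM I/O goes through the kernel and the glusterfs client process, while libgfapi links the Gluster client library into QEMU itself, so failover behaviour could plausibly differ between the two.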

We have linux VMs mostly with XFS filesystems.

During the downtime, the VMs continue to run with normal speed.

In this case we migrated the VMs to data node 2 (c2g.gluster) and shut down c1g.gluster to do some upgrades.

# gluster peer status
Number of Peers: 2

Hostname: c1g.gluster
Uuid: 91be2005-30e6-462b-a66e-773913cacab6
State: Peer in Cluster (Disconnected)

Hostname: arb-c2.gluster
Uuid: 20862755-e54e-4b79-96a8-59e78c6a6a2e
State: Peer in Cluster (Connected)

# gluster volume status
Status of volume: brick1
Gluster process                             TCP Port  RDMA Port Online  Pid
------------------------------------------------------------------------------
Brick c2g.gluster:/GLUSTER/brick1       49152     0 Y       5194
Brick arb-c2.gluster:/GLUSTER/brick1        49152     0 Y       3647
Self-heal Daemon on localhost               N/A       N/A Y       5214
Self-heal Daemon on arb-c2.gluster          N/A       N/A Y       3667

Task Status of Volume brick1
------------------------------------------------------------------------------
There are no active volume tasks

When we bring the c1g node back, we do see a "pause" in the VMs as the shards heal. By "pause" I mean that a terminal session gets spongy, but that passes pretty quickly.

Also, are your VMs mounted in libvirt with caching? We always use cache='none' so we can migrate around easily.
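In libvirt terms that's the cache attribute on the disk driver element, something like the snippet below (the source file path is made up for illustration):

```
<disk type='file' device='disk'>
  <driver name='qemu' type='raw' cache='none'/>
  <source file='/var/lib/libvirt/images/vm1.img'/>
  <target dev='vda' bus='virtio'/>
</disk>
```

With cache='none' the host page cache is bypassed, which is what makes live migration safe, since no dirty pages are stranded on the source host.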

Finally, you seem to be using oVirt/RHEV. Is it possible that your platform is triggering a protective response (suspending the VMs)?


-wk



On 9/8/2017 5:13 AM, Gandalf Corvotempesta wrote:
2017-09-08 14:11 GMT+02:00 Pavel Szalbot <pavel.szalbot@xxxxxxxxx>:
Gandalf, SIGKILL (killall -9 glusterfsd) did not stop I/O after a few
minutes. SIGTERM on the other hand causes a crash, but this time it is
not a read-only remount, but around 10 IOPS tops and 2 IOPS on average.
-ps
So, it seems to be resilient to server crashes but not to server shutdown :)
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users

