Hi Russell,
I ran into this as well when setting up gluster. The fix is to lower network.ping-timeout (the default is 42 seconds). When a node goes down and stops responding, the whole cluster blocks access, reads included, for network.ping-timeout seconds and only lets requests through after that.
I set mine to 5 (seconds) because 42 is nowhere near an acceptable wait value.
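For reference, here is a minimal sketch of how the option could be changed from the gluster CLI. The volume name "myvolume" is a placeholder (it is not from your setup), and 5 is simply the value I picked. The script only prints the commands so you can review them before running them on one of the cluster nodes.

```shell
#!/bin/sh
# Placeholder volume name: replace with your own (see `gluster volume list`).
VOLNAME="myvolume"
# Ping timeout in seconds; 5 is the value used in this mail, not a recommendation.
TIMEOUT=5

# Command that sets the option on the volume.
SET_CMD="gluster volume set $VOLNAME network.ping-timeout $TIMEOUT"
# Command that reads the option back to confirm the change.
GET_CMD="gluster volume get $VOLNAME network.ping-timeout"

# Print both commands for review instead of executing them directly.
echo "$SET_CMD"
echo "$GET_CMD"
```

Once the output looks right, run the two printed commands as root on any node in the trusted pool. The setting applies per volume, so repeat it for each volume that backs VM storage.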
Note: This only happens when the node first goes offline. After that, file accesses are not blocked, and the node is expected to reconnect on its own when it comes back online.
More on this:
https://thornelabs.net/2015/02/24/change-gluster-volume-connection-timeout-for-glusterfs-native-client.html
Date: Tue, 24 Apr 2018 09:45:39 +0800 (PHT)
From: rwecker@xxxxxxx
To: gluster-users <gluster-users@xxxxxxxxxxx>
Subject: Hosted VM Pause when one node of gluster goes down
Message-ID: <349486642.163403.1524534339074.JavaMail.zimbra@ >ssd.org
Content-Type: text/plain; charset="utf-8"
Hi,
I have a 3-node hyper-converged cluster running glusterfs with replica 3 arbiter 1 volumes. When I shut down one node, I am having problems with high-load VMs pausing due to storage errors. What areas should I look into to get this to work?
Russell Wecker