Re: Hosted VM Pause when one node of gluster goes down

Artem Russakovskii <archon810@xxxxxxxxx> · Tue, 24 Apr 2018 11:07:39 -0700

Hi Russell,
I ran into this too when setting up Gluster, and the fix was to lower network.ping-timeout (the default is 42 seconds). When a node goes down and starts timing out, the cluster blocks all access to the volume, including reads, for network.ping-timeout seconds, and only lets I/O through after that.
I set mine to 5 seconds, because 42 seconds is nowhere near an acceptable wait.
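For reference, the option is set per volume with `gluster volume set <VOLNAME> network.ping-timeout <seconds>`. Below is a minimal Python sketch that builds (and optionally runs) that command. It assumes the gluster CLI is on PATH on a node in the trusted pool, and the volume name `myvol` is hypothetical. The read-back uses `gluster volume get`, which assumes GlusterFS 3.7 or later.

```python
import shlex
import subprocess
from typing import List

# Assumption: the `gluster` CLI is installed and on PATH on a peer in the pool.
# "myvol" in the usage below is a hypothetical volume name; substitute your own.


def ping_timeout_command(volume: str, seconds: int) -> List[str]:
    """Build argv for `gluster volume set <volume> network.ping-timeout <seconds>`."""
    if seconds <= 0:
        raise ValueError("ping-timeout must be a positive number of seconds")
    return ["gluster", "volume", "set", volume, "network.ping-timeout", str(seconds)]


def read_back_command(volume: str) -> List[str]:
    """Build argv to check the current value (assumes GlusterFS 3.7+ for `volume get`)."""
    return ["gluster", "volume", "get", volume, "network.ping-timeout"]


def set_ping_timeout(volume: str, seconds: int, dry_run: bool = True) -> List[str]:
    """Print the command (dry run, the default) or actually execute it."""
    cmd = ping_timeout_command(volume, seconds)
    if dry_run:
        print(shlex.join(cmd))
    else:
        # check=True raises CalledProcessError if gluster reports a failure
        subprocess.run(cmd, check=True)
        subprocess.run(read_back_command(volume), check=True)
    return cmd


if __name__ == "__main__":
    set_ping_timeout("myvol", 5)
```

It defaults to a dry run that only prints the command, so you can review it before passing `dry_run=False` against a production volume.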

Note: this only happens the first time the node goes offline. After that, file accesses are no longer blocked, and when the node comes back online it is expected to reconnect on its own.

More on this:
https://serverfault.com/questions/619355/how-to-lower-gluster-fs-down-peer-timeout-reduce-down-peer-impact
https://thornelabs.net/2015/02/24/change-gluster-volume-connection-timeout-for-glusterfs-native-client.html

Sincerely,
Artem

--
Founder, Android Police, APK Mirror, Illogical Robot LLC
beerpla.net | +ArtemRussakovskii | @ArtemR

Date: Tue, 24 Apr 2018 09:45:39 +0800 (PHT)
From: rwecker@xxxxxxx
To: gluster-users <gluster-users@xxxxxxxxxxx>
Subject: Hosted VM Pause when one node of gluster goes down

Hi,

I have a 3-node hyper-converged cluster running GlusterFS with replica 3 arbiter 1 volumes. When I shut down one node, high-load VMs pause due to a storage error. What areas should I look into to get this working?

Russell Wecker 

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users