Jacques Mattheij wrote:
Hello there gluster developers and users, I'm trying to get a handle on what it takes to get glusterfs to work reliably. After several weeks of testing we have so far not been able to get it to run stably in our setup, and I'm beginning to wonder whether there is a statistical approach to finding out what works and what doesn't, rather than going about it one bug at a time.
I've found that for high availability, I get FAR more stable and fault-tolerant setups with all unify and afr done on the client. The unify translator seems to handle server failures quite gracefully, while relying on round-robin DNS, as in the High Availability Storage Example in the GlusterFS wiki, leads to transport disconnected errors for me. When it's all done on the client side, there's just a short delay (defined by the transport-timeout value) the first time the client reads or writes after a server has failed, and then the operation continues as normal.
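
A minimal sketch of a client volfile with both afr and unify loaded on the client, assuming GlusterFS 1.3-style volfile syntax. The hostnames, volume names, exported brick names, and the timeout value are hypothetical placeholders, not taken from the setup described above; check the option names against the documentation for the version in use.

```
# Client-side spec sketch. Each server is assumed to export two
# volumes: "brick" (data) and "brick-ns" (unify namespace).

volume server1-data
  type protocol/client
  option transport-type tcp/client
  option remote-host server1.example.com
  option remote-subvolume brick
  # Seconds to wait before giving up on an unresponsive server.
  option transport-timeout 10
end-volume

volume server2-data
  type protocol/client
  option transport-type tcp/client
  option remote-host server2.example.com
  option remote-subvolume brick
  option transport-timeout 10
end-volume

volume server1-ns
  type protocol/client
  option transport-type tcp/client
  option remote-host server1.example.com
  option remote-subvolume brick-ns
  option transport-timeout 10
end-volume

volume server2-ns
  type protocol/client
  option transport-type tcp/client
  option remote-host server2.example.com
  option remote-subvolume brick-ns
  option transport-timeout 10
end-volume

# Mirror the data and the namespace across both servers.
volume afr-data
  type cluster/afr
  subvolumes server1-data server2-data
end-volume

volume afr-ns
  type cluster/afr
  subvolumes server1-ns server2-ns
end-volume

# Unify sits on top of the mirrored pairs; add further afr
# volumes to "subvolumes" as more server pairs are introduced.
volume unify0
  type cluster/unify
  option namespace afr-ns
  option scheduler rr
  subvolumes afr-data
end-volume
```

With a layout like this the client talks to each server directly, so no round-robin DNS name is involved, and a lower transport-timeout shortens the pause after a server failure at the cost of being less tolerant of slow responses.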