We run Gluster 3.7 in a distributed-replicated setup. InfiniBand (tcp) links the Gluster peers together, and clients use the Ethernet interface.
This setup was stable on CentOS 6.x with the most recent InfiniBand drivers provided by Mellanox. Uptime was 170 days when we took it down to wipe the systems and update to CentOS 7. With the same setup loaded onto CentOS 7 machines (there are minor differences, but it is basically the same; setup is handled by Ansible), the peers (seemingly randomly) experience a hard crash and need to be power-cycled. There is no output on the screen and nothing in the logs. After rebooting, the peer reconnects, heals whatever files it missed, and everything is happy again. The maximum uptime for any given peer has been 20 days. Thanks to the replication, clients maintain connectivity, but from a system administration perspective it's driving me crazy!
Thanks,
Patrick