> Looking through the last couple of weeks on this mailing list and reflecting on our own experiences, I have to ask: what is the status of GlusterFS? So many people here are reporting bugs and no solutions are in sight. GlusterFS clusters break left and right, reboots of a node have become a warrant for instability and broken clusters, and there is no way to fix broken clusters. And all of that with recommended settings and, in our case, enterprise hardware underneath.

I have been one of the people asking questions. I sometimes get an answer, which I appreciate. Other times not. But I'm not paying for support in this forum, so I appreciate what I can get. My questions are sometimes very hard to summarize, and I can't say I've been offering help as much as I ask. I think I will try to do better.

Just to counter with something cool...

As we speak, I'm working on a 2,000 node cluster that will soon be a 5120 node cluster. We're validating it with the newest version of our cluster manager.

It has 12 leader nodes (soon to have 24) that are gluster servers and gnfs servers.

I am validating Gluster 7.2 (updating from 4.6). Things are looking very good.

5120 nodes using RO NFS root with RW NFS overmounts (for things like /var, /etc, ...)...
- boot 1 (where each node creates a RW XFS image on top of NFS for its writable area, then syncs /var, /etc, etc) -- full boot is 15-16 minutes for 2007 nodes.
- boot 2 (where the writable area pre-exists and is reused, just re-rsynced) -- 8-9 minutes to boot 2007 nodes.

This is similar to gluster 4, but I think it's saying something to not have had any failures in this setup on the bleeding edge release level.

We also use a different volume shared between the leaders and the head node for shared-storage consoles and system logs. It's working great.

I haven't had time to test other solutions. Our old solution from SGI days (ICE, ICE X, etc) was a different model where each leader served a set of nodes and NFS-booted 288 or so. No shared storage.

Like you, I've wondered if something else matches this solution. We like the shared storage and the ability for a leader to drop and not take 288 nodes with it.

(All nodes running RHEL 8.0, GlusterFS 7.2, CTDB 4.9.1)

So we can say gluster is now providing the network boot solution for two supercomputers.

Erik