I agree with this assessment for the most part. I'll just add that, during development of Gluster-based solutions, we had internal use of Red Hat Gluster. This was over a year and a half ago when we started. For my perhaps non-mainstream use cases, I found the latest versions of Gluster 7 actually fixed several of my issues. Now, I did not try to work with Red Hat when I hit problems, as it was only "non-shippable support" - we could install it but not deliver it. Since it didn't work well for our strange use cases, we moved on to building our own Gluster instead of working to have customers buy the Red Hat one. (We also support SLES 12, SLES 15, RHEL 7 and RHEL 8 - so having Red Hat's version of Gluster sort of wouldn't have worked out for us anyway.)

However, I also found that it is quite easy for my use case to hit new bugs. When we go from Gluster 7.2 to one of the newer versions, little things might happen (and did happen). I don't complain, because I get free support from you, and I do my best to fix them if I have time and access to a failing system.

A tricky thing in my world is that we will sell a cluster with 5,000 nodes to boot, while my test cluster may have 3 nodes. I can get time on up to 128 nodes on one test system, but I only get short-term access to bigger systems at the factory. So being able to change from one Gluster version to another is a real challenge for us, because there simply is no way for us to test very often and, as is normal in HPC, problems only show up at scale. haha :) :)

This is also why we are still using Gluster NFS. We know we need to work with the community on fixing some Ganesha issues, but the amount of time we get on a large machine that exhibits the problem is short, and we must prioritize. This is why I'm careful to never "blame Ganesha" but rather to point out that we haven't had time to track the issues down with the Ganesha community. Meanwhile we hope we can keep building Gluster NFS :)

When I next do a version change of Gluster or try Ganesha again, it will be when I have sustained access to at least a 1024-node cluster to boot, with 3 or 6 Gluster servers, so I can really work out any issues. I consider this "a cost of doing business in the world I work in", but it is a real challenge indeed. I assume Gluster developers face some parallel challenges.... "Works fine on my limited hardware or virtual machines."

Erik

> With every community project, you are in the position of a beta tester - no matter whether it's Fedora, Gluster or Ceph. So far, I had issues with upstream projects only during and immediately after patching - but this is properly mitigated with a reasonable patching strategy (patch the test environment and, several months later, patch prod with the same repos).
> Enterprise Linux breaks too (and a lot), even with 10 times more users and use cases, so you cannot expect to start using Gluster and assume that a free project won't break at all.
> Our part in this project is to help the devs create a test case for our workload, so regressions will be reduced to a minimum.
>
> In the past 2 years, we hit 2 major issues with VMware vSAN and 1 major issue with an enterprise storage cluster (both solutions are quite expensive) - so I always recommend proper testing of your software.
>
> >> That's true, but you could also use NFS Ganesha, which is
> >> more performant than FUSE and also as reliable as it.
>
> > From this very list I read about many users with various problems when
> > using NFS Ganesha. Is that a wrong impression?
> From my observations, almost nobody is complaining about Ganesha on the mailing list -> 50% are having issues with geo-replication, 20% are having issues with small-file performance, and the rest have issues with very old versions of Gluster -> v5 or older.
>
> >> It's not so hard to do it - just use either 'reset-brick' or
> >> 'replace-brick'.
>
> > Sure - the command itself is simple enough. The point is that each
> > reconstruction is quite a bit "riskier" than a simple RAID
> > reconstruction. Do you run a full Gluster SDS, skipping RAID? How do
> > you find this setup?
>
> I can't say that a replace-brick on a 'replica 3' volume is riskier than a rebuild of a RAID, but I have noticed that nobody is following Red Hat's guide to use either:
> - a RAID6 of 12 disks (2-3 TB each)
> - a RAID10 of 12 disks (2-3 TB each)
> - JBOD disks in 'replica 3' mode (I'm not sure about the size RH recommends, most probably 2-3 TB)
> So far, I didn't have the opportunity to run on JBODs.
>
> > Thanks.
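For readers comparing the access paths discussed above, here is a rough sketch of how the same volume is typically mounted via the native FUSE client versus over NFSv3 (served either by the built-in Gluster NFS or by an NFS-Ganesha export). The server, volume and mount-point names are made up for the example:

    # Native FUSE client mount (hypothetical server/volume names):
    mount -t glusterfs gserver1:/myvol /mnt/myvol

    # NFSv3 mount of the same volume; the export is provided either by
    # the built-in Gluster NFS (gnfs) or by NFS-Ganesha on the server:
    mount -t nfs -o vers=3,proto=tcp gserver1:/myvol /mnt/myvol

    # Gluster NFS is disabled by default in recent releases (and only
    # present if the build includes gnfs); it is toggled per volume:
    gluster volume set myvol nfs.disable off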
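Similarly, since 'reset-brick' and 'replace-brick' come up above, here is a sketch of what replacing a failed brick on a 'replica 3' volume usually looks like (volume name, host and brick paths are invented for the example); self-heal then rebuilds the new brick from the surviving copies:

    # Swap the failed brick for a new, empty one; self-heal repopulates it:
    gluster volume replace-brick myvol gserver2:/bricks/b1 gserver2:/bricks/b1-new commit force

    # Or keep the same brick path, e.g. after replacing the disk under it:
    gluster volume reset-brick myvol gserver2:/bricks/b1 start
    #   ...swap the disk, recreate the filesystem and brick directory...
    gluster volume reset-brick myvol gserver2:/bricks/b1 gserver2:/bricks/b1 commit force

    # Watch the heal progress:
    gluster volume heal myvol info summary

As the exchange above notes, the command itself is the easy part - the heal that follows is where the comparison with a RAID rebuild comes in.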