I agree with this assessment for the most part. I'll just add that, during development of Gluster-based solutions, we had internal use of Red Hat Gluster. This was over a year and a half ago when we started. For my perhaps non-mainstream use cases, I found the latest versions of Gluster 7 actually fixed several of my issues. Now, I did not try to work with Red Hat when I hit problems, as it was only "non-shippable support" - we could install it but not deliver it. Since it didn't work well for our strange use cases, we moved on to building our own Gluster instead of working to have customers buy the Red Hat one. (We also support SLES 12, SLES 15, RHEL 7 and RHEL 8 - so having Red Hat's version of Gluster sort of wouldn't have worked out for us anyway.)

However, I also found that it is quite easy for my use case to hit new bugs. When we go from Gluster 7.2 to one of the newer versions, little things might happen (and did happen). I don't complain, because I get free support from you, and I do my best to fix them if I have time and access to a failing system.

A tricky thing in my world is that we will sell a cluster with 5,000 nodes to boot, while my test cluster may have 3 nodes. I can get time on up to 128 nodes on one test system, but I only get short-term access to bigger systems at the factory. So being able to change from one Gluster version to another is a real challenge for us, because there simply is no way for us to test very often and, as is normal in HPC, problems only show up at scale. haha :) :)

This is also why we are still using Gluster NFS. We know we need to work with the community on fixing some Ganesha issues, but the amount of time we get on a large machine that exhibits the problem is short, and we must prioritize. This is why I'm careful to never "blame Ganesha" but rather to point out that we haven't had time to track the issues down with the Ganesha community. Meanwhile we hope we can keep building Gluster NFS :)

When I next do a version change of Gluster or try Ganesha again, it will be when I have sustained access to at least a 1024-node cluster to boot, with 3 or 6 Gluster servers, so I can really work out any issues. I consider this "a cost of doing business in the world I work in", but it is a real challenge indeed. I assume Gluster developers face some parallel challenges.... "Works fine on my limited hardware or virtual machines."

Erik

> With every community project, you are in the position of a beta tester - no matter whether it's Fedora, Gluster or Ceph. So far, I had issues with upstream projects only during and immediately after patching - but this is properly mitigated with a reasonable patching strategy (patch the test environment and, several months later, patch prod with the same repos).
> Enterprise Linux breaks too (and a lot), even with 10 times more users and use cases, so you cannot expect to start using Gluster and assume that a free project won't break at all.
> Our part in this project is to help the devs create a test case for our workload, so regressions will be reduced to a minimum.
>
> In the past 2 years, we hit 2 major issues with VMware vSAN and 1 major issue with an enterprise storage cluster (both solutions are quite expensive) - so I always recommend proper testing of your software.
>
> >> That's true, but you could also use NFS Ganesha, which is
> >> more performant than FUSE and also as reliable as it.
>
> > From this very list I read about many users with various problems when
> > using NFS Ganesha. Is that a wrong impression?
> From my observations, almost nobody is complaining about Ganesha on the mailing list -> 50% are having issues with geo-replication, 20% are having issues with small-file performance, and the rest have issues with very old versions of Gluster -> v5 or older.
>
> >> It's not so hard to do it - just use either 'reset-brick' or
> >> 'replace-brick'.
>
> > Sure - the command itself is simple enough. The point is that each
> > reconstruction is quite a bit "riskier" than a simple RAID
> > reconstruction. Do you run a full Gluster SDS, skipping RAID? How do
> > you find this setup?
>
> I can't say that a replace-brick on a 'replica 3' volume is riskier than a rebuild of a RAID, but I have noticed that nobody is following Red Hat's guide to use either:
> - a RAID6 of 12 disks (2-3 TB each)
> - a RAID10 of 12 disks (2-3 TB each)
> - JBOD disks in 'replica 3' mode (I'm not sure about the size RH recommends, most probably 2-3 TB)
> So far, I didn't have the opportunity to run on JBODs.
>
> > Thanks.
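For readers comparing the access paths discussed above, here is a rough sketch of how the same volume is typically mounted via the native FUSE client versus over NFSv3 (served either by the built-in Gluster NFS or by an NFS-Ganesha export). The server, volume and mount-point names are made up for the example:

    # Native FUSE client mount (hypothetical server/volume names):
    mount -t glusterfs gserver1:/myvol /mnt/myvol

    # NFSv3 mount of the same volume; the export is provided either by
    # the built-in Gluster NFS (gnfs) or by NFS-Ganesha on the server:
    mount -t nfs -o vers=3,proto=tcp gserver1:/myvol /mnt/myvol

    # Gluster NFS is disabled by default in recent releases (and only
    # present if the build includes gnfs); it is toggled per volume:
    gluster volume set myvol nfs.disable off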
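Similarly, since 'reset-brick' and 'replace-brick' come up above, here is a sketch of what replacing a failed brick on a 'replica 3' volume usually looks like (volume name, host and brick paths are invented for the example); self-heal then rebuilds the new brick from the surviving copies:

    # Swap the failed brick for a new, empty one; self-heal repopulates it:
    gluster volume replace-brick myvol gserver2:/bricks/b1 gserver2:/bricks/b1-new commit force

    # Or keep the same brick path, e.g. after replacing the disk under it:
    gluster volume reset-brick myvol gserver2:/bricks/b1 start
    #   ...swap the disk, recreate the filesystem and brick directory...
    gluster volume reset-brick myvol gserver2:/bricks/b1 gserver2:/bricks/b1 commit force

    # Watch the heal progress:
    gluster volume heal myvol info summary

As the exchange above notes, the command itself is the easy part - the heal that follows is where the comparison with a RAID rebuild comes in.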