We never ran tests with Ceph, mostly due to time constraints in engineering. We also liked that, at least when I started as a novice, Gluster seemed easier to set up. We use the solution in automated setup scripts for maintaining very large clusters. Simplicity in automated setup is critical for us, including fully automated installation of supercomputers in QE and near-automation at customer sites.

We have been happy with our performance using Gluster and gluster NFS for root filesystems when using squashfs image files for the NFS roots instead of expanded file trees (on a sharded volume). For writable NFS, we use XFS filesystem images on gluster NFS instead of expanded trees (in this case, not on a sharded volume). We have systems running as large as 3072 nodes with 16 gluster servers (subvolumes of 3, distributed-replicate). We will have 5k nodes in production soon and will need to support 10k nodes in a year or so. So far we use CTDB for "HA-like" functionality, as Pacemaker is scary to us.

We have also designed a second solution around Gluster for high-availability head nodes (aka admin nodes). The old solution used two admin nodes, Pacemaker, and external shared storage to host a VM that would start on the second server if the first server died. As we know, two-node HA is not optimal. We designed a new three-server HA solution that eliminates the external shared storage (which was expensive) and instead uses Gluster, a sharded volume, and a qemu raw image hosted on that shared storage to back the virtual admin node. We use a four-disk RAID 10 per server for the gluster bricks here. We have been happy with the performance of this; it's only a little slower than the external shared-filesystem solution (we tended to use GFS2 or OCFS2 in the old setup). We did need to use Pacemaker for this one, as virtual machine availability isn't a good fit for CTDB (or is less natural, anyway). One highlight of this solution is that it allows a customer to put each of the three servers in a separate firewalled vault or room, keeping the head node alive even if a fire destroyed one server.

A key to our use of Gluster without suffering poor performance in our root-filesystem workloads is encapsulating filesystems in image files instead of using expanded trees of small files.

So far we have relied on gluster NFS for the boot servers, as Ganesha would crash. We haven't re-tried in several months, though, and we owe some debugging on that front; we have not had the resources to put into debugging Ganesha just yet.

I sure hope Gluster stays healthy and active. It is good to have multiple solutions with various strengths out there. I like choice. Plus, choice lets us learn from each other. I hope project sponsors see that too.

For anyone curious, rough sketches of both setups (with made-up host names and paths) are appended below the quoted messages.

Erik

> 17.06.2020 08:59, Artem Russakovskii wrote:
> > It may be stable, but it still suffers from performance issues, which
> > the team is working on. But nevertheless, I'm curious if maybe Ceph has
> > those problems sorted by now.
>
> Dunno, we run gluster on small clusters, kvm and gluster on the same hosts.
>
> There were plans to use ceph on a dedicated server next year, but budget was
> cut because you don't want to buy our oil for $120 ;-)
>
> Anyway, in our tests ceph is faster, this is why we wanted to use it, but
> not migrate from gluster.
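Sketch 1: the NFS-root idea. This is illustration only, a minimal Python wrapper around the gluster CLI showing the shape of the setup, not our actual scripts. The host names, brick paths, volume name, shard size and image paths are all invented for the example.

    #!/usr/bin/env python3
    """Illustration only: sharded replica-3 gluster volume serving a squashfs
    root image over gluster NFS. Host names, paths and sizes are made up."""
    import subprocess

    def run(cmd):
        # Echo and run a command, failing loudly the way a setup script should.
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    # Hypothetical brick layout: 6 bricks, replica 3 -> 2 distribute subvolumes.
    bricks = [f"gl{i}.example.com:/data/brick/nfsroot" for i in range(1, 7)]

    # Create the distributed-replicate volume and turn on sharding so large
    # image files are split into fixed-size pieces across the subvolumes.
    run(["gluster", "volume", "create", "nfsroot", "replica", "3"] + bricks)
    run(["gluster", "volume", "set", "nfsroot", "features.shard", "on"])
    run(["gluster", "volume", "set", "nfsroot", "features.shard-block-size", "64MB"])
    # Keep serving with gluster NFS (Ganesha was not stable for us at the time).
    run(["gluster", "volume", "set", "nfsroot", "nfs.disable", "off"])
    run(["gluster", "volume", "start", "nfsroot"])

    # Pack the expanded root tree into one squashfs object written straight onto
    # a fuse mount of the volume; compute nodes loop-mount it read-only over NFS
    # instead of hitting gluster with an expanded tree of small files.
    run(["mksquashfs", "/srv/images/compute-root",
         "/mnt/nfsroot/compute-root.squashfs", "-noappend"])

The point is simply that the NFS clients see one big squashfs object per image rather than millions of small files, so gluster's small-file overhead mostly disappears.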
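Sketch 2: the three-server HA admin node. Again purely hypothetical: a replica-3 sharded volume across the head servers, a raw qemu image for the virtual admin node on a fuse mount of that volume, and a Pacemaker VirtualDomain resource to keep the VM running. The names, paths, sizes and the libvirt domain XML are placeholders, not our actual configuration.

    #!/usr/bin/env python3
    """Illustration only: three-server HA admin node on a sharded gluster
    volume, managed by Pacemaker. All names and sizes are placeholders."""
    import subprocess

    def run(cmd):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    # One brick per head server (each backed by a four-disk RAID 10), replica 3.
    bricks = [f"head{i}.example.com:/data/brick/adminvm" for i in range(1, 4)]
    run(["gluster", "volume", "create", "adminvm", "replica", "3"] + bricks)
    run(["gluster", "volume", "set", "adminvm", "features.shard", "on"])
    run(["gluster", "volume", "start", "adminvm"])

    # The virtual admin node lives in a single raw image on the volume
    # (assuming the volume is fuse-mounted at /adminvm on all three servers).
    run(["qemu-img", "create", "-f", "raw", "/adminvm/admin.img", "500G"])

    # Pacemaker keeps the VM running on one of the three servers;
    # /etc/libvirt/qemu/admin.xml is a placeholder libvirt domain definition.
    run(["pcs", "resource", "create", "admin_vm", "ocf:heartbeat:VirtualDomain",
         "hypervisor=qemu:///system", "config=/etc/libvirt/qemu/admin.xml",
         "meta", "allow-migrate=true"])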
________

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users