Setup recommendations

Nico van Royen <nico@xxxxxxxxxxxx> · Fri, 16 Oct 2020 09:05:37 +0200 (CEST)

Gluster community:

Because one of our (internal) clients, which uses several GlusterFS setups, has recently run into performance issues, this is mainly an open question about possible general improvements.
(This is apart from the fact that the users themselves do some 'less smart' things on it...)

The setup used: a 3-node (replica 3) RHEL7 Gluster cluster (with RHGS, so currently GlusterFS 6.0-37.1.el7rhgs). Each cluster has a single volume, exported through NFS-Ganesha.
Each node also has a virtual IP, managed by pacemaker/corosync following the standard Red Hat setup documentation.
The size is not that big: 600GB of space, with around half of that actually used. The GlusterFS servers themselves each have 4 cores and 12GB of memory. It might also be important to note that these are VMware-hosted nodes that use SAN storage for the datastores.

Just over 100 clients, all RHEL6 and RHEL7, are connected to that NFS-Ganesha exported share, some of them as many as 10 network hops away. All of those clients currently use the same virtual IP, so they all end up on the same server.
(We have already advised them to spread the clients across the three servers.)
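One way to do that spreading, sketched here under assumptions: each client picks one of the three virtual IPs deterministically from a checksum of its own hostname, so roughly a third of the clients end up on each node. The VIP names `gluster-vip1` to `gluster-vip3` and the function name `pick_vip` are hypothetical placeholders, not part of the actual setup.

```sh
#!/bin/sh
# Hypothetical VIP hostnames for the three Gluster nodes; replace them with
# the real virtual IPs managed by pacemaker.
VIPS="gluster-vip1 gluster-vip2 gluster-vip3"

# Print one VIP for the given client name. The choice is stable: the same
# name always maps to the same VIP.
pick_vip() {
    # cksum prints "<crc> <byte count>"; keep only the CRC
    crc=$(printf '%s' "$1" | cksum | cut -d' ' -f1)
    # Map the CRC onto field 1, 2 or 3 of the VIP list
    idx=$(( crc % 3 + 1 ))
    echo "$VIPS" | cut -d' ' -f"$idx"
}

pick_vip "$(uname -n)"
```

A client could then use the printed name as the server in its (hard) NFS mount of /BLAH. A static per-client assignment in /etc/fstab achieves the same result; the checksum only avoids maintaining that list by hand.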
Certain subfolders of the share at times hold large numbers of small files, which *should* peak at around 50,000 files in a single, unhashed directory (this of course makes simple ls and find commands over NFS quite slow).
Note that I said 'should', since at times such a directory has held anywhere between 250,000 and 1 million files (which of course is not advised). We have also already advised using some kind of hashing (for example, subfolders per day or per hour).
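A minimal sketch of such a per-hour layout, assuming the writing application can be told which directory to write into: it builds a path such as `<base>/2020/10/16/09` from the current UTC time and creates it if needed. The function name `hourly_dir` and the base path `/mnt/BLAH/incoming` are hypothetical.

```sh
#!/bin/sh
# Hypothetical base directory on the NFS mount; adjust to the real share path.
BASE=/mnt/BLAH/incoming

# Print (creating it if needed) a per-hour subdirectory of the given base,
# e.g. "$BASE/2020/10/16/09", so that no single directory accumulates more
# than one hour's worth of files.
hourly_dir() {
    dir="$1/$(date -u +%Y/%m/%d/%H)"
    mkdir -p "$dir" && printf '%s\n' "$dir"
}

# Usage (not run here): cp report.dat "$(hourly_dir "$BASE")/"
```

Per-day subfolders work the same way with a shorter date format; the granularity can be chosen so that each directory stays well below the file counts that cause trouble here.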

Problems that are often seen:
- Any kind of VMware operation, such as a vMotion or creating a VM snapshot, on the node that has these 100+ clients connected causes a pause long enough that pacemaker decides to move the resources (failing over the virtual IP address, so the connected clients experience a delay). One would expect this delay to last just under a minute, after which the clients would happily continue. However, the connected clients are left with a non-working mount point (commands such as df, ls and find simply hang; they go into uninterruptible sleep).
The mounts are 'hard' mounts to ensure guaranteed writes.
- Once the number of files goes over the 100,000 mark (again in a single, unhashed folder), any operation on that share becomes very sluggish (even a df on a client takes 20 to 30 seconds, and a find command takes minutes to complete).

Can anyone suggest any ideas for improvement?

Some config info (the output below is from a sandbox setup using the same values as the affected Gluster cluster):
For Ganesha:
/etc/ganesha/ganesha.conf:
# BEGIN ANSIBLE MANAGED BLOCK
NFSv4 {
minor_versions = 0;
}
# END ANSIBLE MANAGED BLOCK
%include "/var/run/gluster/shared_storage/nfs-ganesha/exports/export.BLAH.conf"

/var/run/gluster/shared_storage/nfs-ganesha/exports/export.BLAH.conf:
EXPORT{
      Export_Id = 2;
      Path = "/BLAH";
      FSAL {
           name = GLUSTER;
           hostname="localhost";
           volume="BLAH";
           }
      Access_type = RW;
      Disable_ACL = true;
      Squash="No_root_squash";
      Pseudo="/BLAH";
      Protocols = "3", "4" ;
      Transports = "UDP","TCP";
      SecType = "sys";
     }

# gluster v info BLAH
Volume Name: BLAH
Type: Replicate
Volume ID: 6fee713c-4258-44d8-a849-f8d6b2991631
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: tlrvrhgluster03:/gluster/BLAH/export
Brick2: tlrvrhgluster02:/gluster/BLAH/export
Brick3: tlrvrhgluster01:/gluster/BLAH/export
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
ganesha.enable: on
features.cache-invalidation: on
performance.client-io-threads: off
nfs.disable: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
cluster.server-quorum-type: server
cluster.quorum-count: 2
performance.cache-refresh-timeout: 10
cluster.quorum-type: fixed
cluster.enable-shared-storage: enable
nfs-ganesha: enable

# lvs -a
  LV              VG       Attr       LSize   Pool    Origin Data%  Meta%  Move Log Cpy%Sync Convert
  BLAH            gfsvg    Vwi-aotz--  10.00g gfspool        36.45
  gfspool         gfsvg    twi-aotz--  48.00g                7.59   1.76
  [gfspool_tdata] gfsvg    Twi-ao----  48.00g
  [gfspool_tmeta] gfsvg    ewi-ao----   1.00g
  [lvol0_pmspare] gfsvg    ewi-------  48.00m
  auditlv         systemvg -wi-ao---- 252.00m
  homelv          systemvg -wi-ao----   1.00g
  rootlv          systemvg -wi-ao----  16.00g
  swaplv          systemvg -wi-ao----   2.00g
  tmplv           systemvg -wi-ao----   2.00g
  varcorelv       systemvg -wi-ao----   1.00g
  varloglv        systemvg -wi-ao----   6.00g
  varlv           systemvg -wi-ao----   6.00g

________

Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users