Re: Glusterfs 10.5-1 healing issues

Darrell Budic <budic@xxxxxxxxxxxxxxxx> · Wed, 10 Apr 2024 11:40:58 -0500

I would strongly recommend running the glusterfs servers directly on bare metal instead of in VMs. Check out Ovirt, especially its hybrid cluster model. While it’s not currently well maintained, it works fine on your class of hardware and fully supports this model of gluster on the bare metal and VMs running on the same hosts. And we may see it get some more support after the VMWare buyout, who knows?

Gluster isn’t known for small file performance, but hunt through the archives for specific tuning hints. And if you’re using it to host the VM image files, you’re making that problem because the files shared by gluster are large. More cache and write (behind) buffers can help, and 10G or better networking would be something you want to do if you can afford it. Going to 2x1G LAGs can help a tiny bit, but you really want the lower latency from a faster physical media if you can get it.

If you are not already using tuned to set virtual-guest profiles on your VMs (and virtual-host on the hosts), I’d look into that as well. Set the disk elevator to ’none’ on the VMs as well.

On Apr 10, 2024, at 10:07 AM, Ilias Chasapakis forumZFD <chasapakis@xxxxxxxxxxx> wrote:

  Dear Darrell,
Dear ...,

      Many thanks for the prompt reply. Here some of the additional
      information requested (please feel free to ask for more if needed)

CPU info:

        Hosts

        1. Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz (20 cores) hw RAID
        1 (adaptec)

        2. Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz (20 cores) hw RAID
        1 (adaptec)

        3. Intel(R) Xeon(R) Silver 4112 CPU @ 2.60GHz (8 cores) hw RAID
        1 (adaptec)

        GlusterFS VMs

        1. 4 cores,  10 GB RAM, CPU model Broadwell

        2. 4 cores,  10 GB RAM, CPU model Broadwell

        3. 4 cores,  10 GB RAM, CPU model: host passthrough value

        Network info

        Physical connection between gluster nodes in a heartbeat network
        that comprises a new Cisco switch.

        TCP connections with 1Gbit links

        Virtual default connectivity with virtio drivers for the NICs
        and Macvtap connection to use the host´s connectivity.

        No errors or lost packages are recorded between the VMs or the
        hosts. Quick iperf tests (between glusters and ctdbs-glusters
        show now evident issues).

        Workload:

        An instantaneous from this moment which can be considered a peak
        time is around 450 files open on the volume.

        In terms of "litteral" load we notice cpu peaks mostly related
        to the shd process

      All disks use virtIO drivers (virtIO disks).

      The file system on all nodes is XFS (not ZFS)

      Other than the clients on the gluster nodes themselves there are
      clients on ctdbs that mount the gluster volume and then expose it
      via smb to Windows clients (user profiles included for roaming
      profiles).

      ctdbs reach the glusters through the heartbeat network

      We are considering to move the glusters to a network with existing
      DNS capabilities in order to create a round-robin configuration by
      assigning hosts by IP to a single hostname to use it then for the
      mounts configuration of the ctdbs.

      The reasoning/hope behind that we would minimize access time and
      sync issues.

      Thank you for the information about the "sharding" we will take
      this into account and consider pros and cons in the current
      situation, epsecially because turning back is not easy afterwards.
      Also our main problem is mainly not with big files, but with a
      large quantity of small files.

      We could gladly make use of some of the options you suggested
      after we assess the situation again. We welcome any further
      suggestion in the meantime.
Ilias

    Am 09.04.24 um 18:26 schrieb Darrell
      Budic:

      The big one I see of you is to investigate and enable sharding. It
      can improve performance and makes it much easier to heal VM style
      workloads. Be aware that once you turn it on, you can’t go back
      easily, and you need to copy the VM disk images around to get them
      to be sharded before it will show any real effect. A couple other
      recommendations from my main volume (three dedicated host servers
      with HDDs and SDD/NVM caching and log volumes on ZFS ).
      The cluster.shd-* entries are especially recommended. This is on
      gluster 9.4 at the moment, so some of these won’t map exactly.

      Volume
            Name: gv1
Type:
            Replicate
Number
          of Bricks: 1 x 3 = 3
Transport-type:
            tcp
Options
          Reconfigured:
cluster.read-hash-mode:
            3
performance.client-io-threads:
            on
performance.write-behind-window-size:
            64MB
performance.cache-size:
            1G
nfs.disable:
          on
performance.readdir-ahead:
            on
performance.quick-read:
            off
performance.read-ahead:
            on
performance.io-cache:
            off
performance.stat-prefetch:
            on
cluster.eager-lock:
            enable
network.remote-dio:
            enable
server.event-threads:
          4
client.event-threads:
            8
performance.io-thread-count:
            64
performance.low-prio-threads:
            32
features.shard:
            on
features.shard-block-size:
            64MB
cluster.locking-scheme:
            granular
cluster.data-self-heal-algorithm:
          full
cluster.shd-max-threads:
            8
cluster.shd-wait-qlength:
            10240
cluster.choose-local:
            false
cluster.granular-entry-heal:
            enable

        Otherwise, more details about your servers, CPU, RAM, and
          Disks would be useful for suggestions, and details of your
          network as well. And if you haven’t done kernel level tuning
          on the servers, you should address that as well. These all
          vary a lot by your work load and hardware setup, so there
          aren’t many generic recommendations I can give other than to
          make sure you tuned your tcp stack and enabled the none disk
          elevator on SSDs or disks used by ZFS. 

        There’s a lot of tuning suggesting in the archives if you
          go searching as well.

          -Darrell

            On Apr 9, 2024, at 3:05 AM, Ilias Chasapakis forumZFD
              <chasapakis@xxxxxxxxxxx> wrote:

              Dear all,

                we would like to describe the situation that we have and
                that does not solve since a long time, that means after
                many minor

                and major upgrades of GlusterFS

                We use a KVM environment for VMs for glusterfs and host
                servers are updated regularly. Hosts are disomogeneous
                hardware,

                but configured with same characteristics.

                The VMs have been also harmonized to use the virtio
                drivers where available for devices and resources
                reserved are the same

                on each host.

                Physical switch for hosts has been substituted with a
                reliable one.

                Probing peers has been and is quite quick in the
                heartbeat network and communication between the servers
                for apparently has no issues on disruptions.

                And I say apparently because what we have is:

                - always pending failed heals that used to resolve by a
                rotated reboot of the gluster vms (replica 3).
                Restarting only

                glusterfs related services (daemon, events etc.) has no
                effect, only reboot brings results

                - very often failed heals are directories

                We lately removed a brick that was on a vm on a host
                that has been entirely substituted. Re-added the brick,
                sync went on and

                all data was eventually synced and started with 0
                pending failed heals. Now it develops failed heals too
                like its fellow

                bricks. Please take into account we healed all the
                failed entries (manually with various methods) before
                adding the third brick.

                After some days of operating, the count of failed heals
                rises again, not really fast but with new entries for
                sure (which might solve

                with rotated reboots, or not).

                We have gluster clients also on ctdbs that connect to
                the gluster and mount via glusterfs client. Windows
                roaming profiles shared via smb become frequently
                corrupted,(they are composed of a great number small
                files and are though of big total dimension). Gluster
                nodes are formatted with xfs.

                Also what we observer is that mounting with the vfs
                option in smb on the ctdbs has some kind of delay. This
                means that you can see the shared folder on for example

                a Windows client machine on a ctdb, but not on another
                ctdb in the cluster and then after a while it appears
                there too. And this frequently st

                This is an excerpt of entries on our shd logs:

                2024-04-08 10:13:26.213596
                  +0000] I [MSGID: 108026]
                  [afr-self-heal-entry.c:1080:afr_selfheal_entry_do]
                  0-gv-ho-replicate-0: performing full entry selfheal on
                  2c621415-6223-4b66-a4ca-3f6f267a448d

                  [2024-04-08 10:14:08.135911 +0000] W [MSGID: 114031]
                  [client-rpc-fops_v2.c:2457:client4_0_link_cbk]
                  0-gv-ho-client-5: remote operation failed.
                  [{source=<gfid:91d83f0e-1864-4ff3-9174-b7c956e20596>},
                  {target=(null)}, {errno=116}, {error=Veraltete
                  Dateizugriffsnummer (file handle)}]

                  [2024-04-08 10:15:59.135908 +0000] W [MSGID: 114061]
                  [client-common.c:2992:client_pre_readdir_v2]
                  0-gv-ho-client-5: remote_fd is -1. EBADFD
                  [{gfid=6b5e599e-c836-4ebe-b16a-8224425b88c7},
                  {errno=77}, {error=Die Dateizugriffsnummer ist in
                  schlechter Verfassung}]

                  [2024-04-08 10:30:25.013592 +0000] I [MSGID: 108026]
                  [afr-self-heal-entry.c:1080:afr_selfheal_entry_do]
                  0-gv-ho-replicate-0: performing full entry selfheal on
                  24e82e12-5512-4679-9eb3-8bd098367db7

                  [2024-04-08 10:33:17.613594 +0000] W [MSGID: 114031]
                  [client-rpc-fops_v2.c:2457:client4_0_link_cbk]
                  0-gv-ho-client-5: remote operation failed.
                  [{source=<gfid:ef9068fc-a329-4a21-88d2-265ecd3d208c>},
                  {target=(null)}, {errno=116}, {error=Veraltete
                  Dateizugriffsnummer (file handle)}]

                  [2024-04-08 10:33:21.201359 +0000] W [MSGID: 114031]
                  [client-rpc-fops_v2.c:2457:client4_0_link_cbk]
                  0-gv-ho-client-5: remote operation failed. [{source=

                How are he clients mapped to real hosts in order to know
                on which one´s logs to look at?

                We would like to go by exclusion to finally eradicate
                this, possibly in a conservative way (not rebuilding
                everything) and we

                are becoming clueless as to where to look at as we also
                tried various options settings regarding performance
                etc.

                Here is the set on our main volume:

                cluster.lookup-unhashed                 
                  on (DEFAULT)

                  cluster.lookup-optimize                  on (DEFAULT)

                  cluster.min-free-disk                    10% (DEFAULT)

                  cluster.min-free-inodes                  5% (DEFAULT)

                  cluster.rebalance-stats                  off (DEFAULT)

                  cluster.subvols-per-directory            (null)
                  (DEFAULT)

                  cluster.readdir-optimize                 off (DEFAULT)

                  cluster.rsync-hash-regex                 (null)
                  (DEFAULT)

                  cluster.extra-hash-regex                 (null)
                  (DEFAULT)

                  cluster.dht-xattr-name                  
                  trusted.glusterfs.dht (DEFAULT)

                  cluster.randomize-hash-range-by-gfid     off (DEFAULT)

                  cluster.rebal-throttle                   normal
                  (DEFAULT)

                  cluster.lock-migration off

                  cluster.force-migration off

                  cluster.local-volume-name                (null)
                  (DEFAULT)

                  cluster.weighted-rebalance               on (DEFAULT)

                  cluster.switch-pattern                   (null)
                  (DEFAULT)

                  cluster.entry-change-log                 on (DEFAULT)

                  cluster.read-subvolume                   (null)
                  (DEFAULT)

                  cluster.read-subvolume-index             -1 (DEFAULT)

                  cluster.read-hash-mode                   1 (DEFAULT)

                  cluster.background-self-heal-count       8 (DEFAULT)

                  cluster.metadata-self-heal on

                  cluster.data-self-heal on

                  cluster.entry-self-heal on

                  cluster.self-heal-daemon enable

                  cluster.heal-timeout                     600 (DEFAULT)

                  cluster.self-heal-window-size            8 (DEFAULT)

                  cluster.data-change-log                  on (DEFAULT)

                  cluster.metadata-change-log              on (DEFAULT)

                  cluster.data-self-heal-algorithm         (null)
                  (DEFAULT)

                  cluster.eager-lock                       on (DEFAULT)

                  disperse.eager-lock                      on (DEFAULT)

                  disperse.other-eager-lock                on (DEFAULT)

                  disperse.eager-lock-timeout              1 (DEFAULT)

                  disperse.other-eager-lock-timeout        1 (DEFAULT)

                  cluster.quorum-type auto

                  cluster.quorum-count 2

                  cluster.choose-local                     true
                  (DEFAULT)

                  cluster.self-heal-readdir-size           1KB (DEFAULT)

                  cluster.post-op-delay-secs               1 (DEFAULT)

                  cluster.ensure-durability                on (DEFAULT)

                  cluster.consistent-metadata              no (DEFAULT)

                  cluster.heal-wait-queue-length           128 (DEFAULT)

                  cluster.favorite-child-policy none

                  cluster.full-lock                        yes (DEFAULT)

                  cluster.optimistic-change-log            on (DEFAULT)

                  diagnostics.latency-measurement off

                  diagnostics.dump-fd-stats                off (DEFAULT)

                  diagnostics.count-fop-hits off

                  diagnostics.brick-log-level INFO

                  diagnostics.client-log-level INFO

                  diagnostics.brick-sys-log-level          CRITICAL
                  (DEFAULT)

                  diagnostics.client-sys-log-level         CRITICAL
                  (DEFAULT)

                  diagnostics.brick-logger                 (null)
                  (DEFAULT)

                  diagnostics.client-logger                (null)
                  (DEFAULT)

                  diagnostics.brick-log-format             (null)
                  (DEFAULT)

                  diagnostics.client-log-format            (null)
                  (DEFAULT)

                  diagnostics.brick-log-buf-size           5 (DEFAULT)

                  diagnostics.client-log-buf-size          5 (DEFAULT)

                  diagnostics.brick-log-flush-timeout      120 (DEFAULT)

                  diagnostics.client-log-flush-timeout     120 (DEFAULT)

                  diagnostics.stats-dump-interval          0 (DEFAULT)

                  diagnostics.fop-sample-interval          0 (DEFAULT)

                  diagnostics.stats-dump-format            json
                  (DEFAULT)

                  diagnostics.fop-sample-buf-size          65535
                  (DEFAULT)

                  diagnostics.stats-dnscache-ttl-sec       86400
                  (DEFAULT)

                  performance.cache-max-file-size 10

                  performance.cache-min-file-size          0 (DEFAULT)

                  performance.cache-refresh-timeout        1 (DEFAULT)

                  performance.cache-priority (DEFAULT)

                  performance.io-cache-size                32MB
                  (DEFAULT)

                  performance.cache-size                   32MB
                  (DEFAULT)

                  performance.io-thread-count              16 (DEFAULT)

                  performance.high-prio-threads            16 (DEFAULT)

                  performance.normal-prio-threads          16 (DEFAULT)

                  performance.low-prio-threads             16 (DEFAULT)

                  performance.least-prio-threads           1 (DEFAULT)

                  performance.enable-least-priority        on (DEFAULT)

                  performance.iot-watchdog-secs            (null)
                  (DEFAULT)

                  performance.iot-cleanup-disconnected-reqs off
                  (DEFAULT)

                  performance.iot-pass-through             false
                  (DEFAULT)

                  performance.io-cache-pass-through        false
                  (DEFAULT)

                  performance.quick-read-cache-size        128MB
                  (DEFAULT)

                  performance.cache-size                   128MB
                  (DEFAULT)

                  performance.quick-read-cache-timeout     1 (DEFAULT)

                  performance.qr-cache-timeout 600

                  performance.quick-read-cache-invalidation false
                  (DEFAULT)

                  performance.ctime-invalidation           false
                  (DEFAULT)

                  performance.flush-behind                 on (DEFAULT)

                  performance.nfs.flush-behind             on (DEFAULT)

                  performance.write-behind-window-size 4MB

                  performance.resync-failed-syncs-after-fsync off
                  (DEFAULT)

                  performance.nfs.write-behind-window-size 1MB (DEFAULT)

                  performance.strict-o-direct              off (DEFAULT)

                  performance.nfs.strict-o-direct          off (DEFAULT)

                  performance.strict-write-ordering        off (DEFAULT)

                  performance.nfs.strict-write-ordering    off (DEFAULT)

                  performance.write-behind-trickling-writes on (DEFAULT)

                  performance.aggregate-size               128KB
                  (DEFAULT)

                  performance.nfs.write-behind-trickling-writes on
                  (DEFAULT)

                  performance.lazy-open                    yes (DEFAULT)

                  performance.read-after-open              yes (DEFAULT)

                  performance.open-behind-pass-through     false
                  (DEFAULT)

                  performance.read-ahead-page-count        4 (DEFAULT)

                  performance.read-ahead-pass-through      false
                  (DEFAULT)

                  performance.readdir-ahead-pass-through   false
                  (DEFAULT)

                  performance.md-cache-pass-through        false
                  (DEFAULT)

                  performance.write-behind-pass-through    false
                  (DEFAULT)

                  performance.md-cache-timeout 600

                  performance.cache-swift-metadata         false
                  (DEFAULT)

                  performance.cache-samba-metadata on

                  performance.cache-capability-xattrs      true
                  (DEFAULT)

                  performance.cache-ima-xattrs             true
                  (DEFAULT)

                  performance.md-cache-statfs              off (DEFAULT)

                  performance.xattr-cache-list (DEFAULT)

                  performance.nl-cache-pass-through        false
                  (DEFAULT)

                  network.frame-timeout                    1800
                  (DEFAULT)

                  network.ping-timeout 20

                  network.tcp-window-size                  (null)
                  (DEFAULT)

                  client.ssl off

                  network.remote-dio                       disable
                  (DEFAULT)

                  client.event-threads 4

                  client.tcp-user-timeout 0

                  client.keepalive-time 20

                  client.keepalive-interval 2

                  client.keepalive-count 9

                  client.strict-locks off

                  network.tcp-window-size                  (null)
                  (DEFAULT)

                  network.inode-lru-limit 200000

                  auth.allow *

                  auth.reject                              (null)
                  (DEFAULT)

                  transport.keepalive 1

                  server.allow-insecure                    on (DEFAULT)

                  server.root-squash                       off (DEFAULT)

                  server.all-squash                        off (DEFAULT)

                  server.anonuid                           65534
                  (DEFAULT)

                  server.anongid                           65534
                  (DEFAULT)

                  server.statedump-path                   
                  /var/run/gluster (DEFAULT)

                  server.outstanding-rpc-limit             64 (DEFAULT)

                  server.ssl off

                  auth.ssl-allow *

                  server.manage-gids                       off (DEFAULT)

                  server.dynamic-auth                      on (DEFAULT)

                  client.send-gids                         on (DEFAULT)

                  server.gid-timeout                       300 (DEFAULT)

                  server.own-thread                        (null)
                  (DEFAULT)

                  server.event-threads 4

                  server.tcp-user-timeout                  42 (DEFAULT)

                  server.keepalive-time 20

                  server.keepalive-interval 2

                  server.keepalive-count 9

                  transport.listen-backlog 1024

                  ssl.own-cert                             (null)
                  (DEFAULT)

                  ssl.private-key                          (null)
                  (DEFAULT)

                  ssl.ca-list                              (null)
                  (DEFAULT)

                  ssl.crl-path                             (null)
                  (DEFAULT)

                  ssl.certificate-depth                    (null)
                  (DEFAULT)

                  ssl.cipher-list                          (null)
                  (DEFAULT)

                  ssl.dh-param                             (null)
                  (DEFAULT)

                  ssl.ec-curve                             (null)
                  (DEFAULT)

                  transport.address-family inet

                  performance.write-behind off

                  performance.read-ahead on

                  performance.readdir-ahead on

                  performance.io-cache off

                  performance.open-behind on

                  performance.quick-read on

                  performance.nl-cache on

                  performance.stat-prefetch on

                  performance.client-io-threads off

                  performance.nfs.write-behind on

                  performance.nfs.read-ahead off

                  performance.nfs.io-cache off

                  performance.nfs.quick-read off

                  performance.nfs.stat-prefetch off

                  performance.nfs.io-threads off

                  performance.force-readdirp               true
                  (DEFAULT)

                  performance.cache-invalidation on

                  performance.global-cache-invalidation    true
                  (DEFAULT)

                  features.uss off

                  features.snapshot-directory .snaps

                  features.show-snapshot-directory off

                  features.tag-namespaces off

                  network.compression off

                  network.compression.window-size          -15 (DEFAULT)

                  network.compression.mem-level            8 (DEFAULT)

                  network.compression.min-size             0 (DEFAULT)

                  network.compression.compression-level    -1 (DEFAULT)

                  network.compression.debug                false
                  (DEFAULT)

                  features.default-soft-limit              80% (DEFAULT)

                  features.soft-timeout                    60 (DEFAULT)

                  features.hard-timeout                    5 (DEFAULT)

                  features.alert-time                      86400
                  (DEFAULT)

                  features.quota-deem-statfs off

                  geo-replication.indexing off

                  geo-replication.indexing off

                  geo-replication.ignore-pid-check off

                  geo-replication.ignore-pid-check off

                  features.quota off

                  features.inode-quota off

                  features.bitrot disable

                  debug.trace off

                  debug.log-history                        no (DEFAULT)

                  debug.log-file                           no (DEFAULT)

                  debug.exclude-ops                        (null)
                  (DEFAULT)

                  debug.include-ops                        (null)
                  (DEFAULT)

                  debug.error-gen off

                  debug.error-failure                      (null)
                  (DEFAULT)

                  debug.error-number                       (null)
                  (DEFAULT)

                  debug.random-failure                     off (DEFAULT)

                  debug.error-fops                         (null)
                  (DEFAULT)

                  nfs.disable on

                  features.read-only                       off (DEFAULT)

                  features.worm off

                  features.worm-file-level off

                  features.worm-files-deletable on

                  features.default-retention-period        120 (DEFAULT)

                  features.retention-mode                  relax
                  (DEFAULT)

                  features.auto-commit-period              180 (DEFAULT)

                  storage.linux-aio                        off (DEFAULT)

                  storage.linux-io_uring                   off (DEFAULT)

                  storage.batch-fsync-mode                 reverse-fsync
                  (DEFAULT)

                  storage.batch-fsync-delay-usec           0 (DEFAULT)

                  storage.owner-uid                        -1 (DEFAULT)

                  storage.owner-gid                        -1 (DEFAULT)

                  storage.node-uuid-pathinfo               off (DEFAULT)

                  storage.health-check-interval            30 (DEFAULT)

                  storage.build-pgfid                      off (DEFAULT)

                  storage.gfid2path                        on (DEFAULT)

                  storage.gfid2path-separator              : (DEFAULT)

                  storage.reserve                          1 (DEFAULT)

                  storage.health-check-timeout             20 (DEFAULT)

                  storage.fips-mode-rchecksum on

                  storage.force-create-mode                0000
                  (DEFAULT)

                  storage.force-directory-mode             0000
                  (DEFAULT)

                  storage.create-mask                      0777
                  (DEFAULT)

                  storage.create-directory-mask            0777
                  (DEFAULT)

                  storage.max-hardlinks                    100 (DEFAULT)

                  features.ctime                           on (DEFAULT)

                  config.gfproxyd off

                  cluster.server-quorum-type server

                  cluster.server-quorum-ratio 51

                  changelog.changelog                      off (DEFAULT)

                  changelog.changelog-dir                  {{ brick.path
                  }}/.glusterfs/changelogs (DEFAULT)

                  changelog.encoding                       ascii
                  (DEFAULT)

                  changelog.rollover-time                  15 (DEFAULT)

                  changelog.fsync-interval                 5 (DEFAULT)

                  changelog.changelog-barrier-timeout 120

                  changelog.capture-del-path               off (DEFAULT)

                  features.barrier disable

                  features.barrier-timeout 120

                  features.trash                           off (DEFAULT)

                  features.trash-dir                       .trashcan
                  (DEFAULT)

                  features.trash-eliminate-path            (null)
                  (DEFAULT)

                  features.trash-max-filesize              5MB (DEFAULT)

                  features.trash-internal-op               off (DEFAULT)

                  cluster.enable-shared-storage disable

                  locks.trace                              off (DEFAULT)

                  locks.mandatory-locking                  off (DEFAULT)

                  cluster.disperse-self-heal-daemon        enable
                  (DEFAULT)

                  cluster.quorum-reads                     no (DEFAULT)

                  client.bind-insecure                     (null)
                  (DEFAULT)

                  features.timeout                         45 (DEFAULT)

                  features.failover-hosts                  (null)
                  (DEFAULT)

                  features.shard off

                  features.shard-block-size                64MB
                  (DEFAULT)

                  features.shard-lru-limit                 16384
                  (DEFAULT)

                  features.shard-deletion-rate             100 (DEFAULT)

                  features.scrub-throttle lazy

                  features.scrub-freq biweekly

                  features.scrub                           false
                  (DEFAULT)

                  features.expiry-time 120

                  features.signer-threads 4

                  features.cache-invalidation on

                  features.cache-invalidation-timeout 600

                  ganesha.enable off

                  features.leases off

                  features.lease-lock-recall-timeout       60 (DEFAULT)

                  disperse.background-heals                8 (DEFAULT)

                  disperse.heal-wait-qlength               128 (DEFAULT)

                  cluster.heal-timeout                     600 (DEFAULT)

                  dht.force-readdirp                       on (DEFAULT)

                  disperse.read-policy                     gfid-hash
                  (DEFAULT)

                  cluster.shd-max-threads 4

                  cluster.shd-wait-qlength                 1024
                  (DEFAULT)

                  cluster.locking-scheme                   full
                  (DEFAULT)

                  cluster.granular-entry-heal              no (DEFAULT)

                  features.locks-revocation-secs           0 (DEFAULT)

                  features.locks-revocation-clear-all      false
                  (DEFAULT)

                  features.locks-revocation-max-blocked    0 (DEFAULT)

                  features.locks-monkey-unlocking          false
                  (DEFAULT)

                  features.locks-notify-contention         yes (DEFAULT)

                  features.locks-notify-contention-delay   5 (DEFAULT)

                  disperse.shd-max-threads                 1 (DEFAULT)

                  disperse.shd-wait-qlength 4096

                  disperse.cpu-extensions                  auto
                  (DEFAULT)

                  disperse.self-heal-window-size           32 (DEFAULT)

                  cluster.use-compound-fops off

                  performance.parallel-readdir on

                  performance.rda-request-size 131072

                  performance.rda-low-wmark                4096
                  (DEFAULT)

                  performance.rda-high-wmark               128KB
                  (DEFAULT)

                  performance.rda-cache-limit 10MB

                  performance.nl-cache-positive-entry      false
                  (DEFAULT)

                  performance.nl-cache-limit 10MB

                  performance.nl-cache-timeout 600

                  cluster.brick-multiplex disable

                  cluster.brick-graceful-cleanup disable

                  glusterd.vol_count_per_thread 100

                  cluster.max-bricks-per-process 250

                  disperse.optimistic-change-log           on (DEFAULT)

                  disperse.stripe-cache                    4 (DEFAULT)

                  cluster.halo-enabled                     False
                  (DEFAULT)

                  cluster.halo-shd-max-latency             99999
                  (DEFAULT)

                  cluster.halo-nfsd-max-latency            5 (DEFAULT)

                  cluster.halo-max-latency                 5 (DEFAULT)

                  cluster.halo-max-replicas                99999
                  (DEFAULT)

                  cluster.halo-min-replicas                2 (DEFAULT)

                  features.selinux on

                  cluster.daemon-log-level INFO

                  debug.delay-gen off

                  delay-gen.delay-percentage               10% (DEFAULT)

                  delay-gen.delay-duration                 100000
                  (DEFAULT)

                  delay-gen.enable (DEFAULT)

                  disperse.parallel-writes                 on (DEFAULT)

                  disperse.quorum-count                    0 (DEFAULT)

                  features.sdfs off

                  features.cloudsync off

                  features.ctime on

                  ctime.noatime on

                  features.cloudsync-storetype             (null)
                  (DEFAULT)

                  features.enforce-mandatory-lock off

                  config.global-threading off

                  config.client-threads 16

                  config.brick-threads 16

                  features.cloudsync-remote-read off

                  features.cloudsync-store-id              (null)
                  (DEFAULT)

                  features.cloudsync-product-id            (null)
                  (DEFAULT)

                  features.acl enable

                  cluster.use-anonymous-inode yes

                  rebalance.ensure-durability              on (DEFAULT)

                Again, sorry for the long post. We would be happy to
                have this solved as we are excited using glusterfs and
                we would like to go back to having a stable
                configuration.

                We always appreciate the spirit of collaboration and
                reciprocal help on this list.

                Best

                Ilias

                -- 

                forumZFD

                Entschieden für Frieden | Committed to Peace

                Ilias Chasapakis

                Referent IT | IT Consultant

                Forum Ziviler Friedensdienst e.V. | Forum Civil Peace
                Service

                Am Kölner Brett 8 | 50825 Köln | Germany

                Tel 0221 91273243 | Fax 0221 91273299 |
                http://www.forumZFD.de

                Vorstand nach § 26 BGB,
                einzelvertretungsberechtigt|Executive Board:

                Alexander Mauz, Sonja Wiekenberg-Mlalandle, Jens von
                Bargen

                VR 17651 Amtsgericht Köln

                Spenden|Donations: IBAN DE90 4306 0967 4103 7264 00
                  BIC GENODEM1GLS

                ________

                Community Meeting Calendar:

                Schedule -

                Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC

                Bridge: https://meet.google.com/cpu-eiue-hvk

                Gluster-users mailing list

                Gluster-users@xxxxxxxxxxx

                https://lists.gluster.org/mailman/listinfo/gluster-users

    -- 
forumZFD
Entschieden für Frieden | Committed to Peace

Ilias Chasapakis
Referent IT | IT Consultant

Forum Ziviler Friedensdienst e.V. | Forum Civil Peace Service
Am Kölner Brett 8 | 50825 Köln | Germany

Tel 0221 91273243 | Fax 0221 91273299 | http://www.forumZFD.de

Vorstand nach § 26 BGB, einzelvertretungsberechtigt|Executive Board:
Alexander Mauz, Sonja Wiekenberg-Mlalandle, Jens von Bargen
VR 17651 Amtsgericht Köln

Spenden|Donations: IBAN DE90 4306 0967 4103 7264 00   BIC GENODEM1GLS

________

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users

________

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users