My question: are the errors and anomalies below something I need to investigate, or should I not be worried?

I installed a test cluster with Gluster 7.2 to run some tests, to see whether we gain enough confidence to put it on the 5,120-node supercomputer in place of Gluster 4.1.6. I started with a 3x2 volume (6 nodes, distribute/replicate) with heavy optimizations for writes and NFS. I booted my NFS-root clients and kept them online, then performed an add-brick operation to grow the volume from 3x2 to 3x3 (so 9 servers instead of 6). The rebalance went much better for me than it did with Gluster 4.1.6, but I saw some errors. We noticed them first here -- 14 failures on leader8 and a few on the others. These are the NEW nodes, so the data flow was from the old nodes to these three, each of which shows at least one failure:

[root@leader8 glusterfs]# gluster volume rebalance cm_shared status
                                   Node  Rebalanced-files       size    scanned   failures   skipped      status  run time in h:m:s
                              ---------       -----------  ---------  ---------  ---------  ---------  ----------  -----------------
 leader1.head.cm.eag.rdlabs.hpecorp.net             18933    596.4MB     181780          0       3760   completed            0:41:39
                             172.23.0.4             18960      1.2GB     181831          0       3766   completed            0:41:39
                             172.23.0.5             18691      1.2GB     181826          0       3716   completed            0:41:39
                             172.23.0.6             14917    618.8MB     175758          0       3869   completed            0:35:40
                             172.23.0.7             15114    573.5MB     175728          0       3853   completed            0:35:41
                             172.23.0.8             14864    459.2MB     175742          0       3951   completed            0:35:40
                             172.23.0.9                 0     0Bytes         11          3          0   completed            0:08:26
                            172.23.0.11                 0     0Bytes        242          1          0   completed            0:08:25
                              localhost                 0     0Bytes          5         14          0   completed            0:08:26
volume rebalance: cm_shared: success

My rebalance log is about 32 MB, and I find it's hard for people to help me when I post that much data, so I've tried to filter it down to two classes of messages here -- anomalies and errors.
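Roughly, the filtering below came from commands along these lines (approximate, and the path is simply where the rebalance log lands on these nodes):

    # count / list the rebalance warnings
    grep -ci "error from gf_defrag_get_entry" /var/log/glusterfs/cm_shared-rebalance.log
    grep -i "error from gf_defrag_get_entry" /var/log/glusterfs/cm_shared-rebalance.log

    # see which directories the layout anomalies cluster in
    grep "Found anomalies" /var/log/glusterfs/cm_shared-rebalance.log | \
        sed -e 's/.*Found anomalies in //' -e 's/ (gfid.*//' | sort | uniq -c | sort -rn | head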
Errors (14 reported on this node):

[root@leader8 glusterfs]# grep -i "error from gf_defrag_get_entry" cm_shared-rebalance.log
[2020-02-10 23:23:55.286830] W [dht-rebalance.c:3439:gf_defrag_process_dir] 0-cm_shared-dht: Found error from gf_defrag_get_entry
[2020-02-10 23:24:12.903496] W [dht-rebalance.c:3439:gf_defrag_process_dir] 0-cm_shared-dht: Found error from gf_defrag_get_entry
[2020-02-10 23:24:15.226948] W [dht-rebalance.c:3439:gf_defrag_process_dir] 0-cm_shared-dht: Found error from gf_defrag_get_entry
[2020-02-10 23:24:15.259480] W [dht-rebalance.c:3439:gf_defrag_process_dir] 0-cm_shared-dht: Found error from gf_defrag_get_entry
[2020-02-10 23:24:15.398784] W [dht-rebalance.c:3439:gf_defrag_process_dir] 0-cm_shared-dht: Found error from gf_defrag_get_entry
[2020-02-10 23:24:16.633033] W [dht-rebalance.c:3439:gf_defrag_process_dir] 0-cm_shared-dht: Found error from gf_defrag_get_entry
[2020-02-10 23:24:16.645847] W [dht-rebalance.c:3439:gf_defrag_process_dir] 0-cm_shared-dht: Found error from gf_defrag_get_entry
[2020-02-10 23:24:21.783528] W [dht-rebalance.c:3439:gf_defrag_process_dir] 0-cm_shared-dht: Found error from gf_defrag_get_entry
[2020-02-10 23:24:22.307464] W [dht-rebalance.c:3439:gf_defrag_process_dir] 0-cm_shared-dht: Found error from gf_defrag_get_entry
[2020-02-10 23:25:23.391256] W [dht-rebalance.c:3439:gf_defrag_process_dir] 0-cm_shared-dht: Found error from gf_defrag_get_entry
[2020-02-10 23:26:34.203129] W [dht-rebalance.c:3439:gf_defrag_process_dir] 0-cm_shared-dht: Found error from gf_defrag_get_entry
[2020-02-10 23:26:39.669243] W [dht-rebalance.c:3439:gf_defrag_process_dir] 0-cm_shared-dht: Found error from gf_defrag_get_entry
[2020-02-10 23:27:42.615081] W [dht-rebalance.c:3439:gf_defrag_process_dir] 0-cm_shared-dht: Found error from gf_defrag_get_entry
[2020-02-10 23:28:53.942158] W [dht-rebalance.c:3439:gf_defrag_process_dir] 0-cm_shared-dht: Found error from gf_defrag_get_entry

Brick log errors around 23:23:55 (to match the first error above):

[2020-02-10 23:23:54.605681] W [MSGID: 113096] [posix-handle.c:834:posix_handle_soft] 0-cm_shared-posix: symlink ../../a4/3e/a43ef7fd-08eb-434c-8168-96a92059d186/LC_MESSAGES -> /data/brick_cm_shared/.glusterfs/10/d9/10d97106-49b1-4c5e-a86f-b8e70c9ef838 failed [File exists]
[2020-02-10 23:23:54.883387] W [MSGID: 113096] [posix-handle.c:834:posix_handle_soft] 0-cm_shared-posix: symlink ../../7d/66/7d66930c-3bd0-40c8-9473-897fcd2f8c11/LC_MESSAGES -> /data/brick_cm_shared/.glusterfs/7c/41/7c412877-2443-43a8-9c7a-67ada4d96a13 failed [File exists]
[2020-02-10 23:23:55.284155] W [MSGID: 113096] [posix-handle.c:834:posix_handle_soft] 0-cm_shared-posix: symlink ../../a0/2c/a02c8b2d-f587-4c58-9de9-7928828e37e5/LC_MESSAGES -> /data/brick_cm_shared/.glusterfs/eb/79/eb79298d-a65e-41f3-a9a8-da4634879e88 failed [File exists]
[2020-02-10 23:23:55.284178] E [MSGID: 113020] [posix-entry-ops.c:835:posix_mkdir] 0-cm_shared-posix: setting gfid on /data/brick_cm_shared/image/images_ro_nfs/rhel8.0/usr/share/vim/vim80/lang/zh_CN.UTF-8/LC_MESSAGES failed [File exists]
[2020-02-10 23:23:55.284913] W [MSGID: 113103] [posix-entry-ops.c:247:posix_lookup] 0-cm_shared-posix: Found stale gfid handle /data/brick_cm_shared/.glusterfs/eb/79/eb79298d-a65e-41f3-a9a8-da4634879e88, removing it. [No such file or directory]
[2020-02-10 23:23:57.218664] W [MSGID: 113096] [posix-handle.c:834:posix_handle_soft] 0-cm_shared-posix: symlink ../../86/c2/86c2e694-d00b-4dcf-8383-60ce0cb07275/html -> /data/brick_cm_shared/.glusterfs/5c/f0/5cf0cc7d-86fe-4ba2-bea5-1d8ad3616274 failed [File exists]

Example anomalies - normal root-image files:

[2020-02-10 23:28:18.816012] I [MSGID: 109063] [dht-layout.c:647:dht_layout_normalize] 0-cm_shared-dht: Found anomalies in /image/images_dist/rhel8.0/usr/lib64/python3.6/email (gfid = 4194dca6-dcc9-409b-a162-58e90b8db63d). Holes=1 overlaps=0
[2020-02-10 23:28:18.822869] I [MSGID: 109063] [dht-layout.c:647:dht_layout_normalize] 0-cm_shared-dht: Found anomalies in /image/images_dist/rhel8.0/usr/lib64/python3.6/email/__pycache__ (gfid = 07e4e462-de25-4840-99dc-f4235b4b45bf). Holes=1 overlaps=0
[2020-02-10 23:28:18.834924] I [MSGID: 109063] [dht-layout.c:647:dht_layout_normalize] 0-cm_shared-dht: Found anomalies in /image/images_dist/rhel8.0/usr/lib64/python3.6/email/mime (gfid = f882e53c-43c6-48ea-9230-c0bc7eee901f). Holes=1 overlaps=0
...

Example anomalies - sparse XFS image files used as node-writable space (these are just the directories that hold the sparse files, not the sparse files themselves):

[2020-02-10 23:26:07.231529] I [MSGID: 109063] [dht-layout.c:647:dht_layout_normalize] 0-cm_shared-dht: Found anomalies in /image/images_rw_nfs/n2521 (gfid = 3b65777c-5fc5-4213-9525-294e74a560ca). Holes=1 overlaps=0
[2020-02-10 23:26:07.237923] I [MSGID: 109063] [dht-layout.c:647:dht_layout_normalize] 0-cm_shared-dht: Found anomalies in /image/images_rw_nfs/n2521/rhel8.0-aarch64 (gfid = f822683d-7136-4d5c-8df5-94f1b84afc03). Holes=1 overlaps=0

Volume status:

[root@leader8 glusterfs]# gluster volume status cm_shared
Status of volume: cm_shared
Gluster process                                              TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 172.23.0.3:/data/brick_cm_shared                       49152     0          Y       36543
Brick 172.23.0.4:/data/brick_cm_shared                       49152     0          Y       34371
Brick 172.23.0.5:/data/brick_cm_shared                       49152     0          Y       34451
Brick 172.23.0.6:/data/brick_cm_shared                       49152     0          Y       35685
Brick 172.23.0.7:/data/brick_cm_shared                       49152     0          Y       34068
Brick 172.23.0.8:/data/brick_cm_shared                       49152     0          Y       35093
Brick 172.23.0.9:/data/brick_cm_shared                       49154     0          Y       31940
Brick 172.23.0.10:/data/brick_cm_shared                      49154     0          Y       32420
Brick 172.23.0.11:/data/brick_cm_shared                      49154     0          Y       32906
Self-heal Daemon on localhost                                N/A       N/A        Y       32063
NFS Server on localhost                                      2049      0          Y       32493
Self-heal Daemon on 172.23.0.4                               N/A       N/A        Y       34435
NFS Server on 172.23.0.4                                     2049      0          Y       9636
Self-heal Daemon on 172.23.0.5                               N/A       N/A        Y       34514
NFS Server on 172.23.0.5                                     2049      0          Y       11483
Self-heal Daemon on 172.23.0.7                               N/A       N/A        Y       34131
NFS Server on 172.23.0.7                                     2049      0          Y       12294
Self-heal Daemon on 172.23.0.6                               N/A       N/A        Y       35752
NFS Server on 172.23.0.6                                     2049      0          Y       4699
Self-heal Daemon on leader1.head.cm.eag.rdlabs.hpecorp.net   N/A       N/A        Y       36626
NFS Server on leader1.head.cm.eag.rdlabs.hpecorp.net         2049      0          Y       8736
Self-heal Daemon on 172.23.0.9                               N/A       N/A        Y       31583
NFS Server on 172.23.0.9                                     2049      0          Y       31996
Self-heal Daemon on 172.23.0.11                              N/A       N/A        Y       32550
NFS Server on 172.23.0.11                                    2049      0          Y       32962
Self-heal Daemon on 172.23.0.8                               N/A       N/A        Y       35160
NFS Server on 172.23.0.8                                     2049      0          Y       2250

Task Status of Volume cm_shared
------------------------------------------------------------------------------
Task                 : Rebalance
ID                   : f42c98ad-801a-4376-94ea-7dff698f8241
Status               : completed
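If it would help to dig deeper, I can pull layout and gfid details for the affected directories straight off the bricks. A rough sketch of what I'd run (paths taken from the log excerpts above; the xattr names are the standard DHT/gfid ones):

    # on each brick: does this directory's layout xattr cover the full hash range?
    getfattr -n trusted.glusterfs.dht -e hex /data/brick_cm_shared/image/images_dist/rhel8.0/usr/lib64/python3.6/email

    # on leader8: what is left of the "stale gfid handle" that the posix_lookup warning mentions?
    ls -l /data/brick_cm_shared/.glusterfs/eb/79/eb79298d-a65e-41f3-a9a8-da4634879e88
    getfattr -n trusted.gfid -e hex /data/brick_cm_shared/image/images_ro_nfs/rhel8.0/usr/share/vim/vim80/lang/zh_CN.UTF-8/LC_MESSAGES

Happy to post that output if it's useful.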
Commands used to grow:

ssh leader1 gluster volume add-brick cm_shared 172.23.0.9://data/brick_cm_shared 172.23.0.10://data/brick_cm_shared 172.23.0.11://data/brick_cm_shared
volume add-brick: success

ssh leader1 gluster volume rebalance cm_shared start
volume rebalance: cm_shared: success: Rebalance on cm_shared has been started successfully. Use rebalance status command to check status of the rebalance process.

All volume data/settings:

[root@leader8 glusterfs]# gluster volume get cm_shared all
Option                                      Value
------                                      -----
cluster.lookup-unhashed                     auto
cluster.lookup-optimize                     on
cluster.min-free-disk                       10%
cluster.min-free-inodes                     5%
cluster.rebalance-stats                     off
cluster.subvols-per-directory               (null)
cluster.readdir-optimize                    off
cluster.rsync-hash-regex                    (null)
cluster.extra-hash-regex                    (null)
cluster.dht-xattr-name                      trusted.glusterfs.dht
cluster.randomize-hash-range-by-gfid        off
cluster.rebal-throttle                      normal
cluster.lock-migration                      off
cluster.force-migration                     off
cluster.local-volume-name                   (null)
cluster.weighted-rebalance                  on
cluster.switch-pattern                      (null)
cluster.entry-change-log                    on
cluster.read-subvolume                      (null)
cluster.read-subvolume-index                -1
cluster.read-hash-mode                      1
cluster.background-self-heal-count          8
cluster.metadata-self-heal                  off
cluster.data-self-heal                      off
cluster.entry-self-heal                     off
cluster.self-heal-daemon                    on
cluster.heal-timeout                        600
cluster.self-heal-window-size               1
cluster.data-change-log                     on
cluster.metadata-change-log                 on
cluster.data-self-heal-algorithm            (null)
cluster.eager-lock                          on
disperse.eager-lock                         on
disperse.other-eager-lock                   on
disperse.eager-lock-timeout                 1
disperse.other-eager-lock-timeout           1
cluster.quorum-type                         auto
cluster.quorum-count                        (null)
cluster.choose-local                        true
cluster.self-heal-readdir-size              1KB
cluster.post-op-delay-secs                  1
cluster.ensure-durability                   on
cluster.consistent-metadata                 no
cluster.heal-wait-queue-length              128
cluster.favorite-child-policy               none
cluster.full-lock                           yes
cluster.optimistic-change-log               on
diagnostics.latency-measurement             off
diagnostics.dump-fd-stats                   off
diagnostics.count-fop-hits                  off
diagnostics.brick-log-level                 INFO
diagnostics.client-log-level                INFO
diagnostics.brick-sys-log-level             CRITICAL
diagnostics.client-sys-log-level            CRITICAL
diagnostics.brick-logger                    (null)
diagnostics.client-logger                   (null)
diagnostics.brick-log-format                (null)
diagnostics.client-log-format               (null)
diagnostics.brick-log-buf-size              5
diagnostics.client-log-buf-size             5
diagnostics.brick-log-flush-timeout         120
diagnostics.client-log-flush-timeout        120
diagnostics.stats-dump-interval             0
diagnostics.fop-sample-interval             0
diagnostics.stats-dump-format               json
diagnostics.fop-sample-buf-size             65535
diagnostics.stats-dnscache-ttl-sec          86400
performance.cache-max-file-size             0
performance.cache-min-file-size             0
performance.cache-refresh-timeout           60
performance.cache-priority
performance.cache-size                      8GB
performance.io-thread-count                 32
performance.high-prio-threads               16
performance.normal-prio-threads             16
performance.low-prio-threads                16
performance.least-prio-threads              1
performance.enable-least-priority           on
performance.iot-watchdog-secs               (null)
performance.iot-cleanup-disconnected-reqs   off
performance.iot-pass-through                false
performance.io-cache-pass-through           false
performance.cache-size                      8GB
performance.qr-cache-timeout                1
performance.cache-invalidation              on
performance.ctime-invalidation              false
performance.flush-behind                    on
performance.nfs.flush-behind                on
performance.write-behind-window-size        1024MB
performance.resync-failed-syncs-after-fsync off
performance.nfs.write-behind-window-size    1MB
performance.strict-o-direct                 off
performance.nfs.strict-o-direct             off
performance.strict-write-ordering           off
performance.nfs.strict-write-ordering       off
performance.write-behind-trickling-writes   off
performance.aggregate-size                  2048KB
performance.nfs.write-behind-trickling-writes on
performance.lazy-open                       yes
performance.read-after-open                 yes
performance.open-behind-pass-through        false
performance.read-ahead-page-count           4
performance.read-ahead-pass-through         false
performance.readdir-ahead-pass-through      false
performance.md-cache-pass-through           false
performance.md-cache-timeout                600
performance.cache-swift-metadata            true
performance.cache-samba-metadata            false
performance.cache-capability-xattrs         true
performance.cache-ima-xattrs                true
performance.md-cache-statfs                 off
performance.xattr-cache-list
performance.nl-cache-pass-through           false
network.frame-timeout                       1800
network.ping-timeout                        42
network.tcp-window-size                     (null)
client.ssl                                  off
network.remote-dio                          disable
client.event-threads                        32
client.tcp-user-timeout                     0
client.keepalive-time                       20
client.keepalive-interval                   2
client.keepalive-count                      9
network.tcp-window-size                     (null)
network.inode-lru-limit                     1000000
auth.allow                                  *
auth.reject                                 (null)
transport.keepalive                         1
server.allow-insecure                       on
server.root-squash                          off
server.all-squash                           off
server.anonuid                              65534
server.anongid                              65534
server.statedump-path                       /var/run/gluster
server.outstanding-rpc-limit                1024
server.ssl                                  off
auth.ssl-allow                              *
server.manage-gids                          off
server.dynamic-auth                         on
client.send-gids                            on
server.gid-timeout                          300
server.own-thread                           (null)
server.event-threads                        32
server.tcp-user-timeout                     42
server.keepalive-time                       20
server.keepalive-interval                   2
server.keepalive-count                      9
transport.listen-backlog                    16384
transport.address-family                    inet
performance.write-behind                    on
performance.read-ahead                      on
performance.readdir-ahead                   on
performance.io-cache                        on
performance.open-behind                     on
performance.quick-read                      on
performance.nl-cache                        off
performance.stat-prefetch                   on
performance.client-io-threads               on
performance.nfs.write-behind                on
performance.nfs.read-ahead                  off
performance.nfs.io-cache                    on
performance.nfs.quick-read                  off
performance.nfs.stat-prefetch               off
performance.nfs.io-threads                  off
performance.force-readdirp                  true
performance.cache-invalidation              on
performance.global-cache-invalidation       true
features.uss                                off
features.snapshot-directory                 .snaps
features.show-snapshot-directory            off
features.tag-namespaces                     off
network.compression                         off
network.compression.window-size             -15
network.compression.mem-level               8
network.compression.min-size                0
network.compression.compression-level       -1
network.compression.debug                   false
features.default-soft-limit                 80%
features.soft-timeout                       60
features.hard-timeout                       5
features.alert-time                         86400
features.quota-deem-statfs                  off
geo-replication.indexing                    off
geo-replication.indexing                    off
geo-replication.ignore-pid-check            off
geo-replication.ignore-pid-check            off
features.quota                              off
features.inode-quota                        off
features.bitrot                             disable
debug.trace                                 off
debug.log-history                           no
debug.log-file                              no
debug.exclude-ops                           (null)
debug.include-ops                           (null)
debug.error-gen                             off
debug.error-failure                         (null)
debug.error-number                          (null)
debug.random-failure                        off
debug.error-fops                            (null)
nfs.enable-ino32                            no
nfs.mem-factor                              15
nfs.export-dirs                             on
nfs.export-volumes                          on
nfs.addr-namelookup                         off
nfs.dynamic-volumes                         off
nfs.register-with-portmap                   on
nfs.outstanding-rpc-limit                   1024
nfs.port                                    2049
nfs.rpc-auth-unix                           on
nfs.rpc-auth-null                           on
nfs.rpc-auth-allow                          all
nfs.rpc-auth-reject                         none
nfs.ports-insecure                          off
nfs.trusted-sync                            off
nfs.trusted-write                           off
nfs.volume-access                           read-write
nfs.export-dir
nfs.disable                                 off
nfs.nlm                                     off
nfs.acl                                     on
nfs.mount-udp                               off
nfs.mount-rmtab                             /-
nfs.rpc-statd                               /sbin/rpc.statd
nfs.server-aux-gids                         off
nfs.drc                                     off
nfs.drc-size                                0x20000
nfs.read-size                               (1 * 1048576ULL)
nfs.write-size                              (1 * 1048576ULL)
nfs.readdir-size                            (1 * 1048576ULL)
nfs.rdirplus                                on
nfs.event-threads                           2
nfs.exports-auth-enable                     on
nfs.auth-refresh-interval-sec               360
nfs.auth-cache-ttl-sec                      360
features.read-only                          off
features.worm                               off
features.worm-file-level                    off
features.worm-files-deletable               on
features.default-retention-period           120
features.retention-mode                     relax
features.auto-commit-period                 180
storage.linux-aio                           off
storage.batch-fsync-mode                    reverse-fsync
storage.batch-fsync-delay-usec              0
storage.owner-uid                           -1
storage.owner-gid                           -1
storage.node-uuid-pathinfo                  off
storage.health-check-interval               30
storage.build-pgfid                         off
storage.gfid2path                           on
storage.gfid2path-separator                 :
storage.reserve                             1
storage.reserve-size                        0
storage.health-check-timeout                10
storage.fips-mode-rchecksum                 on
storage.force-create-mode                   0000
storage.force-directory-mode                0000
storage.create-mask                         0777
storage.create-directory-mask               0777
storage.max-hardlinks                       0
features.ctime                              on
config.gfproxyd                             off
cluster.server-quorum-type                  off
cluster.server-quorum-ratio                 51
changelog.changelog                         off
changelog.changelog-dir                     {{ brick.path }}/.glusterfs/changelogs
changelog.encoding                          ascii
changelog.rollover-time                     15
changelog.fsync-interval                    5
changelog.changelog-barrier-timeout         120
changelog.capture-del-path                  off
features.barrier                            disable
features.barrier-timeout                    120
features.trash                              off
features.trash-dir                          .trashcan
features.trash-eliminate-path               (null)
features.trash-max-filesize                 5MB
features.trash-internal-op                  off
cluster.enable-shared-storage               disable
locks.trace                                 off
locks.mandatory-locking                     off
cluster.disperse-self-heal-daemon           enable
cluster.quorum-reads                        no
client.bind-insecure                        (null)
features.shard                              off
features.shard-block-size                   64MB
features.shard-lru-limit                    16384
features.shard-deletion-rate                100
features.scrub-throttle                     lazy
features.scrub-freq                         biweekly
features.scrub                              false
features.expiry-time                        120
features.cache-invalidation                 on
features.cache-invalidation-timeout         600
features.leases                             off
features.lease-lock-recall-timeout          60
disperse.background-heals                   8
disperse.heal-wait-qlength                  128
cluster.heal-timeout                        600
dht.force-readdirp                          on
disperse.read-policy                        gfid-hash
cluster.shd-max-threads                     1
cluster.shd-wait-qlength                    1024
cluster.locking-scheme                      full
cluster.granular-entry-heal                 no
features.locks-revocation-secs              0
features.locks-revocation-clear-all         false
features.locks-revocation-max-blocked       0
features.locks-monkey-unlocking             false
features.locks-notify-contention            no
features.locks-notify-contention-delay      5
disperse.shd-max-threads                    1
disperse.shd-wait-qlength                   1024
disperse.cpu-extensions                     auto
disperse.self-heal-window-size              1
cluster.use-compound-fops                   off
performance.parallel-readdir                on
performance.rda-request-size                131072
performance.rda-low-wmark                   4096
performance.rda-high-wmark                  128KB
performance.rda-cache-limit                 10MB
performance.nl-cache-positive-entry         false
performance.nl-cache-limit                  10MB
performance.nl-cache-timeout                60
cluster.brick-multiplex                     disable
glusterd.vol_count_per_thread               100
cluster.max-bricks-per-process              250
disperse.optimistic-change-log              on
disperse.stripe-cache                       4
cluster.halo-enabled                        False
cluster.halo-shd-max-latency                99999
cluster.halo-nfsd-max-latency               5
cluster.halo-max-latency                    5
cluster.halo-max-replicas                   99999
cluster.halo-min-replicas                   2
features.selinux                            on
cluster.daemon-log-level                    INFO
debug.delay-gen                             off
delay-gen.delay-percentage                  10%
delay-gen.delay-duration                    100000
delay-gen.enable
disperse.parallel-writes                    on
features.sdfs                               off
features.cloudsync                          off
features.ctime                              on
ctime.noatime                               on
features.cloudsync-storetype                (null)
features.enforce-mandatory-lock             off
config.global-threading                     off
config.client-threads                       16
config.brick-threads                        16
features.cloudsync-remote-read              off
features.cloudsync-store-id                 (null)
features.cloudsync-product-id               (null)
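One more data point I can gather if it's useful: I haven't included heal state above. My plan was to confirm the replicas are clean after the rebalance with something like:

    gluster volume heal cm_shared info summary
    gluster volume rebalance cm_shared status

(plain "gluster volume heal cm_shared info" works too if the summary form isn't wanted).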