GlusterFS CPU high on one node - SETATTR Operation not permitted

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi all,

I have a 3 node gluster cluster:
glusterfs 3.7.12 built on Jun 27 2016 12:40:53
3 x AWS EC2 m4.xlarge  1 x 1.5Tb STI EBS

We have had several issues where GlusterFS FUSE client hosts get into 'D' on reading the file system and now have one of the 3 nodes with substantially more CPU load than the other two.

This morning (several hours ago) we shut down one node exhibiting the high CPU behaviour and now that node is back to normal and one of the other 3 has high CPU.

The brick logs on both the 'good' nodes are emitting this over and over.  

[2016-10-30 20:55:48.483045] I [MSGID: 115072] [server-rpc-fops.c:1786:server_setattr_cbk] 0-marketplace_nfs-server: 676120: SETATTR /ftpdata/[REDACTED]/bulk_import/.batches/fa44bf76-5706-40fc-b068-33e11a22bdd9/source (f4569925-6c13-4e51-8d97-38cf6c0b198a) ==> (Operation not permitted) [Operation not permitted]
[2016-10-30 20:55:51.511782] I [MSGID: 115072] [server-rpc-fops.c:1786:server_setattr_cbk] 0-marketplace_nfs-server: 695527: SETATTR /ftpdata/[REDACTED]/bulk_import/.batches/fa44bf76-5706-40fc-b068-33e11a22bdd9/source (f4569925-6c13-4e51-8d97-38cf6c0b198a) ==> (Operation not permitted) [Operation not permitted]
[2016-10-30 20:55:51.553858] I [MSGID: 115072] [server-rpc-fops.c:1786:server_setattr_cbk] 0-marketplace_nfs-server: 708963: SETATTR /ftpdata/[REDACTED]/bulk_import/.batches/c42cc627-adc6-43a4-93dd-ea6ec0eaa9cb/source (eaceb846-9ee1-4fab-8f4d-e2a93e85710f) ==> (Operation not permitted) [Operation not permitted]
[2016-10-30 20:55:54.660251] I [MSGID: 115072] [server-rpc-fops.c:1786:server_setattr_cbk] 0-marketplace_nfs-server: 709004: SETATTR /ftpdata/[REDACTED]/bulk_import/.batches/fa44bf76-5706-40fc-b068-33e11a22bdd9/source (f4569925-6c13-4e51-8d97-38cf6c0b198a) ==> (Operation not permitted) [Operation not permitted]
[2016-10-30 20:55:54.866259] I [MSGID: 115072] [server-rpc-fops.c:1786:server_setattr_cbk] 0-marketplace_nfs-server: 356367: SETATTR /ftpdata/[REDACTED]/bulk_import (d7dda9df-bf56-4f71-b6b8-59f2ccc41016) ==> (Operation not permitted) [Operation not permitted]

There are loads of files that are in the gluster volume heal <vol> info. 

Client nodes using GlusterFS FUSE clients have been hanging in an uninterruptible state causing issues.  I can see files are missing on some of the GlusterFS bricks and I have used a client mounting the bricks via NFS and doing a file traverse and stat on each file and directory (find /<mountpoints> -type f -exec stat {} \;) and I can see files being healed. 

We are hoping that this will get us back into a normal state but we are unsure if there is a larger problem looming.

I have included the statedump and current setting:


Option                                  Value
------                                  -----
cluster.lookup-unhashed                 on
cluster.lookup-optimize                 on
cluster.min-free-disk                   10%
cluster.min-free-inodes                 5%
cluster.rebalance-stats                 off
cluster.subvols-per-directory           (null)
cluster.readdir-optimize                off
cluster.rsync-hash-regex                (null)
cluster.extra-hash-regex                (null)
cluster.dht-xattr-name                  trusted.glusterfs.dht
cluster.randomize-hash-range-by-gfid    off
cluster.rebal-throttle                  normal
cluster.local-volume-name               (null)
cluster.weighted-rebalance              on
cluster.switch-pattern                  (null)
cluster.entry-change-log                on
cluster.read-subvolume                  (null)
cluster.read-subvolume-index            -1
cluster.read-hash-mode                  1
cluster.background-self-heal-count      8
cluster.metadata-self-heal              on
cluster.data-self-heal                  on
cluster.entry-self-heal                 on
cluster.self-heal-daemon                disable
cluster.heal-timeout                    600
cluster.self-heal-window-size           1
cluster.data-change-log                 on
cluster.metadata-change-log             on
cluster.data-self-heal-algorithm        full
cluster.eager-lock                      on
disperse.eager-lock                     on
cluster.quorum-type                     none
cluster.quorum-count                    (null)
cluster.choose-local                    true
cluster.self-heal-readdir-size          1KB
cluster.post-op-delay-secs              1
cluster.ensure-durability               on
cluster.consistent-metadata             no
cluster.heal-wait-queue-length          128
cluster.stripe-block-size               128KB
cluster.stripe-coalesce                 true
diagnostics.latency-measurement         off
diagnostics.dump-fd-stats               off
diagnostics.count-fop-hits              off
diagnostics.brick-log-level             INFO
diagnostics.client-log-level            INFO
diagnostics.brick-sys-log-level         CRITICAL
diagnostics.client-sys-log-level        CRITICAL
diagnostics.brick-logger                (null)
diagnostics.client-logger               (null)
diagnostics.brick-log-format            (null)
diagnostics.client-log-format           (null)
diagnostics.brick-log-buf-size          5
diagnostics.client-log-buf-size         5
diagnostics.brick-log-flush-timeout     120
diagnostics.client-log-flush-timeout    120
performance.cache-max-file-size         0
performance.cache-min-file-size         0
performance.cache-refresh-timeout       1
performance.cache-priority
performance.cache-size                  512MB
performance.io-thread-count             16
performance.high-prio-threads           16
performance.normal-prio-threads         16
performance.low-prio-threads            16
performance.least-prio-threads          1
performance.enable-least-priority       on
performance.least-rate-limit            0
performance.cache-size                  512MB
performance.flush-behind                on
performance.nfs.flush-behind            on
performance.write-behind-window-size    1MB
performance.resync-failed-syncs-after-fsyncoff
performance.nfs.write-behind-window-size1MB
performance.strict-o-direct             off
performance.nfs.strict-o-direct         off
performance.strict-write-ordering       off
performance.nfs.strict-write-ordering   off
performance.lazy-open                   yes
performance.read-after-open             no
performance.read-ahead-page-count       4
performance.md-cache-timeout            1
performance.cache-swift-metadata        true
features.encryption                     off
encryption.master-key                   (null)
encryption.data-key-size                256
encryption.block-size                   4096
network.frame-timeout                   1800
network.ping-timeout                    15
network.tcp-window-size                 (null)
features.lock-heal                      off
features.grace-timeout                  10
network.remote-dio                      disable
client.event-threads                    2
network.ping-timeout                    15
network.tcp-window-size                 (null)
network.inode-lru-limit                 16384
auth.allow                              *
auth.reject                             (null)
transport.keepalive                     (null)
server.allow-insecure                   (null)
server.root-squash                      off
server.anonuid                          65534
server.anongid                          65534
server.statedump-path                   /var/run/gluster
server.outstanding-rpc-limit            64
features.lock-heal                      off
features.grace-timeout                  (null)
server.ssl                              (null)
auth.ssl-allow                          *
server.manage-gids                      off
server.dynamic-auth                     on
client.send-gids                        on
server.gid-timeout                      300
server.own-thread                       (null)
server.event-threads                    2
ssl.own-cert                            (null)
ssl.private-key                         (null)
ssl.ca-list                             (null)
ssl.crl-path                            (null)
ssl.certificate-depth                   (null)
ssl.cipher-list                         (null)
ssl.dh-param                            (null)
ssl.ec-curve                            (null)
performance.write-behind                on
performance.read-ahead                  on
performance.readdir-ahead               on
performance.io-cache                    on
performance.quick-read                  on
performance.open-behind                 on
performance.stat-prefetch               on
performance.client-io-threads           off
performance.nfs.write-behind            on
performance.nfs.read-ahead              off
performance.nfs.io-cache                off
performance.nfs.quick-read              off
performance.nfs.stat-prefetch           off
performance.nfs.io-threads              off
performance.force-readdirp              true
features.file-snapshot                  off
features.uss                            off
features.snapshot-directory             .snaps
features.show-snapshot-directory        off
network.compression                     off
network.compression.window-size         -15
network.compression.mem-level           8
network.compression.min-size            0
network.compression.compression-level   -1
network.compression.debug               false
features.limit-usage                    (null)
features.quota-timeout                  0
features.default-soft-limit             80%
features.soft-timeout                   60
features.hard-timeout                   5
features.alert-time                     86400
features.quota-deem-statfs              off
geo-replication.indexing                off
geo-replication.indexing                off
geo-replication.ignore-pid-check        off
geo-replication.ignore-pid-check        off
features.quota                          off
features.inode-quota                    off
features.bitrot                         disable
debug.trace                             off
debug.log-history                       no
debug.log-file                          no
debug.exclude-ops                       (null)
debug.include-ops                       (null)
debug.error-gen                         off
debug.error-failure                     (null)
debug.error-number                      (null)
debug.random-failure                    off
debug.error-fops                        (null)
nfs.enable-ino32                        no
nfs.mem-factor                          15
nfs.export-dirs                         on
nfs.export-volumes                      on
nfs.addr-namelookup                     off
nfs.dynamic-volumes                     off
nfs.register-with-portmap               on
nfs.outstanding-rpc-limit               16
nfs.port                                2049
nfs.rpc-auth-unix                       on
nfs.rpc-auth-null                       on
nfs.rpc-auth-allow                      all
nfs.rpc-auth-reject                     none
nfs.ports-insecure                      off
nfs.trusted-sync                        off
nfs.trusted-write                       off
nfs.volume-access                       read-write
nfs.export-dir
nfs.disable                             false
nfs.nlm                                 on
nfs.acl                                 on
nfs.mount-udp                           off
nfs.mount-rmtab                         /var/lib/glusterd/nfs/rmtab
nfs.rpc-statd                           /sbin/rpc.statd
nfs.server-aux-gids                     off
nfs.drc                                 off
nfs.drc-size                            0x20000
nfs.read-size                           (1 * 1048576ULL)
nfs.write-size                          (1 * 1048576ULL)
nfs.readdir-size                        (1 * 1048576ULL)
nfs.rdirplus                            on
nfs.exports-auth-enable                 (null)
nfs.auth-refresh-interval-sec           (null)
nfs.auth-cache-ttl-sec                  (null)
features.read-only                      off
features.worm                           off
storage.linux-aio                       off
storage.batch-fsync-mode                reverse-fsync
storage.batch-fsync-delay-usec          0
storage.owner-uid                       -1
storage.owner-gid                       -1
storage.node-uuid-pathinfo              off
storage.health-check-interval           30
storage.build-pgfid                     off
storage.bd-aio                          off
cluster.server-quorum-type              off
cluster.server-quorum-ratio             51%
changelog.changelog                     off
changelog.changelog-dir                 (null)
changelog.encoding                      ascii
changelog.rollover-time                 15
changelog.fsync-interval                5
changelog.changelog-barrier-timeout     120
changelog.capture-del-path              off
features.barrier                        disable
features.barrier-timeout                120
features.trash                          off
features.trash-dir                      .trashcan
features.trash-eliminate-path           (null)
features.trash-max-filesize             5MB
features.trash-internal-op              off
cluster.enable-shared-storage           disable
cluster.write-freq-threshold            0
cluster.read-freq-threshold             0
cluster.tier-pause                      off
cluster.tier-promote-frequency          120
cluster.tier-demote-frequency           3600
cluster.watermark-hi                    90
cluster.watermark-low                   75
cluster.tier-mode                       cache
cluster.tier-max-mb                     4000
cluster.tier-max-files                  10000
features.ctr-enabled                    off
features.record-counters                off
features.ctr-record-metadata-heat       off
features.ctr_link_consistency           off
features.ctr_lookupheal_link_timeout    300
features.ctr_lookupheal_inode_timeout   300
features.ctr-sql-db-cachesize           1000
features.ctr-sql-db-wal-autocheckpoint  1000
locks.trace                             (null)
cluster.disperse-self-heal-daemon       enable
cluster.quorum-reads                    no
client.bind-insecure                    (null)
ganesha.enable                          off
features.shard                          off
features.shard-block-size               4MB
features.scrub-throttle                 lazy
features.scrub-freq                     biweekly
features.scrub                          false
features.expiry-time                    120
features.cache-invalidation             off
features.cache-invalidation-timeout     60
disperse.background-heals               8
disperse.heal-wait-qlength              128
dht.force-readdirp                      on
disperse.read-policy                    round-robin
cluster.shd-max-threads                 1
cluster.shd-wait-qlength                1024
cluster.locking-scheme                  full 

Help is appreciated

Den




Attachment: data-data0-marketplace_nfs.4868.dump.1477862666.gz
Description: GNU Zip compressed data

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users

[Index of Archives]     [Gluster Development]     [Linux Filesytems Development]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux