Howdy all, Sorry in Australia so most of your replies came in over night for me. Note: At the end of this reply is a listing of all our volume settings (gluster get volname all). Note 2: I really wish Gluster used Discourse for this kind of community troubleshooting an analysis, using a mailing list is really painful.
I've actually set this right up to 60 (seconds), I guess it's possible that's causing an issue but I thought that was more for forced eviction on idle files.
Hmm yes I wonder if it might be worth looking at the stripe-block-size, I forgot about this as it sounds like it's for striped volumes (now deprecated) only. The issue with this is that I don't want to tune the volume just for small files and hurt the performance of lager I/O operations.
GlusterFS 3.7 is really old so I'd be careful looking at settings / tuning for it.
Not using NFS.
Already set to 1024MB, but that's only for reads not writes.
That's my current setting.
Currently allowing even more cache up at 256MB.
That's my current setting (the default now I believe).
Not sure how that's going to impact small I/O performance. I currently have this set to none, but do use an arbiter node.
Not sure how that's going to impact small I/O performance. I currently have this set to off, but do use an arbiter node.
Not sure how that's going to impact small I/O performance. I currently have this set to 0, but do use an arbiter node.
That's my current setting.
I've set this right down to 5.
That's my current setting.
That's my current setting, we don't have swap - other than ZRAM enabled on any hosts.
N/A as swap disabled (ZRAM only)
I have this set to vm.min_free_kbytes = 67584, I'd be worried that setting this high would cause OOM as per the official kernel docs: min_free_kbytes: This is used to force the Linux VM to keep a minimum number of kilobytes free. The VM uses this number to compute a watermark[WMARK_MIN] value for each lowmem zone in the system. Each lowmem zone gets a number of reserved free pages based proportionally on its size. Some minimal amount of memory is needed to satisfy PF_MEMALLOC allocations; if you set this to lower than 1024KB, your system will become subtly broken, and prone to deadlock under high loads. Setting this too high will OOM your machine instantly.
DD is not going to give anyone particularly useful benchmarks, especially with small file sizes, in fact it's more likely to mislead you than be useful. See my short post on fio here: https://smcleod.net/tech/2016/04/29/benchmarking-io.html , I believe it's one of the most useful tools for I/O benchmarking. Just for a laugh I compared dd writes for 4k (small) writes between the client (gluster mounted on the cli) and a gluster host (to a directory on the same storage as the bricks). The client came out faster, likely the direct I/O flag was not working as perhaps intended. Client: # dd if=/dev/zero of=./some-file.bin bs=4K count=4096 oflag=direct 4096+0 records in 4096+0 records out 16777216 bytes (17 MB) copied, 2.27839 s, 7.4 MB/s Server: dd if=/dev/zero of=./some-file.bin bs=4K count=4096 oflag=direct 4096+0 records in 4096+0 records out 16777216 bytes (17 MB) copied, 3.94093 s, 4.3 MB/s Note: At the end of this reply is a listing of all our volume settings (gluster get volname all). Here is an output of all gluster volume settings as they currently stand: # gluster volume get uat_storage all Option Value ------ ----- cluster.lookup-unhashed on cluster.lookup-optimize true cluster.min-free-disk 10% cluster.min-free-inodes 5% cluster.rebalance-stats off cluster.subvols-per-directory (null) cluster.readdir-optimize true cluster.rsync-hash-regex (null) cluster.extra-hash-regex (null) cluster.dht-xattr-name trusted.glusterfs.dht cluster.randomize-hash-range-by-gfid off cluster.rebal-throttle normal cluster.lock-migration off cluster.local-volume-name (null) cluster.weighted-rebalance on cluster.switch-pattern (null) cluster.entry-change-log on cluster.read-subvolume (null) cluster.read-subvolume-index -1 cluster.read-hash-mode 1 cluster.background-self-heal-count 8 cluster.metadata-self-heal on cluster.data-self-heal on cluster.entry-self-heal on cluster.self-heal-daemon on cluster.heal-timeout 600 cluster.self-heal-window-size 1 cluster.data-change-log on cluster.metadata-change-log on cluster.data-self-heal-algorithm (null) cluster.eager-lock true disperse.eager-lock on disperse.other-eager-lock on cluster.quorum-type none cluster.quorum-count (null) cluster.choose-local true cluster.self-heal-readdir-size 1KB cluster.post-op-delay-secs 1 cluster.ensure-durability on cluster.consistent-metadata no cluster.heal-wait-queue-length 128 cluster.favorite-child-policy size cluster.full-lock yes cluster.stripe-block-size 128KB cluster.stripe-coalesce true diagnostics.latency-measurement off diagnostics.dump-fd-stats off diagnostics.count-fop-hits off diagnostics.brick-log-level ERROR diagnostics.client-log-level ERROR diagnostics.brick-sys-log-level CRITICAL diagnostics.client-sys-log-level CRITICAL diagnostics.brick-logger (null) diagnostics.client-logger (null) diagnostics.brick-log-format (null) diagnostics.client-log-format (null) diagnostics.brick-log-buf-size 5 diagnostics.client-log-buf-size 5 diagnostics.brick-log-flush-timeout 120 diagnostics.client-log-flush-timeout 120 diagnostics.stats-dump-interval 0 diagnostics.fop-sample-interval 0 diagnostics.stats-dump-format json diagnostics.fop-sample-buf-size 65535 diagnostics.stats-dnscache-ttl-sec 86400 performance.cache-max-file-size 6MB performance.cache-min-file-size 0 performance.cache-refresh-timeout 60 performance.cache-priority performance.cache-size 1024MB performance.io-thread-count 16 performance.high-prio-threads 16 performance.normal-prio-threads 16 performance.low-prio-threads 16 performance.least-prio-threads 1 performance.enable-least-priority on performance.cache-size 1024MB performance.flush-behind on performance.nfs.flush-behind on performance.write-behind-window-size 256MB performance.resync-failed-syncs-after-fsyncoff performance.nfs.write-behind-window-size1MB performance.strict-o-direct off performance.nfs.strict-o-direct off performance.strict-write-ordering off performance.nfs.strict-write-ordering off performance.write-behind-trickling-writeson performance.nfs.write-behind-trickling-writeson performance.lazy-open yes performance.read-after-open no performance.read-ahead-page-count 4 performance.md-cache-timeout 600 performance.cache-swift-metadata true performance.cache-samba-metadata false performance.cache-capability-xattrs true performance.cache-ima-xattrs true features.encryption off encryption.master-key (null) encryption.data-key-size 256 encryption.block-size 4096 network.frame-timeout 1800 network.ping-timeout 15 network.tcp-window-size (null) features.lock-heal off features.grace-timeout 10 network.remote-dio disable client.event-threads 8 client.tcp-user-timeout 0 client.keepalive-time 20 client.keepalive-interval 2 client.keepalive-count 9 network.tcp-window-size (null) network.inode-lru-limit 50000 auth.allow * auth.reject (null) transport.keepalive 1 server.allow-insecure (null) server.root-squash off server.anonuid 65534 server.anongid 65534 server.statedump-path /var/run/gluster server.outstanding-rpc-limit 256 features.lock-heal off features.grace-timeout 10 server.ssl (null) auth.ssl-allow * server.manage-gids off server.dynamic-auth on client.send-gids on server.gid-timeout 300 server.own-thread (null) server.event-threads 8 server.tcp-user-timeout 0 server.keepalive-time 20 server.keepalive-interval 2 server.keepalive-count 9 transport.listen-backlog 2048 ssl.own-cert (null) ssl.private-key (null) ssl.ca-list (null) ssl.crl-path (null) ssl.certificate-depth (null) ssl.cipher-list (null) ssl.dh-param (null) ssl.ec-curve (null) transport.address-family inet performance.write-behind on performance.read-ahead on performance.readdir-ahead on performance.io-cache on performance.quick-read on performance.open-behind on performance.nl-cache off performance.stat-prefetch true performance.client-io-threads true performance.nfs.write-behind on performance.nfs.read-ahead off performance.nfs.io-cache off performance.nfs.quick-read off performance.nfs.stat-prefetch off performance.nfs.io-threads off performance.force-readdirp true performance.cache-invalidation true features.uss off features.snapshot-directory .snaps features.show-snapshot-directory off network.compression off network.compression.window-size -15 network.compression.mem-level 8 network.compression.min-size 0 network.compression.compression-level -1 network.compression.debug false features.limit-usage (null) features.default-soft-limit 80% features.soft-timeout 60 features.hard-timeout 5 features.alert-time 86400 features.quota-deem-statfs off geo-replication.indexing off geo-replication.indexing off geo-replication.ignore-pid-check off geo-replication.ignore-pid-check off features.quota off features.inode-quota off features.bitrot disable debug.trace off debug.log-history no debug.log-file no debug.exclude-ops (null) debug.include-ops (null) debug.error-gen off debug.error-failure (null) debug.error-number (null) debug.random-failure off debug.error-fops (null) nfs.disable on features.read-only off features.worm off features.worm-file-level off features.worm-files-deletable on features.default-retention-period 120 features.retention-mode relax features.auto-commit-period 180 storage.linux-aio off storage.batch-fsync-mode reverse-fsync storage.batch-fsync-delay-usec 0 storage.owner-uid -1 storage.owner-gid -1 storage.node-uuid-pathinfo off storage.health-check-interval 30 storage.build-pgfid off storage.gfid2path on storage.gfid2path-separator : storage.reserve 1 storage.bd-aio off config.gfproxyd off cluster.server-quorum-type off cluster.server-quorum-ratio 0 changelog.changelog off changelog.changelog-dir (null) changelog.encoding ascii changelog.rollover-time 15 changelog.fsync-interval 5 changelog.changelog-barrier-timeout 120 changelog.capture-del-path off features.barrier disable features.barrier-timeout 120 features.trash off features.trash-dir .trashcan features.trash-eliminate-path (null) features.trash-max-filesize 5MB features.trash-internal-op off cluster.enable-shared-storage disable cluster.write-freq-threshold 0 cluster.read-freq-threshold 0 cluster.tier-pause off cluster.tier-promote-frequency 120 cluster.tier-demote-frequency 3600 cluster.watermark-hi 90 cluster.watermark-low 75 cluster.tier-mode cache cluster.tier-max-promote-file-size 0 cluster.tier-max-mb 4000 cluster.tier-max-files 10000 cluster.tier-query-limit 100 cluster.tier-compact on cluster.tier-hot-compact-frequency 604800 cluster.tier-cold-compact-frequency 604800 features.ctr-enabled off features.record-counters off features.ctr-record-metadata-heat off features.ctr_link_consistency off features.ctr_lookupheal_link_timeout 300 features.ctr_lookupheal_inode_timeout 300 features.ctr-sql-db-cachesize 12500 features.ctr-sql-db-wal-autocheckpoint 25000 features.selinux on locks.trace off locks.mandatory-locking off cluster.disperse-self-heal-daemon enable cluster.quorum-reads no client.bind-insecure (null) features.shard off features.shard-block-size 64MB features.scrub-throttle lazy features.scrub-freq biweekly features.scrub false features.expiry-time 120 features.cache-invalidation true features.cache-invalidation-timeout 600 features.leases off features.lease-lock-recall-timeout 60 disperse.background-heals 8 disperse.heal-wait-qlength 128 cluster.heal-timeout 600 dht.force-readdirp on disperse.read-policy round-robin cluster.shd-max-threads 1 cluster.shd-wait-qlength 1024 cluster.locking-scheme full cluster.granular-entry-heal no features.locks-revocation-secs 0 features.locks-revocation-clear-all false features.locks-revocation-max-blocked 0 features.locks-monkey-unlocking false disperse.shd-max-threads 1 disperse.shd-wait-qlength 1024 disperse.cpu-extensions auto disperse.self-heal-window-size 1 cluster.use-compound-fops true performance.parallel-readdir off performance.rda-request-size 131072 performance.rda-low-wmark 4096 performance.rda-high-wmark 128KB performance.rda-cache-limit 256MB performance.nl-cache-positive-entry false performance.nl-cache-limit 10MB performance.nl-cache-timeout 60 cluster.brick-multiplex off cluster.max-bricks-per-process 0 disperse.optimistic-change-log on cluster.halo-enabled False cluster.halo-shd-max-latency 99999 cluster.halo-nfsd-max-latency 5 cluster.halo-max-latency 5 cluster.halo-max-replicas 99999 cluster.halo-min-replicas 2 debug.delay-gen off delay-gen.delay-percentage 10% delay-gen.delay-duration 100000 delay-gen.enable (null) disperse.parallel-writes on -- Sam McLeod (protoporpoise on IRC) https://smcleod.net https://twitter.com/s_mcleod Words are my own opinions and do not necessarily represent those of my employer or partners. |
_______________________________________________ Gluster-users mailing list Gluster-users@xxxxxxxxxxx http://lists.gluster.org/mailman/listinfo/gluster-users