Re: Extremely low performance - am I doing something wrong?

Vladimir Melnik <v.melnik@xxxxxxxx> · Wed, 3 Jul 2019 21:44:57 +0300

Thank you, I tried to do that.

Created a new volume:
$ gluster volume create storage2 \
	replica 3 \
	arbiter 1 \
	transport tcp \
	gluster1.k8s.maitre-d.tucha.ua:/mnt/storage2/brick1 \
	gluster2.k8s.maitre-d.tucha.ua:/mnt/storage2/brick2 \
	gluster3.k8s.maitre-d.tucha.ua:/mnt/storage2/brick_arbiter \
	gluster3.k8s.maitre-d.tucha.ua:/mnt/storage2/brick3 \
	gluster4.k8s.maitre-d.tucha.ua:/mnt/storage2/brick4 \
	gluster4.k8s.maitre-d.tucha.ua:/mnt/storage2/brick_arbiter

Changed the volume's settings:
$ gluster volume set storage2 group virt

Started the volume:
$ gluster volume start storage2
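
Before benchmarking, it may be worth confirming that the group settings really took effect. A rough sketch, assuming the group file holds one key=value pair per line and that the option dump is in the two-column "Option Value" layout printed by 'gluster volume get VOLNAME all'. The function name check_group and the temporary file path are made up for illustration:

```shell
# check_group GROUP_FILE DUMP_FILE
# Print every option that is present in both files but whose value in
# DUMP_FILE differs from the one in GROUP_FILE. Options missing from
# DUMP_FILE are silently skipped.
# GROUP_FILE: key=value lines (assumed format of /var/lib/glusterd/groups/virt).
# DUMP_FILE:  saved output of `gluster volume get VOLNAME all`.
check_group() {
    awk -F'=' '
        # First file: remember the expected value for each option.
        NR == FNR { if (NF == 2) want[$1] = $2; next }
        # Second file: split "option   value" on whitespace and compare.
        {
            split($0, col, /[ \t]+/)
            if (col[1] in want && col[2] != want[col[1]])
                printf "%s: expected %s, got %s\n", col[1], want[col[1]], col[2]
        }
    ' "$1" "$2"
}

# Intended usage on a GlusterFS server (not run here):
#   gluster volume get storage2 all > /tmp/storage2.opts
#   check_group /var/lib/glusterd/groups/virt /tmp/storage2.opts
```

No output would mean that every option from the group file that appears in the dump has the expected value.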

And I got the same result:
$ for i in {1..5}; do { dd if=/dev/zero of=/mnt/glusterfs2/test.tmp bs=1M count=10 oflag=sync; rm -f /mnt/glusterfs2/test.tmp; } done 2>&1 | grep copied
10485760 bytes (10 MB) copied, 0.988662 s, 10.6 MB/s
10485760 bytes (10 MB) copied, 0.768863 s, 13.6 MB/s
10485760 bytes (10 MB) copied, 0.828568 s, 12.7 MB/s
10485760 bytes (10 MB) copied, 0.84322 s, 12.4 MB/s
10485760 bytes (10 MB) copied, 0.812504 s, 12.9 MB/s
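
To compare runs by a single number instead of eyeballing five lines, the figures can be averaged. A minimal sketch, assuming the dd summary format shown above ("... copied, N s, X MB/s"); the function name dd_avg is made up for illustration:

```shell
# dd_avg: read dd summary lines on stdin and print the mean throughput.
# Only lines containing "copied" and ending in "<number> MB/s" are counted.
dd_avg() {
    awk '/copied/ && $NF == "MB/s" { sum += $(NF-1); n++ }
         END { if (n) printf "%.2f MB/s over %d runs\n", sum / n, n }'
}

# The five runs on the new storage2 volume:
dd_avg <<'EOF'
10485760 bytes (10 MB) copied, 0.988662 s, 10.6 MB/s
10485760 bytes (10 MB) copied, 0.768863 s, 13.6 MB/s
10485760 bytes (10 MB) copied, 0.828568 s, 12.7 MB/s
10485760 bytes (10 MB) copied, 0.84322 s, 12.4 MB/s
10485760 bytes (10 MB) copied, 0.812504 s, 12.9 MB/s
EOF
# prints "12.44 MB/s over 5 runs"
```

It can also be appended to the loop itself in place of "grep copied", since it ignores the "records in/out" lines.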

On Wed, Jul 03, 2019 at 05:59:24PM +0000, Strahil Nikolov wrote:
>  Can you try with a fresh replica volume with the 'virt' group applied?
> Best Regards,
> Strahil Nikolov
>     On Wednesday, 3 July 2019, 19:18:18 GMT+3, Vladimir Melnik <v.melnik@xxxxxxxx> wrote:  
>  
>  Thank you, it helped a little:
> 
> $ for i in {1..5}; do { dd if=/dev/zero of=/mnt/glusterfs1/test.tmp bs=1M count=10 oflag=sync; rm -f /mnt/glusterfs1/test.tmp; } done 2>&1 | grep copied
> 10485760 bytes (10 MB) copied, 0.738968 s, 14.2 MB/s
> 10485760 bytes (10 MB) copied, 0.725296 s, 14.5 MB/s
> 10485760 bytes (10 MB) copied, 0.681508 s, 15.4 MB/s
> 10485760 bytes (10 MB) copied, 0.85566 s, 12.3 MB/s
> 10485760 bytes (10 MB) copied, 0.661457 s, 15.9 MB/s
> 
> But 14-15 MB/s is still quite far from the actual storage's performance (200-300 MB/s). :-(
> 
> Here's the full configuration dump (just in case):
> 
> Option                                  Value
> ------                                  -----
> cluster.lookup-unhashed                on
> cluster.lookup-optimize                on
> cluster.min-free-disk                  10%
> cluster.min-free-inodes                5%
> cluster.rebalance-stats                off
> cluster.subvols-per-directory          (null)
> cluster.readdir-optimize                off
> cluster.rsync-hash-regex                (null)
> cluster.extra-hash-regex                (null)
> cluster.dht-xattr-name                  trusted.glusterfs.dht
> cluster.randomize-hash-range-by-gfid    off
> cluster.rebal-throttle                  normal
> cluster.lock-migration                  off
> cluster.force-migration                off
> cluster.local-volume-name              (null)
> cluster.weighted-rebalance              on
> cluster.switch-pattern                  (null)
> cluster.entry-change-log                on
> cluster.read-subvolume                  (null)
> cluster.read-subvolume-index            -1
> cluster.read-hash-mode                  1
> cluster.background-self-heal-count      8
> cluster.metadata-self-heal              off
> cluster.data-self-heal                  off
> cluster.entry-self-heal                off
> cluster.self-heal-daemon                on
> cluster.heal-timeout                    600
> cluster.self-heal-window-size          1
> cluster.data-change-log                on
> cluster.metadata-change-log            on
> cluster.data-self-heal-algorithm        full
> cluster.eager-lock                      enable
> disperse.eager-lock                    on
> disperse.other-eager-lock              on
> disperse.eager-lock-timeout            1
> disperse.other-eager-lock-timeout      1
> cluster.quorum-type                    auto
> cluster.quorum-count                    (null)
> cluster.choose-local                    off
> cluster.self-heal-readdir-size          1KB
> cluster.post-op-delay-secs              1
> cluster.ensure-durability              on
> cluster.consistent-metadata            no
> cluster.heal-wait-queue-length          128
> cluster.favorite-child-policy          none
> cluster.full-lock                      yes
> diagnostics.latency-measurement        off
> diagnostics.dump-fd-stats              off
> diagnostics.count-fop-hits              off
> diagnostics.brick-log-level            INFO
> diagnostics.client-log-level            INFO
> diagnostics.brick-sys-log-level        CRITICAL
> diagnostics.client-sys-log-level        CRITICAL
> diagnostics.brick-logger                (null)
> diagnostics.client-logger              (null)
> diagnostics.brick-log-format            (null)
> diagnostics.client-log-format          (null)
> diagnostics.brick-log-buf-size          5
> diagnostics.client-log-buf-size        5
> diagnostics.brick-log-flush-timeout    120
> diagnostics.client-log-flush-timeout    120
> diagnostics.stats-dump-interval        0
> diagnostics.fop-sample-interval        0
> diagnostics.stats-dump-format          json
> diagnostics.fop-sample-buf-size        65535
> diagnostics.stats-dnscache-ttl-sec      86400
> performance.cache-max-file-size        0
> performance.cache-min-file-size        0
> performance.cache-refresh-timeout      1
> performance.cache-priority
> performance.cache-size                  32MB
> performance.io-thread-count            16
> performance.high-prio-threads          16
> performance.normal-prio-threads        16
> performance.low-prio-threads            32
> performance.least-prio-threads          1
> performance.enable-least-priority      on
> performance.iot-watchdog-secs          (null)
> performance.iot-cleanup-disconnected-reqs off
> performance.iot-pass-through            false
> performance.io-cache-pass-through      false
> performance.cache-size                  128MB
> performance.qr-cache-timeout            1
> performance.cache-invalidation          false
> performance.ctime-invalidation          false
> performance.flush-behind                on
> performance.nfs.flush-behind            on
> performance.write-behind-window-size    1MB
> performance.resync-failed-syncs-after-fsync off
> performance.nfs.write-behind-window-size 1MB
> performance.strict-o-direct            off
> performance.nfs.strict-o-direct        off
> performance.strict-write-ordering      off
> performance.nfs.strict-write-ordering  off
> performance.write-behind-trickling-writes on
> performance.aggregate-size              128KB
> performance.nfs.write-behind-trickling-writes on
> performance.lazy-open                  yes
> performance.read-after-open            yes
> performance.open-behind-pass-through    false
> performance.read-ahead-page-count      4
> performance.read-ahead-pass-through    false
> performance.readdir-ahead-pass-through  false
> performance.md-cache-pass-through      false
> performance.md-cache-timeout            1
> performance.cache-swift-metadata        true
> performance.cache-samba-metadata        false
> performance.cache-capability-xattrs    true
> performance.cache-ima-xattrs            true
> performance.md-cache-statfs            off
> performance.xattr-cache-list
> performance.nl-cache-pass-through      false
> features.encryption                    off
> network.frame-timeout                  1800
> network.ping-timeout                    42
> network.tcp-window-size                (null)
> client.ssl                              off
> network.remote-dio                      enable
> client.event-threads                    4
> client.tcp-user-timeout                0
> client.keepalive-time                  20
> client.keepalive-interval              2
> client.keepalive-count                  9
> network.tcp-window-size                (null)
> network.inode-lru-limit                16384
> auth.allow                              *
> auth.reject                            (null)
> transport.keepalive                    1
> server.allow-insecure                  on
> server.root-squash                      off
> server.all-squash                      off
> server.anonuid                          65534
> server.anongid                          65534
> server.statedump-path                  /var/run/gluster
> server.outstanding-rpc-limit            64
> server.ssl                              off
> auth.ssl-allow                          *
> server.manage-gids                      off
> server.dynamic-auth                    on
> client.send-gids                        on
> server.gid-timeout                      300
> server.own-thread                      (null)
> server.event-threads                    4
> server.tcp-user-timeout                42
> server.keepalive-time                  20
> server.keepalive-interval              2
> server.keepalive-count                  9
> transport.listen-backlog                1024
> transport.address-family                inet
> performance.write-behind                on
> performance.read-ahead                  off
> performance.readdir-ahead              on
> performance.io-cache                    off
> performance.open-behind                on
> performance.quick-read                  off
> performance.nl-cache                    off
> performance.stat-prefetch              on
> performance.client-io-threads          on
> performance.nfs.write-behind            on
> performance.nfs.read-ahead              off
> performance.nfs.io-cache                off
> performance.nfs.quick-read              off
> performance.nfs.stat-prefetch          off
> performance.nfs.io-threads              off
> performance.force-readdirp              true
> performance.cache-invalidation          false
> performance.global-cache-invalidation  true
> features.uss                            off
> features.snapshot-directory            .snaps
> features.show-snapshot-directory        off
> features.tag-namespaces                off
> network.compression                    off
> network.compression.window-size        -15
> network.compression.mem-level          8
> network.compression.min-size            0
> network.compression.compression-level  -1
> network.compression.debug              false
> features.default-soft-limit            80%
> features.soft-timeout                  60
> features.hard-timeout                  5
> features.alert-time                    86400
> features.quota-deem-statfs              off
> geo-replication.indexing                off
> geo-replication.indexing                off
> geo-replication.ignore-pid-check        off
> geo-replication.ignore-pid-check        off
> features.quota                          off
> features.inode-quota                    off
> features.bitrot                        disable
> debug.trace                            off
> debug.log-history                      no
> debug.log-file                          no
> debug.exclude-ops                      (null)
> debug.include-ops                      (null)
> debug.error-gen                        off
> debug.error-failure                    (null)
> debug.error-number                      (null)
> debug.random-failure                    off
> debug.error-fops                        (null)
> nfs.disable                            on
> features.read-only                      off
> features.worm                          off
> features.worm-file-level                off
> features.worm-files-deletable          on
> features.default-retention-period      120
> features.retention-mode                relax
> features.auto-commit-period            180
> storage.linux-aio                      off
> storage.batch-fsync-mode                reverse-fsync
> storage.batch-fsync-delay-usec          0
> storage.owner-uid                      -1
> storage.owner-gid                      -1
> storage.node-uuid-pathinfo              off
> storage.health-check-interval          30
> storage.build-pgfid                    off
> storage.gfid2path                      on
> storage.gfid2path-separator            :
> storage.reserve                        1
> storage.health-check-timeout            10
> storage.fips-mode-rchecksum            off
> storage.force-create-mode              0000
> storage.force-directory-mode            0000
> storage.create-mask                    0777
> storage.create-directory-mask          0777
> storage.max-hardlinks                  100
> features.ctime                          on
> config.gfproxyd                        off
> cluster.server-quorum-type              server
> cluster.server-quorum-ratio            0
> changelog.changelog                    off
> changelog.changelog-dir                {{ brick.path }}/.glusterfs/changelogs
> changelog.encoding                      ascii
> changelog.rollover-time                15
> changelog.fsync-interval                5
> changelog.changelog-barrier-timeout    120
> changelog.capture-del-path              off
> features.barrier                        disable
> features.barrier-timeout                120
> features.trash                          off
> features.trash-dir                      .trashcan
> features.trash-eliminate-path          (null)
> features.trash-max-filesize            5MB
> features.trash-internal-op              off
> cluster.enable-shared-storage          disable
> locks.trace                            off
> locks.mandatory-locking                off
> cluster.disperse-self-heal-daemon      enable
> cluster.quorum-reads                    no
> client.bind-insecure                    (null)
> features.shard                          on
> features.shard-block-size              64MB
> features.shard-lru-limit                16384
> features.shard-deletion-rate            100
> features.scrub-throttle                lazy
> features.scrub-freq                    biweekly
> features.scrub                          false
> features.expiry-time                    120
> features.cache-invalidation            off
> features.cache-invalidation-timeout    60
> features.leases                        off
> features.lease-lock-recall-timeout      60
> disperse.background-heals              8
> disperse.heal-wait-qlength              128
> cluster.heal-timeout                    600
> dht.force-readdirp                      on
> disperse.read-policy                    gfid-hash
> cluster.shd-max-threads                8
> cluster.shd-wait-qlength                10000
> cluster.shd-wait-qlength                10000
> cluster.locking-scheme                  granular
> cluster.granular-entry-heal            no
> features.locks-revocation-secs          0
> features.locks-revocation-clear-all    false
> features.locks-revocation-max-blocked  0
> features.locks-monkey-unlocking        false
> features.locks-notify-contention        no
> features.locks-notify-contention-delay  5
> disperse.shd-max-threads                1
> disperse.shd-wait-qlength              1024
> disperse.cpu-extensions                auto
> disperse.self-heal-window-size          1
> cluster.use-compound-fops              off
> performance.parallel-readdir            off
> performance.rda-request-size            131072
> performance.rda-low-wmark              4096
> performance.rda-high-wmark              128KB
> performance.rda-cache-limit            10MB
> performance.nl-cache-positive-entry    false
> performance.nl-cache-limit              10MB
> performance.nl-cache-timeout            60
> cluster.brick-multiplex                off
> cluster.max-bricks-per-process          250
> disperse.optimistic-change-log          on
> disperse.stripe-cache                  4
> cluster.halo-enabled                    False
> cluster.halo-shd-max-latency            99999
> cluster.halo-nfsd-max-latency          5
> cluster.halo-max-latency                5
> cluster.halo-max-replicas              99999
> cluster.halo-min-replicas              2
> features.selinux                        on
> cluster.daemon-log-level                INFO
> debug.delay-gen                        off
> delay-gen.delay-percentage              10%
> delay-gen.delay-duration                100000
> delay-gen.enable
> disperse.parallel-writes                on
> features.sdfs                          off
> features.cloudsync                      off
> features.ctime                          on
> ctime.noatime                          on
> feature.cloudsync-storetype            (null)
> features.enforce-mandatory-lock        off
> 
> What do you think, are there any other knobs worth turning?
> 
> Thanks!
> 
> On Wed, Jul 03, 2019 at 06:55:09PM +0300, Strahil wrote:
> > Check the following link (4.1) for the optimal gluster volume settings.
> > They are quite safe.
> > 
> > Gluster provides a group called virt (/var/lib/glusterd/groups/virt), which can be applied via 'gluster volume set VOLNAME group virt'.
> > 
> > Then try again.
> > 
> > Best Regards,
> > Strahil Nikolov
> > 
> > On Jul 3, 2019 11:39, Vladimir Melnik <v.melnik@xxxxxxxx> wrote:
> > >
> > > Dear colleagues, 
> > >
> > > I have a lab with a bunch of virtual machines (the virtualization is 
> > > provided by KVM) running on the same physical host. 4 of these VMs are 
> > > working as a GlusterFS cluster and there's one more VM that works as a 
> > > client. I'll list all the package versions at the end of this 
> > > message. 
> > >
> > > I created 2 volumes: one is of type "Distributed-Replicate" and 
> > > the other is "Distribute". The problem is that both volumes are 
> > > showing really poor performance. 
> > >
> > > Here's what I see on the client: 
> > > $ mount | grep gluster 
> > > 10.13.1.16:storage1 on /mnt/glusterfs1 type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072) 
> > > 10.13.1.16:storage2 on /mnt/glusterfs2 type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072) 
> > >
> > > $ for i in {1..5}; do { dd if=/dev/zero of=/mnt/glusterfs1/test.tmp bs=1M count=10 oflag=sync; rm -f /mnt/glusterfs1/test.tmp; } done 
> > > 10+0 records in 
> > > 10+0 records out 
> > > 10485760 bytes (10 MB) copied, 1.47936 s, 7.1 MB/s 
> > > 10+0 records in 
> > > 10+0 records out 
> > > 10485760 bytes (10 MB) copied, 1.62546 s, 6.5 MB/s 
> > > 10+0 records in 
> > > 10+0 records out 
> > > 10485760 bytes (10 MB) copied, 1.71229 s, 6.1 MB/s 
> > > 10+0 records in 
> > > 10+0 records out 
> > > 10485760 bytes (10 MB) copied, 1.68607 s, 6.2 MB/s 
> > > 10+0 records in 
> > > 10+0 records out 
> > > 10485760 bytes (10 MB) copied, 1.82204 s, 5.8 MB/s 
> > >
> > > $ for i in {1..5}; do { dd if=/dev/zero of=/mnt/glusterfs2/test.tmp bs=1M count=10 oflag=sync; rm -f /mnt/glusterfs2/test.tmp; } done 
> > > 10+0 records in 
> > > 10+0 records out 
> > > 10485760 bytes (10 MB) copied, 1.15739 s, 9.1 MB/s 
> > > 10+0 records in 
> > > 10+0 records out 
> > > 10485760 bytes (10 MB) copied, 0.978528 s, 10.7 MB/s 
> > > 10+0 records in 
> > > 10+0 records out 
> > > 10485760 bytes (10 MB) copied, 0.910642 s, 11.5 MB/s 
> > > 10+0 records in 
> > > 10+0 records out 
> > > 10485760 bytes (10 MB) copied, 0.998249 s, 10.5 MB/s 
> > > 10+0 records in 
> > > 10+0 records out 
> > > 10485760 bytes (10 MB) copied, 1.03377 s, 10.1 MB/s 
> > >
> > > The distributed one shows a bit better performance than the 
> > > distributed-replicated one, but it's still poor. :-( 
> > >
> > > The disk storage itself is OK. Here's what I see on each of the 4 GlusterFS 
> > > servers: 
> > > $ for i in {1..5}; do { dd if=/dev/zero of=/mnt/storage1/test.tmp bs=1M count=10 oflag=sync; rm -f /mnt/storage1/test.tmp; } done 
> > > 10+0 records in 
> > > 10+0 records out 
> > > 10485760 bytes (10 MB) copied, 0.0656698 s, 160 MB/s 
> > > 10+0 records in 
> > > 10+0 records out 
> > > 10485760 bytes (10 MB) copied, 0.0476927 s, 220 MB/s 
> > > 10+0 records in 
> > > 10+0 records out 
> > > 10485760 bytes (10 MB) copied, 0.036526 s, 287 MB/s 
> > > 10+0 records in 
> > > 10+0 records out 
> > > 10485760 bytes (10 MB) copied, 0.0329145 s, 319 MB/s 
> > > 10+0 records in 
> > > 10+0 records out 
> > > 10485760 bytes (10 MB) copied, 0.0403988 s, 260 MB/s 
> > >
> > > The network between all 5 VMs is OK, as they are all running on the same 
> > > physical host. 
> > >
> > > I can't understand what I'm doing wrong. :-( 
> > >
> > > Here's the detailed info about the volumes: 
> > > Volume Name: storage1 
> > > Type: Distributed-Replicate 
> > > Volume ID: a42e2554-99e5-4331-bcc4-0900d002ae32 
> > > Status: Started 
> > > Snapshot Count: 0 
> > > Number of Bricks: 2 x (2 + 1) = 6 
> > > Transport-type: tcp 
> > > Bricks: 
> > > Brick1: gluster1.k8s.maitre-d.tucha.ua:/mnt/storage1/brick1 
> > > Brick2: gluster2.k8s.maitre-d.tucha.ua:/mnt/storage1/brick2 
> > > Brick3: gluster3.k8s.maitre-d.tucha.ua:/mnt/storage1/brick_arbiter (arbiter) 
> > > Brick4: gluster3.k8s.maitre-d.tucha.ua:/mnt/storage1/brick3 
> > > Brick5: gluster4.k8s.maitre-d.tucha.ua:/mnt/storage1/brick4 
> > > Brick6: gluster1.k8s.maitre-d.tucha.ua:/mnt/storage1/brick_arbiter (arbiter) 
> > > Options Reconfigured: 
> > > transport.address-family: inet 
> > > nfs.disable: on 
> > > performance.client-io-threads: off 
> > >
> > > Volume Name: storage2 
> > > Type: Distribute 
> > > Volume ID: df4d8096-ad03-493e-9e0e-586ce21fb067 
> > > Status: Started 
> > > Snapshot Count: 0 
> > > Number of Bricks: 4 
> > > Transport-type: tcp 
> > > Bricks: 
> > > Brick1: gluster1.k8s.maitre-d.tucha.ua:/mnt/storage2 
> > > Brick2: gluster2.k8s.maitre-d.tucha.ua:/mnt/storage2 
> > > Brick3: gluster3.k8s.maitre-d.tucha.ua:/mnt/storage2 
> > > Brick4: gluster4.k8s.maitre-d.tucha.ua:/mnt/storage2 
> > > Options Reconfigured: 
> > > transport.address-family: inet 
> > > nfs.disable: on 
> > >
> > > The OS is CentOS Linux release 7.6.1810. The packages I'm using are: 
> > > glusterfs-6.3-1.el7.x86_64 
> > > glusterfs-api-6.3-1.el7.x86_64 
> > > glusterfs-cli-6.3-1.el7.x86_64 
> > > glusterfs-client-xlators-6.3-1.el7.x86_64 
> > > glusterfs-fuse-6.3-1.el7.x86_64 
> > > glusterfs-libs-6.3-1.el7.x86_64 
> > > glusterfs-server-6.3-1.el7.x86_64 
> > > kernel-3.10.0-327.el7.x86_64 
> > > kernel-3.10.0-514.2.2.el7.x86_64 
> > > kernel-3.10.0-957.12.1.el7.x86_64 
> > > kernel-3.10.0-957.12.2.el7.x86_64 
> > > kernel-3.10.0-957.21.3.el7.x86_64 
> > > kernel-tools-3.10.0-957.21.3.el7.x86_64 
> > > kernel-tools-libs-3.10.0-957.21.3.el7.x86_6 
> > >
> > > Please be so kind as to help me understand: did I do something wrong, or 
> > > is this quite normal performance for GlusterFS? 
> > >
> > > Thanks in advance! 
> > > _______________________________________________ 
> > > Gluster-users mailing list 
> > > Gluster-users@xxxxxxxxxxx 
> > > https://lists.gluster.org/mailman/listinfo/gluster-users 
> 
> -- 
> V.Melnik
>   

-- 
V.Melnik