I will take a look at the profile info you shared. Since there is a
huge difference in the performance numbers between FUSE and Samba,
it would be great if we could also get the FUSE profile info (on v7).
This will help to compare the number of calls for each FOP. There
should be some FOPs that Samba repeats, and we can find them by
comparing with FUSE.
Also, if possible, can you please get the client profile info from
the FUSE mount using the command `setfattr -n trusted.io-stats-dump -v
<logfile, e.g. /tmp/iostat.log> <mount point, e.g. /mnt/fuse>`?
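For reference, a minimal sketch of the whole sequence on a client, assuming the volume archive1 from the output below and a hypothetical mount point /mnt/fuse:

    # Mount the volume with the native FUSE client
    mount -t glusterfs fs-dl380-c1-n1:/archive1 /mnt/fuse

    # Writing the virtual xattr trusted.io-stats-dump asks the client's
    # io-stats translator to dump its per-FOP counters; the value names
    # the output file
    setfattr -n trusted.io-stats-dump -v /tmp/iostat.log /mnt/fuse

    # Inspect the dump
    less /tmp/iostat.log

Depending on the configuration, the counters may first need profiling enabled (`gluster volume profile archive1 start`, or the diagnostics.latency-measurement / diagnostics.count-fop-hits options).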
Regards
Rafi KC
On 11/5/19 11:05 PM, David Spisla wrote:
I did the test with Gluster 7.0 and ctime disabled, but it had
no effect (all values in MiB/s):

64KiB   1MiB    10MiB   (file size)
0.16    2.60    54.74
Attached is the complete profile file, now also including the
results from the last test. I will not repeat it with a higher
inode size because I don't think that will have an effect.
There must be another cause for the low performance.
Yes, no need to try with a higher inode size.
Regards
David Spisla
On Tue, Nov 5, 2019 at 4:25 PM David Spisla <spisla80@xxxxxxxxx> wrote:
I also have an issue concerning performance. In the last few
days I updated our test cluster from GlusterFS v5.5 to v7.0.
The setup in general: 2 HP DL380 servers with 10Gbit NICs, one
Distributed-Replicate volume with 2 replica pairs. The client
is Samba (access via vfs_glusterfs). I did several tests to
make sure that Samba is not the cause of the drop. The setup is
completely the same except for the Gluster version.
Here are my results:
64KiB   1MiB    10MiB   (file size)
3.49    47.41   300.50  (MiB/s, GlusterFS v5.5)
0.16    2.61    76.63   (MiB/s, GlusterFS v7.0)
Can you please share the profile information [1] for both
versions? It would also be really helpful if you could describe
the I/O patterns used for these tests.
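For reference, the server-side profile [1] is collected with the gluster CLI; a minimal sketch, assuming the volume archive1 described below:

    # Start profiling (enables latency measurement and FOP counters)
    gluster volume profile archive1 start

    # ... run the fio workload ...

    # Dump cumulative and per-interval statistics for each brick
    gluster volume profile archive1 info > /tmp/profile-v7.txt

    # Stop profiling once the data is captured
    gluster volume profile archive1 stop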
* First, more information about the I/O patterns: as a client we
use a DL360 Windows Server 2017 machine with a 10Gbit NIC,
connected to the storage machines. The share is mounted via SMB
and the tests write with fio. We use the job files in the
attachment; each job file is executed separately, and there is a
sleep of about 60s between test runs to let the system calm down
before starting a new test (a hypothetical sketch of such a job
file follows after this list).
* Attached below you find the profile output from the tests with
v5.5 (ctime enabled) and v7.0 (ctime enabled).
* Besides the tests with Samba, I also ran some fio tests
directly on the FUSE mounts (locally on one of the storage
nodes). The results show only a small decrease in performance
between v5.5 and v7.0.
It seems that the combination of Samba and Gluster 7.0 has a lot
of problems, doesn't it?
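The actual job files are in the (elided) attachment; purely as an illustration of the shape of such a job, a hypothetical fio job for the 64KiB case on the Windows client could look like this (drive letter, sizes, and section names are assumptions, not the real job):

    ; hypothetical-64k.fio -- illustrative only, not the attached job file
    [global]
    ioengine=windowsaio    ; native async I/O engine on Windows
    directory=Z\:\fiotest  ; SMB share mapped as drive Z: (colon escaped)
    size=1g
    direct=1

    [seq-write-64KiB]
    rw=write
    bs=64k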
We use these volume options (GlusterFS 7.0):
Volume Name: archive1
Type: Distributed-Replicate
Volume ID: 44c17844-0bd4-4ca2-98d8-a1474add790c
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: fs-dl380-c1-n1:/gluster/brick1/glusterbrick
Brick2: fs-dl380-c1-n2:/gluster/brick1/glusterbrick
Brick3: fs-dl380-c1-n1:/gluster/brick2/glusterbrick
Brick4: fs-dl380-c1-n2:/gluster/brick2/glusterbrick
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
user.smb: disable
features.read-only: off
features.worm: off
features.worm-file-level: on
features.retention-mode: enterprise
features.default-retention-period: 120
network.ping-timeout: 10
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.nl-cache: on
performance.nl-cache-timeout: 600
client.event-threads: 32
server.event-threads: 32
cluster.lookup-optimize: on
performance.stat-prefetch: on
performance.cache-invalidation: on
performance.md-cache-timeout: 600
performance.cache-samba-metadata: on
performance.cache-ima-xattrs: on
performance.io-thread-count: 64
cluster.use-compound-fops: on
performance.cache-size: 512MB
performance.cache-refresh-timeout: 10
performance.read-ahead: off
performance.write-behind-window-size: 4MB
performance.write-behind: on
storage.build-pgfid: on
features.ctime: on
cluster.quorum-type: fixed
cluster.quorum-count: 1
features.bitrot: on
features.scrub: Active
features.scrub-freq: daily
For GlusterFS 5.5 it is nearly the same, except that there were
two options to enable the ctime feature.
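As a side note, when diffing configurations between versions, the effective value of every option (including defaults that `volume info` does not list) can be dumped with:

    # Print all options and their effective values for the volume
    gluster volume get archive1 all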
Ctime stores additional metadata information as an extended
attribute, which sometimes exceeds the default inode size. In
such scenarios the additional xattrs won't fit into the default
size, and extra blocks are used to store the xattrs outside the
inode, which affects latency. The impact depends purely on the
I/O operations and the total size of the xattrs stored in the
inode.
Is it possible for you to repeat the test with ctime disabled,
or with the inode size increased to a higher value, say 1024KB?
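A minimal sketch of both variants, assuming the volume name archive1 and a hypothetical brick device /dev/sdb1 (and note the size question raised below: mkfs.xfs takes the inode size in bytes):

    # Variant 1: disable the ctime feature on the volume
    gluster volume set archive1 features.ctime off

    # Variant 2: check the current inode size ("isize") of a brick ...
    xfs_info /gluster/brick1

    # ... and recreate it with a larger inode size (destructive!
    # requires removing and re-adding the brick; size is in bytes)
    mkfs.xfs -f -i size=1024 /dev/sdb1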
I will do so, but I could not finish the tests with ctime
disabled (or a higher inode value) today, because they take a
lot of time with v7.0 due to the low performance. I will run
them tomorrow and give you the results as soon as possible.
By the way: do you really mean an inode size of 1024KB on the
XFS layer, or do you mean 1024 bytes? We use 512 bytes by
default, because this has been the recommended size until now.
But it seems there is a need for a new recommendation when using
the ctime feature by default. I cannot imagine that this is the
real cause of the low performance, because in v5.5 we also use
the ctime feature with an inode size of 512 bytes.
Regards
David
Our optimization for Samba looks like this (for every version):
[global]
workgroup = SAMBA
netbios name = CLUSTER
kernel share modes = no
aio read size = 1
aio write size = 1
kernel oplocks = no
max open files = 100000
nt acl support = no
security = user
server min protocol = SMB2
store dos attributes = no
strict locking = no
full_audit:failure = pwrite_send pwrite_recv pwrite offload_write_send offload_write_recv create_file open unlink connect disconnect rename chown fchown lchown chmod fchmod mkdir rmdir ntimes ftruncate fallocate
full_audit:success = pwrite_send pwrite_recv pwrite offload_write_send offload_write_recv create_file open unlink connect disconnect rename chown fchown lchown chmod fchmod mkdir rmdir ntimes ftruncate fallocate
full_audit:facility = local5
durable handles = yes
posix locking = no
log level = 2
max log size = 100000
debug pid = yes
What can be the cause of this rapid drop in performance for
small files? Are some of our volume options no longer
recommended? There were some patches concerning small-file
performance in v6.0 and v7.0:
#1670031: performance regression seen with smallfile workload tests
#1659327: 43% regression in small-file sequential read performance