Re: Gluster distributed replicated setup does not serve read from all bricks belonging to the same replica

Anh Vo <vtqanh@xxxxxxxxx> · Fri, 23 Nov 2018 23:33:47 -0800

Looking at the source (afr-common.c), even when hashed mode is used and the hashed brick doesn't have a good copy, it will try the next brick. Am I correct? I'm curious because your first reply seemed to place some significance on the part about pending self-heals. Is there anything about pending self-heals that would have made hashed mode worse, or is it about as bad as under any other brick selection policy?
Thanks
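
For illustration, the selection order discussed in this thread can be modelled roughly as follows. This is a toy Python sketch, not the AFR code: the function names, the use of SHA-256 as a stand-in hash, and the description of mode 1 as "hash by GFID only" are assumptions drawn from my reading of the option help text and of afr_read_subvol_select_by_policy(). In that reading, a hashed child that is not readable falls back to the first readable child, not strictly to the "next" one. The sketch covers only modes 0, 1 and 2 and ignores any explicitly configured read child.

```python
import hashlib
import uuid


def hash_child(gfid, client_pid, child_count, mode):
    """Preferred child index for a hashing mode, or -1 if the mode does not hash.

    Hypothetical helper: AFR uses its own hash function, SHA-256 is a stand-in.
    """
    if mode == 1:
        key = gfid.bytes  # same key for every client
    elif mode == 2:
        key = gfid.bytes + client_pid.to_bytes(8, "little")  # differs per client
    else:
        return -1  # mode 0: no hashing, go straight to the fallback scan
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "little") % child_count


def select_read_child(readable, gfid, client_pid, mode):
    """Pick a brick index to read from, or -1 if no brick is readable.

    readable[i] is True when brick i is believed to hold a good copy.
    """
    child = hash_child(gfid, client_pid, len(readable), mode)
    if child >= 0 and readable[child]:
        return child
    # Fallback (also the whole of mode 0): first readable child from the start.
    for i, ok in enumerate(readable):
        if ok:
            return i
    return -1


if __name__ == "__main__":
    gfid = uuid.UUID("12345678-1234-5678-1234-567812345678")
    all_good = [True, True, True]
    for mode in (0, 1, 2):
        picks = {select_read_child(all_good, gfid, pid, mode) for pid in range(50)}
        print("mode", mode, "bricks used by 50 clients:", sorted(picks))
```

In this model, modes 0 and 1 send every client reading the same file to the same brick, which matches the single-brick read load described in the original report, while mode 2 lets the client PID vary the choice. In every mode a brick is returned only if it is marked readable, so the outcome with pending heals depends on how accurately the readable flags reflect the bricks' state.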

On Thu, Nov 22, 2018 at 7:59 PM Ravishankar N <ravishankar@xxxxxxxxxx> wrote:

    On 11/22/2018 07:07 PM, Anh Vo wrote:

      Thanks Ravi, I will try that option.
        One question:
        Let's say there are self-heals pending. How would the
          default of "0" have worked? I understand 0 means "first
          responder". What if the first responder doesn't have a good
          copy (and it failed in such a way that the dirty attribute
          wasn't set on its copy, but there are index heals pending
          from the other two sources)?

    0 = first readable child of AFR, starting from 1st child. So if 1st
    brick doesn't have the good copy, it will try the 2nd brick and so
    on.  

    The default value seems to be '1' not '0'. You can look at
    afr_read_subvol_select_by_policy() in the source code to understand
    the preference of selection.

    Regards,

    Ravi

        On Wed, Nov 21, 2018 at 9:57 PM Ravishankar N
          <ravishankar@xxxxxxxxxx> wrote:

           Hi,

            If there are multiple clients, you can change the
            'cluster.read-hash-mode' volume option's value to 2. Then
            different reads should be served from different bricks for
            different clients. The meaning of the various values for
            'cluster.read-hash-mode' is listed by `gluster volume set
            help`. gluster-4.1 has also added a new value[1] to this
            option. Of course, the assumption is that all bricks host
            good copies (i.e. there are no self-heals pending).

            Hope this helps,

            Ravi

            [1]  https://review.gluster.org/#/c/glusterfs/+/19698/

            On 11/22/2018 10:20 AM, Anh Vo wrote:

                Hi,
                  Our setup: We have a distributed replicated setup
                    with 3 replicas. The total number of servers varies
                    between clusters: in some cases we have a total of
                    36 (12 x 3) servers, in others we have 12
                    servers (4 x 3). We're using gluster 3.12.15.

                  In all instances what I am noticing is that only
                    one member of the replica set is serving reads for a
                    particular file, even when all the members of the
                    replica set are online. We have many large input
                    files (for example, a 150GB zip file), and when there
                    are 50 clients reading from one single server the
                    performance degrades by several orders of magnitude
                    for reading that file only. Shouldn't all members of
                    the replica set participate in serving the read
                    requests?

                  Our options

                  cluster.shd-max-threads: 1
                  cluster.heal-timeout: 900
                  network.inode-lru-limit: 50000
                  performance.md-cache-timeout: 600
                  performance.cache-invalidation: on
                  performance.stat-prefetch: on
                  features.cache-invalidation-timeout: 600
                  features.cache-invalidation: on
                  cluster.metadata-self-heal: off
                  cluster.entry-self-heal: off
                  cluster.data-self-heal: off
                  features.inode-quota: off
                  features.quota: off
                  transport.listen-backlog: 100
                  transport.address-family: inet
                  performance.readdir-ahead: on
                  nfs.disable: on
                  performance.strict-o-direct: on
                  network.remote-dio: off
                  server.allow-insecure: on
                  performance.write-behind: off
                  cluster.nufa: disable
                  diagnostics.latency-measurement: on
                  diagnostics.count-fop-hits: on
                  cluster.ensure-durability: off
                  cluster.self-heal-window-size: 32
                  cluster.favorite-child-policy: mtime
                  performance.io-thread-count: 32
                  cluster.eager-lock: off
                  server.outstanding-rpc-limit: 128
                  cluster.rebal-throttle: aggressive
                  server.event-threads: 3
                  client.event-threads: 3
                  performance.cache-size: 6GB
                  cluster.readdir-optimize: on
                  storage.build-pgfid: on

              _______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users
