Re: Gluster distributed replicated setup does not serve read from all bricks belonging to the same replica

Ravishankar N <ravishankar@xxxxxxxxxx> · Sat, 24 Nov 2018 14:27:20 +0530



    On 11/24/2018 01:03 PM, Anh Vo wrote:

    
      Looking at the source (afr-common.c), even when
        hashed mode is used and the hashed brick doesn't have a
        good copy, it will try the next brick. Am I correct?
    
    That is correct. No matter which brick the policy chooses, if that
    brick is not readable for a given file (i.e. a heal is pending on it
    from the other good bricks), we just iterate from brick-0 and pick
    the first one that is good (i.e. readable).

    -Ravi
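
The fallback described in this reply can be sketched as follows. This is a minimal illustration, not the AFR code from afr-common.c: the function name `pick_read_brick`, its parameters, and the list-of-booleans representation of "readable" are all assumptions made for the example.

```python
from typing import Optional, Sequence


def pick_read_brick(policy_choice: int, readable: Sequence[bool]) -> Optional[int]:
    """Return the index of the brick to read from, or None if none is readable.

    policy_choice: index the read policy preferred (hypothetical input).
    readable: readable[i] is True when brick i holds a good copy,
              i.e. no heal is pending on it for this file.
    """
    # Use the policy's pick when that brick holds a good copy.
    if 0 <= policy_choice < len(readable) and readable[policy_choice]:
        return policy_choice
    # Otherwise scan from brick-0 and take the first readable brick.
    for index, is_good in enumerate(readable):
        if is_good:
            return index
    return None
```

With three bricks where the policy picks brick 2 but a heal is pending on it, `pick_read_brick(2, [True, True, False])` returns 0. The scan restarts at brick-0; it does not continue from the brick after the policy's choice.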

    
       I'm curious because your first reply seemed to
        place some significance on the part about pending self-heal. Is
        there anything about pending self-heal that would have made
        hashed mode worse, or is it about as bad as any brick selection
        policy?
        

        Thanks
      
      
        On Thu, Nov 22, 2018 at 7:59 PM Ravishankar N
          <ravishankar@xxxxxxxxxx> wrote:

        
            On 11/22/2018 07:07 PM, Anh Vo wrote:

            
              Thanks Ravi, I will try that option.
                One question:
                Let's say there are self-heals pending. How would
                  the default of "0" have worked? I understand 0 means
                  "first responder". What if the first responder doesn't
                  have a good copy (and it failed in such a way that the
                  dirty attribute wasn't set on its copy, but there are
                  index heals pending from the other two sources)?
              
            
            0 = first readable child of AFR, starting from the 1st child.
            So if the 1st brick doesn't have a good copy, it will try the
            2nd brick, and so on.

            The default value seems to be '1', not '0'. You can look at
            afr_read_subvol_select_by_policy() in the source code to
            understand the preference of selection.

            
            Regards,

            Ravi

            
                On Wed, Nov 21, 2018 at 9:57 PM
                  Ravishankar N <ravishankar@xxxxxxxxxx>
                  wrote:

                
                   Hi,

                    If there are multiple clients, you can change the
                    'cluster.read-hash-mode' volume option's value to 2.
                    Then reads from different clients should be served
                    from different bricks. The meaning of the various
                    values for 'cluster.read-hash-mode' is described in
                    `gluster volume set help`. gluster-4.1 has also
                    added a new value[1] to this option. Of course, the
                    assumption is that all bricks host good copies (i.e.
                    there are no self-heals pending).

                    
                    Hope this helps,

                    Ravi

                    
                    [1]  https://review.gluster.org/#/c/glusterfs/+/19698/
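
To illustrate why changing 'cluster.read-hash-mode' to 2 can spread one hot file across a replica set, here is a rough simulation. It assumes that mode 1 hashes only the file's GFID and that mode 2 also mixes in the client's PID, which is how I read the option's help text. The arithmetic "hash" below is a stand-in chosen for clarity, not the hash AFR uses, and `choose_brick` and `brick_load` are hypothetical names. All bricks are assumed to hold good copies.

```python
import uuid
from collections import Counter
from typing import Iterable


def choose_brick(mode: int, gfid: str, client_pid: int, n_bricks: int) -> int:
    """Pick a brick index for a read, assuming every brick is readable."""
    if mode not in (0, 1, 2):
        raise ValueError(f"mode {mode} is not modelled in this sketch")
    if mode == 0:
        # First readable child; with all bricks good, that is brick 0.
        return 0
    # Stand-in hash: the GFID interpreted as a 128-bit integer.
    key = uuid.UUID(gfid).int
    if mode == 2:
        # Assumed: the client PID perturbs the hash, so clients differ.
        key += client_pid
    return key % n_bricks


def brick_load(mode: int, gfid: str, client_pids: Iterable[int], n_bricks: int) -> Counter:
    """Count how many clients each brick would serve for one file."""
    return Counter(choose_brick(mode, gfid, pid, n_bricks) for pid in client_pids)


if __name__ == "__main__":
    gfid = "6f2a1c3e-8b4d-4e5f-9a7b-0c1d2e3f4a5b"  # made-up GFID
    pids = range(1000, 1050)  # 50 clients with made-up PIDs
    for mode in (0, 1, 2):
        print(mode, dict(brick_load(mode, gfid, pids, 3)))
```

In this model, modes 0 and 1 send all 50 clients to a single brick for a given file, while mode 2 divides them among the three bricks.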

                    
                    On 11/22/2018 10:20 AM, Anh Vo wrote:

                    
                        Hi,
                          Our setup: We have a distributed
                            replicated setup with replica 3. The total
                            number of servers varies between clusters:
                            in some cases we have a total of 36 (12 x 3)
                            servers, in others we have 12 servers
                            (4 x 3). We're using gluster 3.12.15.
                          

                          In all instances what I am noticing is
                            that only one member of the replica set is
                            serving reads for a particular file, even
                            when all the members of the replica set are
                            online. We have many large input files (for
                            example, a 150GB zip file) and when there are
                            50 clients reading from one single server
                            the performance degrades by several orders
                            of magnitude for reading that file only.
                            Shouldn't all members of the replica set
                            participate in serving the read requests?
                          

                          Our options
                          

                          cluster.shd-max-threads: 1
                          cluster.heal-timeout: 900
                          network.inode-lru-limit: 50000
                          performance.md-cache-timeout: 600
                          performance.cache-invalidation: on
                          performance.stat-prefetch: on
                          features.cache-invalidation-timeout: 600
                          features.cache-invalidation: on
                          cluster.metadata-self-heal: off
                          cluster.entry-self-heal: off
                          cluster.data-self-heal: off
                          features.inode-quota: off
                          features.quota: off
                          transport.listen-backlog: 100
                          transport.address-family: inet
                          performance.readdir-ahead: on
                          nfs.disable: on
                          performance.strict-o-direct: on
                          network.remote-dio: off
                          server.allow-insecure: on
                          performance.write-behind: off
                          cluster.nufa: disable
                          diagnostics.latency-measurement: on
                          diagnostics.count-fop-hits: on
                          cluster.ensure-durability: off
                          cluster.self-heal-window-size: 32
                          cluster.favorite-child-policy: mtime
                          performance.io-thread-count: 32
                          cluster.eager-lock: off
                          server.outstanding-rpc-limit: 128
                          cluster.rebal-throttle: aggressive
                          server.event-threads: 3
                          client.event-threads: 3
                          performance.cache-size: 6GB
                          cluster.readdir-optimize: on
                          storage.build-pgfid: on
                          

                      _______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users