Hi Ravi,
Thanks. I have created a simple two-brick (1 x 2) replicate volume.
root@dhcp-192-168-36-220:/home/user/gluster/rep-brick1# gluster v info rep-vol
Volume Name: rep-vol
Type: Replicate
Volume ID: c9c9ef39-27e5-44d5-be69-82423c743304
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 192.168.36.220:/home/user/gluster/rep-brick1
Brick2: 192.168.36.220:/home/user/gluster/rep-brick2
Options Reconfigured:
features.inode-quota: off
features.quota: off
performance.readdir-ahead: on
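(For reference, the volume was created roughly along these lines; the mount point /mnt/rep-vol is just the one I use, and I added 'force' because both bricks sit on the same host under /home:)
gluster volume create rep-vol replica 2 \
    192.168.36.220:/home/user/gluster/rep-brick1 \
    192.168.36.220:/home/user/gluster/rep-brick2 force
gluster volume start rep-vol
mount -t glusterfs 192.168.36.220:/rep-vol /mnt/rep-vol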
I then killed the brick1 process.
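(Roughly like this; I am assuming the brick is served by the usual glusterfsd process and just matching on the brick path:)
kill -9 $(pgrep -f 'glusterfsd.*rep-brick1')    # simulate a brick failure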
root@dhcp-192-168-36-220:/home/user/gluster/rep-brick1# gluster v status rep-vol
Status of volume: rep-vol
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 192.168.36.220:/home/user/gluster/rep
-brick1 N/A N/A N N/A
Brick 192.168.36.220:/home/user/gluster/rep
-brick2 49211 0 Y 20157
NFS Server on localhost N/A N/A N N/A
Self-heal Daemon on localhost N/A N/A Y 20186
Task Status of Volume rep-vol
------------------------------------------------------------------------------
There are no active volume tasks
Then I copied wish.txt to the mount directory.
From brick2:
root@dhcp-192-168-36-220:/home/user/gluster/rep-brick2/.glusterfs# getfattr -d -e hex -m . ../wish.txt
# file: ../wish.txt
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.rep-vol-client-0=0x000000020000000100000000
trusted.bit-rot.version=0x0200000000000000589ab1410003e910
trusted.gfid=0xe9f3aafb3f844bca8922a00d48abc643
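If my reading of the afr-v1 spec is right, this value packs three big-endian 32-bit pending-operation counters (data, metadata, entry), so it decodes roughly as:
# trusted.afr.rep-vol-client-0 = 0x 00000002 00000001 00000000
#                                    data     metadata entry
# i.e. brick2 has recorded 2 pending data ops and 1 pending metadata op that
# still need to be replayed on brick1 (client-0), and no pending entry ops.
getfattr -n trusted.afr.rep-vol-client-0 -e hex ../wish.txt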
root@dhcp-192-168-36-220:/home/user/gluster/rep-brick2/.glusterfs/indices/xattrop# ll
total 8
drw------- 2 root root 4096 Feb 8 13:50 ./
drw------- 4 root root 4096 Feb 8 13:48 ../
---------- 4 root root 0 Feb 8 13:50 00000000-0000-0000-0000-000000000001
---------- 4 root root 0 Feb 8 13:50 00000000-0000-0000-0000-000000000005
---------- 4 root root 0 Feb 8 13:50 e9f3aafb-3f84-4bca-8922-a00d48abc643
---------- 4 root root 0 Feb 8 13:50 xattrop-b3beb437-cea4-46eb-9eb4-8d83bfa7baa1
In the above, I can see the gfid of wish.txt (e9f3aafb-3f84-4bca-8922-a00d48abc643), which needs to be healed.
1. What are "00000000-0000-0000-0000-000000000001" and "00000000-0000-0000-0000-000000000005"?
(I understand trusted.afr.rep-vol-client-0 to be the changelog of brick1 as seen by brick2, as described in https://github.com/gluster/glusterfs-specs/blob/master/done/Features/afr-v1.md)
2. I know xattrop-* is a base file. How is it related to the files that require healing? (Assuming there is more than one file to be healed.)
What does the UUID part of xattrop-* (xattrop-b3beb437-cea4-46eb-9eb4-8d83bfa7baa1) signify?
3. After brick1 is brought back online, the file is healed (the commands I used are sketched below), and now only xattrop-* remains under .glusterfs/indices/xattrop.
But there is still a gfid entry in the .glusterfs/e9/f3 directory. Is this expected behavior?
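For completeness, this is roughly what I ran to bring the brick back and watch the heal:
gluster volume start rep-vol force     # restarts the killed brick process
gluster volume heal rep-vol info       # entries still pending heal, per brick
ls -l /home/user/gluster/rep-brick2/.glusterfs/indices/xattrop/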
On Tue, Feb 7, 2017 at 8:21 PM, Ravishankar N <ravishankar@xxxxxxxxxx> wrote:
Yes, index_xattrop() adds the entry during pre-op and removes it during post-op if it was successful.
On 02/07/2017 01:32 PM, jayakrishnan mm wrote:
On Mon, Feb 6, 2017 at 6:05 PM, Ravishankar N <ravishankar@xxxxxxxxxx> wrote:
On 02/06/2017 03:15 PM, jayakrishnan mm wrote:
On Mon, Feb 6, 2017 at 2:36 PM, jayakrishnan mm <jayakrishnan.mm@xxxxxxxxx> wrote:
On Fri, Feb 3, 2017 at 7:58 PM, Ravishankar N <ravishankar@xxxxxxxxxx> wrote:
On 02/03/2017 09:14 AM, jayakrishnan mm wrote:
Hi JK,
On Thu, Feb 2, 2017 at 8:17 PM, Ravishankar N <ravishankar@xxxxxxxxxx> wrote:
On 02/02/2017 10:46 AM, jayakrishnan mm wrote:
Hi
How do I determine which part of the code runs on the client and which part runs on the server nodes by merely looking at the glusterfs source code? I know there are client-side and server-side translators which run on the respective sides. I am looking for the part of the self-heal daemon source (ec/afr) which runs on the server nodes and the part which runs on the clients.
The self-heal daemon that runs on the server is also a client process, in the sense that it has client-side xlators like ec or afr and protocol/client loaded (see the shd volfile 'glustershd-server.vol') and talks to the bricks like a normal client does.
The difference is that only the self-heal related 'logic' gets executed in the shd, while both the self-heal and I/O related logic get executed from the mount. The self-heal logic resides mostly in afr-self-heal*.[ch], while the I/O related logic is in the other files.
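For example, on any of the server nodes you can see this for yourself (standard install paths; adjust if yours differ):
# the shd volfile: note that it contains protocol/client and cluster/replicate
# (or cluster/disperse) xlators, just like a normal client graph
cat /var/lib/glusterd/glustershd/glustershd-server.vol
# the running self-heal daemon process
ps aux | grep glustershd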
HTH,
Ravi
Dear Ravi,
Thanks for your kind explanation. So each server node will have a separate self-heal daemon (shd) up and running, every time a child_up event occurs, and this will be an index healer. And each daemon will spawn "priv->child_count" threads on each server node, correct?
shd is always running, and yes, that many threads are spawned for index heal when the process starts.
1. When exactly does a full healer spawn threads?
Whenever you run `gluster volume heal volname full`. See afr_xl_op(). There are some bugs in launching full heal, though.
2. When can GF_EVENT_TRANSLATOR_OP & GF_SHD_OP_HEAL_INDEX happen together (so that the index healer spawns threads)? Similarly, when can GF_EVENT_TRANSLATOR_OP & GF_SHD_OP_HEAL_FULL happen? During replace-brick? Is it possible that the index healer and the full healer spawn threads together (so that the total number of threads is 2*priv->child_count)?
Index heal threads wake up and run once every 10 minutes, or whatever cluster.heal-timeout is set to. They are also run when a brick comes up, like you said, via afr_notify(), and when you manually launch `gluster volume heal volname`. Again, see afr_xl_op().
3. In /var/lib/glusterd/glustershd/glustershd-server.vol, why is debug/io-stats chosen as the top xlator?
io-stats is generally loaded as the topmost xlator in all graphs, at the appropriate place for gathering profile info, but for the shd I'm not sure if it has any specific use other than acting as a placeholder parent to all the replica xlators.
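On the CLI side, the triggers mentioned above map to roughly the following commands (600 seconds is the usual default for the timeout, shown only as an example):
gluster volume heal rep-vol                           # manual index heal
gluster volume heal rep-vol full                      # full heal, goes through afr_xl_op()
gluster volume set rep-vol cluster.heal-timeout 600   # shd wake-up interval in seconds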
Hi Ravi,
The self-heal daemon searches the .glusterfs/indices/xattrop directory for the files/dirs to be healed. Who updates this information, and on what basis?
Please see https://github.com/gluster/glusterfs-specs/blob/master/done/Features/afr-v1.md; it is a bit dated (relevant to AFR v1, which is in glusterfs 3.5 and older, I think) but the concepts are similar. The entries are added/removed by the index translator during the pre-op/post-op phases of the AFR transaction.
Hi Ravi,
Went through the document & source code. I see there are options to enable/disable the entry/data/metadata changelogs. If "data-change-log" is 1 (by default it is 1), the data changelog is enabled, which makes __changelog_enabled() return 1 and afr_changelog_pre_op() get called. Similar logic applies for the post-op, which occurs just before the unlock. Is this what is responsible for creating/deleting entries inside .glusterfs/indices/xattrop?
Currently I can't verify this, since the mount point for the rep volume hangs when data-change-log is set to 0 (using glusterfs 3.7.15). Ideally, the entries should not appear (in the case of a brick failure and a write thereafter) if this option is set to '0', am I correct?
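(For reference, I toggled it through the volume-set interface; I am assuming cluster.data-change-log is the right option key for the data changelog:)
gluster volume set rep-vol cluster.data-change-log 0    # disable the data changelog
gluster volume set rep-vol cluster.data-change-log 1    # re-enable it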
Best Regards,
JK
Thanks Ravi, for the explanation.
Regards,
JK
Regards,
Ravi
Thanks
Best regards
Best regards,
JK
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-devel