Re: frequent split-brain with Gluster + Samba + Win client

On 08/07/2014 03:23 PM, Pranith Kumar Karampuri wrote:

On 08/07/2014 03:18 PM, Tiemen Ruiten wrote:
Hello Pranith,

Thanks for your reply. I'm using 3.5.2.

Is it possible that Windows doesn't release the files after a write happens?

I ask because the self-heal often doesn't happen at all. Just this morning we discovered that when a web server read from the other node, some files that had been changed days ago still had the content from before the edit.

How can I ensure that everything syncs reliably and consistently when mounting from SMB? Is Samba VFS more reliable in this respect?
It should happen automatically. Even the mount *must* serve reads from the good copy. In what scenario did you observe reads being served from the stale brick?
Could you give the output of 'getfattr -d -m. -e hex <path-of-file-on-brick>' from both bricks?
Sorry, I was not clear here: please give the output of the above command, on both bricks, for the file where you observed the 'stale read'.
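For example, with the brick paths from your heal output below, and taking one of the files from your logs purely as an illustration (substitute the file you actually saw the stale read on):

# on ankh
getfattr -d -m. -e hex /export/glu/web/flash/webroot/LandingPage_Saturn_Production/Services/v2/PartnerApiService.asmx
# on morpork
getfattr -d -m. -e hex /export/glu/web/flash/webroot/LandingPage_Saturn_Production/Services/v2/PartnerApiService.asmx

The trusted.afr.fl-webroot-client-* values in that output are the pending changelogs AFR uses to pick the good copy, so they should tell us whether the bricks actually disagree about that file.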

Pranith

Could you also provide the self-heal-daemon logs so that we can inspect what is happening?
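On a default install they should be at /var/log/glusterfs/glustershd.log on each node (adjust the path if your layout differs); even the tail of that file is a good start:

# assuming the default log location for the self-heal daemon
tail -n 200 /var/log/glusterfs/glustershd.log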

Pranith

Tiemen

On 7 August 2014 03:14, Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> wrote:
hi Tiemen,
From the logs you have pasted, it doesn't seem there are any split-brains; it is just performing self-heals. What version of glusterfs are you using? Self-heals are sometimes deferred while data operations from the mount are in progress, because those are given higher priority. Missing files should be created once the self-heal completes on the parent directory of those files.
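In the meantime, to keep an eye on it, something along these lines should work (volume name taken from your heal output; run from a shell, not the gluster> prompt):

gluster volume heal fl-webroot info               # entries still pending heal
gluster volume heal fl-webroot info split-brain   # entries actually in split-brain
gluster volume heal fl-webroot full               # trigger a full self-heal crawl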

Pranith


On 08/07/2014 01:40 AM, Tiemen Ruiten wrote:
Sorry, I seem to have messed up the subject.

I should add, I'm mounting these volumes through GlusterFS FUSE, not the Samba VFS plugin.
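If we do end up testing the Samba VFS plugin instead, I assume the share section would look roughly like this (option names as documented for Samba's vfs_glusterfs module; untested on our side, the log path is only an example):

[fl-webroot]
    # path is relative to the root of the gluster volume
    path = /
    vfs objects = glusterfs
    glusterfs:volume = fl-webroot
    # example log location, not our actual config
    glusterfs:logfile = /var/log/samba/glusterfs-fl-webroot.log
    glusterfs:loglevel = 7
    # commonly recommended with vfs_glusterfs, since there are no local files for kernel share modes
    kernel share modes = no
    read only = no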
 
On 06-08-14 21:47, Tiemen Ruiten wrote:
Hello,

I'm running into some serious problems with Gluster + CTDB and Samba. What I have:

A two-node replicated Gluster cluster, set up to share volumes over Samba with CTDB, configured according to this guide: https://download.gluster.org/pub/gluster/glusterfs/doc/Gluster_CTDB_setup.v1.pdf
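The CTDB side is essentially what that guide describes; roughly something like this in /etc/sysconfig/ctdb (the values below are examples rather than our exact configuration):

# recovery lock file lives on a shared gluster volume mounted on both nodes
CTDB_RECOVERY_LOCK=/mnt/ctdb-lock/lockfile
CTDB_NODES=/etc/ctdb/nodes
CTDB_PUBLIC_ADDRESSES=/etc/ctdb/public_addresses
# let CTDB start/stop smbd
CTDB_MANAGES_SAMBA=yes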

When we edit or copy files into the volume via SMB (from a Windows client accessing a Samba file share), this inevitably leads to a split-brain scenario. For example:

gluster> volume heal fl-webroot info
Brick ankh.int.rdmedia.com:/export/glu/web/flash/webroot/
<gfid:0b162618-e46f-4921-92d0-c0fdb5290bf5>
<gfid:a259de7d-69fc-47bd-90e7-06a33b3e6cc8>
Number of entries: 2

Brick morpork.int.rdmedia.com:/export/glu/web/flash/webroot/
/LandingPage_Saturn_Production/images
/LandingPage_Saturn_Production
/LandingPage_Saturn_Production/Services/v2
/LandingPage_Saturn_Production/images/country/be
/LandingPage_Saturn_Production/bin
/LandingPage_Saturn_Production/Services
/LandingPage_Saturn_Production/images/generic
/LandingPage_Saturn_Production/aspnet_client/system_web
/LandingPage_Saturn_Production/images/country
/LandingPage_Saturn_Production/Scripts
/LandingPage_Saturn_Production/aspnet_client
/LandingPage_Saturn_Production/images/country/fr
Number of entries: 12

Sometimes self-heal works, sometimes it doesn't:

[2014-08-06 19:32:17.986790] E [afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 0-fl-webroot-replicate-0:  entry self heal  failed,   on /LandingPage_Saturn_Production/Services/v2
[2014-08-06 19:32:18.008330] W [client-rpc-fops.c:2772:client3_3_lookup_cbk] 0-fl-webroot-client-0: remote operation failed: No such file or directory. Path: <gfid:a89d7a07-2e3d-41ee-adcc-cb2fba3d2282> (a89d7a07-2e3d-41ee-adcc-cb2fba3d2282)
[2014-08-06 19:32:18.024057] I [afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 0-fl-webroot-replicate-0:  gfid or missing entry self heal  is started, metadata self heal  is successfully completed, backgroung data self heal  is successfully completed,  data self heal from fl-webroot-client-1  to sinks  fl-webroot-client-0, with 0 bytes on fl-webroot-client-0, 168 bytes on fl-webroot-client-1,  data - Pending matrix:  [ [ 0 0 ] [ 1 0 ] ]  metadata self heal from source fl-webroot-client-1 to fl-webroot-client-0,  metadata - Pending matrix:  [ [ 0 0 ] [ 2 0 ] ], on /LandingPage_Saturn_Production/Services/v2/PartnerApiService.asmx
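If it helps, I can also paste the listing of failed heals, assuming that subcommand is the right place to look on 3.5:

gluster volume heal fl-webroot info heal-failed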

More seriously, some files are simply missing on one of the nodes, with no error in the logs and nothing reported by 'gluster volume heal $volume info'.
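One way to spot them is to diff the brick contents directly; a rough sketch (brick paths as in the heal output above, skipping gluster's internal .glusterfs directory):

diff <(ssh ankh.int.rdmedia.com 'cd /export/glu/web/flash/webroot && find . -path ./.glusterfs -prune -o -type f -print | sort') \
     <(ssh morpork.int.rdmedia.com 'cd /export/glu/web/flash/webroot && find . -path ./.glusterfs -prune -o -type f -print | sort')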

Of course I can provide any log file necessary.




_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users

