Re: Hundreds of duplicate files

Joe Julian <joe@xxxxxxxxxxxxxxxx> · Fri, 20 Feb 2015 12:51:13 -0800



    On 02/20/2015 12:21 PM, Olav Peeters
      wrote:

    
      Let's take one file
        (3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd) as an example...

        On the 3 nodes where all bricks are formatted as XFS and mounted
        in /export and 272b2366-dfbf-ad47-2a0f-5d5cc40863e3 is the
        mounting point of a NFS shared storage connection from XenServer
        machines:

      
    Did I just read this correctly? Your bricks are NFS mounts? ie,
    GlusterFS Client <-> GlusterFS Server <-> NFS <->
    XFS

    
        [root@gluster01 ~]# find
        /export/*/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/ -name '300*'
        -exec ls -la {} \;

        -rw-r--r--. 2 root root 44332659200 Feb 17 23:55
/export/brick13gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd

      
    Supposedly, this is the actual file.

    
       -rw-r--r--. 2 root root 0 Feb 18
        00:51
/export/brick14gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd

      
    This is not a linkfile. Note it's mode 0644. How it got there with
    those permissions would be a matter of history and would require
    information that's probably lost.

    
        root@gluster02 ~]# find
        /export/*/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/ -name '300*'
        -exec ls -la {} \;

        -rw-r--r--. 2 root root 44332659200 Feb 17 23:55
/export/brick13gfs02/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd

        
        [root@gluster03 ~]# find
        /export/*/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/ -name '300*'
        -exec ls -la {} \;

        -rw-r--r--. 2 root root 44332659200 Feb 17 23:55
/export/brick13gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd

        -rw-r--r--. 2 root root 0 Feb 18 00:51
/export/brick14gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd

      
    Same analysis as above.

    
        3 files with information, 2 x a 0-bit file with the same name

        
        Checking the 0-bit files:

        [root@gluster01 ~]# getfattr -m . -d -e hex
/export/brick14gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd

        getfattr: Removing leading '/' from absolute path names

        # file:
export/brick14gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd

security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000

        trusted.afr.dirty=0x000000000000000000000000

        trusted.afr.sr_vol01-client-34=0x000000000000000000000000

        trusted.afr.sr_vol01-client-35=0x000000000000000000000000

        trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417

        
        [root@gluster03 ~]# getfattr -m . -d -e hex
/export/brick14gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd

        getfattr: Removing leading '/' from absolute path names

        # file:
export/brick14gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd

security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000

        trusted.afr.dirty=0x000000000000000000000000

        trusted.afr.sr_vol01-client-34=0x000000000000000000000000

        trusted.afr.sr_vol01-client-35=0x000000000000000000000000

        trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417

        
        This is not a glusterfs link file since there is no
        "trusted.glusterfs.dht.linkto", am I correct? 

      
    You are correct.

    
        And checking the "good" files:

        
        # file:
export/brick13gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd

security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000

        trusted.afr.dirty=0x000000000000000000000000

        trusted.afr.sr_vol01-client-32=0x000000000000000000000000

        trusted.afr.sr_vol01-client-33=0x000000000000000000000000

        trusted.afr.sr_vol01-client-34=0x000000000000000000000000

        trusted.afr.sr_vol01-client-35=0x000000010000000100000000

        trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417

        
        [root@gluster02 ~]# getfattr -m . -d -e hex
/export/brick13gfs02/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd

        getfattr: Removing leading '/' from absolute path names

        # file:
export/brick13gfs02/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd

security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000

        trusted.afr.dirty=0x000000000000000000000000

        trusted.afr.sr_vol01-client-32=0x000000000000000000000000

        trusted.afr.sr_vol01-client-33=0x000000000000000000000000

        trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417

        
        [root@gluster03 ~]# getfattr -m . -d -e hex
/export/brick13gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd

        getfattr: Removing leading '/' from absolute path names

        # file:
export/brick13gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd

security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000

        trusted.afr.dirty=0x000000000000000000000000

        trusted.afr.sr_vol01-client-40=0x000000000000000000000000

        trusted.afr.sr_vol01-client-41=0x000000000000000000000000

        trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417

        
        Seen from a client via a glusterfs mount:

        [root@client ~]# ls -al
        /mnt/glusterfs/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/300*

        -rw-r--r--. 1 root root 0 Feb 18 00:51
/mnt/glusterfs/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd

        -rw-r--r--. 1 root root 0 Feb 18 00:51
/mnt/glusterfs/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd

        -rw-r--r--. 1 root root 0 Feb 18 00:51
/mnt/glusterfs/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd

        
        Via NFS (just after performing a umount and mount the volume
        again):

        [root@client ~]# ls -al
        /mnt/nfs/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/300*                                    

        
        -rw-r--r--. 1 root root 44332659200 Feb 17 23:55
/mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd

        -rw-r--r--. 1 root root 44332659200 Feb 17 23:55
/mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd

        -rw-r--r--. 1 root root 44332659200 Feb 17 23:55
/mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd

        
        Doing the same list a couple of seconds later:

        [root@client ~]# ls -al
        /mnt/nfs/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/300*

        -rw-r--r--. 1 root root 0 Feb 18 00:51
/mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd

        -rw-r--r--. 1 root root 0 Feb 18 00:51
/mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd

        -rw-r--r--. 1 root root 0 Feb 18 00:51
/mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd

        And again, and again, and again:

        [root@client ~]# ls -al
        /mnt/nfs/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/300*

        -rw-r--r--. 1 root root 0 Feb 18 00:51
/mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd

        -rw-r--r--. 1 root root 0 Feb 18 00:51
/mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd

        -rw-r--r--. 1 root root 0 Feb 18 00:51
/mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd

        
        This really seems odd. Why do we get to see "real data file"
        once only?

        
        It seems more and more that this crazy file duplication (and
        writing of sticky bit files) was actually triggered when
        rebooting one of the three nodes while there still is an active
        (even when there is no data exchange at all) NFS connection,
        since all 0-bit files (of the non Sticky bit type) were either
        created at 00:51 or 00:41, the exact moment one of the three
        nodes in the cluster were rebooted. This would mean that
        replication currently with GlusterFS creates hardly any
        redundancy. Quiet the opposite, if one of the machines goes
        down, all of your data seriously gets disorganised. I am buzzy
        configuring a test installation to see how this can be best
        reproduced for a bug report..

        
        Does anyone have a suggestion how to best get rid of the
        duplicates, or rather get this mess organised the way it should
        be?

        This is a cluster with millions of files. A rebalance does not
        fix the issue, neither does a rebalance fix-layout help. Since
        this is a replicated volume all files should be their 2x, not
        3x. Can I safely just remove all the 0 bit files outside of the
        .glusterfs directory including the sticky bit files?

        
        The empty 0 bit files outside of .glusterfs on every brick I can
        probably safely removed like this:

        find /export/* -path */.glusterfs -prune -o -type f -size 0
        -perm 1000 -exec rm {} \;

        not?

        
        Thanks!

        
        Cheers,

        Olav

        
        On 18/02/15 22:10, Olav Peeters wrote:

      
        Thanks Tom and Joe,

          for the fast response!

          
          Before I started my upgrade I stopped all clients using the
          volume and stopped all VM's with VHD on the volume, but I
          guess, and this may be the missing thing to reproduce this in
          a lab, I did not detach a NFS shared storage mount from a
          XenServer pool to this volume, since this is an extremely
          risky business. I also did not stop the volume. This I guess
          was a bit stupid, but since I did upgrades in the past this
          way without any issues I skipped this step (a really bad
          habit). I'll make amends and file a proper bug report :-). I
          agree with you Joe, this should never happen, even when
          someone ignores the advice of stopping the volume. If it would
          also be nessessary to detach shared storage NFS connections to
          a volume, than franky, glusterfs is unusable in a private
          cloud. No one can afford downtime of the whole infrastructure
          just for a glusterfs upgrade. Ideally a replicated gluster
          volume should even be able to remain online and used during
          (at least a minor version) upgrade.

          
          I don't know whether a heal was maybe buzzy when I started the
          upgrade. I forgot to check. I did check the CPU activity on
          the gluster nodes which were very low (in the 0.0X range via
          top), so I doubt it. I will add this to the bug report as a
          suggestion should they not be able to reproduce with an open
          NFS connection.

          
          By the way, is it sufficient to do:

          service glusterd stop

          service glusterfsd stop

          and do a:

          ps aux | gluster*

          to see if everything has stopped and kill any leftovers should
          this be necessary?

          
          For the fix, do you agree that if I run e.g.:

          find /export/* -type f -size 0 -perm 1000 -exec /bin/rm {} \;

          on every node if /export is the location of all my bricks,
          also in a replicated set-up, this will be save?

          No necessary 0bit files will be deleted in e.g. the .glusterfs
          of every brick?

          
          Thanks for your support!

          
          Cheers,

          Olav

          
          On 18/02/15 20:51, Joe Julian wrote:

        
          On 02/18/2015 11:43 AM, tbenzvi@xxxxxxxxxxxxxxx
            wrote:

          
            Hi Olav,
            

              I have a hunch that our problem was caused by improper
              unmounting of the gluster volume, and have since found
              that the proper order should be: kill all jobs using
              volume -> unmount volume on clients -> gluster
              volume stop -> stop gluster service (if necessary)
             
            In my case, I wrote a Python script to find duplicate
              files on the mounted volume, then delete the corresponding
              link files on the bricks (making sure to also delete files
              in the .glusterfs directory)
             
            However, your find command was also suggested to me and
              I think it's a simpler solution. I believe removing all
              link files (even ones that are not causing duplicates) is
              fine since the next file access gluster will do a lookup
              on all bricks and recreate any link files if necessary.
              Hopefully a gluster expert can chime in on this point as
              I'm not completely sure.
          
          
          You are correct.

          
            Keep in mind your setup is somewhat different than mine
              as I have only 5 bricks with no replication.
             
            Regards,
            Tom
             
            ---------


              Original Message ---------
              Subject: Re:  Hundreds of duplicate
                files

                From: "Olav Peeters" <opeeters@xxxxxxxxx>

                Date: 2/18/15 10:52 am

                To: gluster-users@xxxxxxxxxxx,
                tbenzvi@xxxxxxxxxxxxxxx

                
                Hi all,

                  I'm have this problem after upgrading from 3.5.3 to
                  3.6.2.

                  At the moment I am still waiting for a heal to finish
                  (on a 31TB volume with 42 bricks, replicated over
                  three nodes).

                  
                  Tom,

                  how did you remove the duplicates?

                  with 42 bricks I will not be able to do this
                  manually..

                  Did a:

                  find $brick_root -type f -size 0 -perm 1000 -exec
                  /bin/rm {} \;

                  work for you?

                  
                  Should this type of thing ideally not be checked and
                  mended by a heal?

                  
                  Does anyone have an idea yet how this happens in the
                  first place? Can it be connected to upgrading?

                  
                  Cheers,

                  Olav

                   
                  On 01/01/15 03:07, tbenzvi@xxxxxxxxxxxxxxx
                  wrote:
                
                  No, the files can be read on a newly mounted
                    client! I went ahead and deleted all of the link
                    files associated with these duplicates, and then
                    remounted the volume. The problem is fixed!
                  Thanks again for the help, Joe and Vijay.
                   
                  Tom
                   
                  --------- Original Message
                    ---------
                    Subject: Re:  Hundreds of
                      duplicate files

                      From: "Vijay Bellur" <vbellur@xxxxxxxxxx>

                      Date: 12/28/14 3:23 am

                      To: tbenzvi@xxxxxxxxxxxxxxx,
                      gluster-users@xxxxxxxxxxx

                      
                      On 12/28/2014 01:20 PM, tbenzvi@xxxxxxxxxxxxxxx
                      wrote:

                      > Hi Vijay,

                      > Yes the files are still readable from the
                      .glusterfs path.

                      > There is no explicit error. However, trying
                      to read a text file in

                      > python simply gives me null characters:

                      >

                      > >>> open('ott_mf_itab').readlines()

                      >
['\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00']

                      >

                      > And reading binary files does the same

                      >

                      
                      Is this behavior seen with a freshly mounted
                      client too?

                      
                      -Vijay

                      
                      > --------- Original Message ---------

                      > Subject: Re:  Hundreds of
                      duplicate files

                      > From: "Vijay Bellur" <vbellur@xxxxxxxxxx>

                      > Date: 12/27/14 9:57 pm

                      > To: tbenzvi@xxxxxxxxxxxxxxx,
                      gluster-users@xxxxxxxxxxx

                      >

                      > On 12/28/2014 10:13 AM, tbenzvi@xxxxxxxxxxxxxxx
                      wrote:

                      > > Thanks Joe, I've read your blog post as
                      well as your post

                      > regarding the

                      > > .glusterfs directory.

                      > > I found some unneeded duplicate files
                      which were not being read

                      > > properly. I then deleted the link file
                      from the brick. This always

                      > > removes the duplicate file from the
                      listing, but the file does not

                      > > always become readable. If I also delete
                      the associated file in the

                      > > .glusterfs directory on that brick, then
                      some more files become

                      > > readable. However this solution still
                      doesn't work for all files.

                      > > I know the file on the brick is not
                      corrupt as it can be read

                      > directly

                      > > from the brick directory.

                      >

                      > For files that are not readable from the
                      client, can you check if the

                      > file is readable from the .glusterfs/ path?

                      >

                      > What is the specific error that is seen while
                      trying to read one such

                      > file from the client?

                      >

                      > Thanks,

                      > Vijay

                      >

                      >

                      >

                      >
                      _______________________________________________

                      > Gluster-users mailing list

                      > Gluster-users@xxxxxxxxxxx

                      > http://www.gluster.org/mailman/listinfo/gluster-users

                      >
                  
                  
                  _______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
                
              
            _______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
          
          
          _______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
        
        
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users