Re: [Gluster-devel] missing files

Is the failure repeatable? With the same directories?

It's very odd that the directories appear on the volume only after you do an 'ls' on the bricks. Could it be that you ran just a single 'ls' on the FUSE mount, which did not show the directories? Is it possible that this 'ls' triggered a self-heal that repaired the problem, whatever it was, so that when you ran another 'ls' on the FUSE mount after the 'ls' on the bricks, the directories were there?

The first 'ls' could have healed the files, so that the following 'ls' on the bricks showed them as if nothing had been damaged. If that's the case, it's possible that there were some disconnections during the copy.
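
If the problem shows up again, it may help to record exactly which entries are on the bricks but missing from the mount listing before running anything else. The following is only a sketch: the function name missing_on_mount is made up for illustration, it assumes entry names contain no newlines, and it skips the brick-internal .glusterfs directory. It compares plain directory listings on purpose, because looking an entry up by name on the mount might itself change what is visible.

```shell
# List entry names that appear in at least one brick directory
# but not in a plain listing of the same directory on the mount.
# usage: missing_on_mount MOUNT_DIR BRICK_DIR [BRICK_DIR...]
# (hypothetical helper, not part of GlusterFS)
missing_on_mount() {
    mount_dir=$1
    shift
    tmp=$(mktemp -d) || return 1
    # Names returned by a directory listing of the mount.
    ls -A "$mount_dir" | sort -u > "$tmp/mount"
    # Names present on any brick, minus the brick-internal .glusterfs directory.
    for b in "$@"; do ls -A "$b"; done | grep -vx '\.glusterfs' | sort -u > "$tmp/bricks"
    # Keep only the lines unique to the brick list.
    comm -13 "$tmp/mount" "$tmp/bricks"
    rm -rf "$tmp"
}
```

It could be called as `missing_on_mount <mount>/dir /data/brick*/homegfs/dir` on each server, keeping in mind that a server only sees its own bricks, so entries stored on another server's bricks will not be covered by that run.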

I've added Pranith because he knows the replication and self-heal details better.

Xavi

On 02/04/2015 07:23 PM, David F. Robinson wrote:
Distributed/replicated

Volume Name: homegfs
Type: Distributed-Replicate
Volume ID: 1e32672a-f1b7-4b58-ba94-58c085e59071
Status: Started
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: gfsib01a.corvidtec.com:/data/brick01a/homegfs
Brick2: gfsib01b.corvidtec.com:/data/brick01b/homegfs
Brick3: gfsib01a.corvidtec.com:/data/brick02a/homegfs
Brick4: gfsib01b.corvidtec.com:/data/brick02b/homegfs
Brick5: gfsib02a.corvidtec.com:/data/brick01a/homegfs
Brick6: gfsib02b.corvidtec.com:/data/brick01b/homegfs
Brick7: gfsib02a.corvidtec.com:/data/brick02a/homegfs
Brick8: gfsib02b.corvidtec.com:/data/brick02b/homegfs
Options Reconfigured:
performance.io-thread-count: 32
performance.cache-size: 128MB
performance.write-behind-window-size: 128MB
server.allow-insecure: on
network.ping-timeout: 10
storage.owner-gid: 100
geo-replication.indexing: off
geo-replication.ignore-pid-check: on
changelog.changelog: on
changelog.fsync-interval: 3
changelog.rollover-time: 15
server.manage-gids: on


------ Original Message ------
From: "Xavier Hernandez" <xhernandez@xxxxxxxxxx>
To: "David F. Robinson" <david.robinson@xxxxxxxxxxxxx>; "Benjamin
Turner" <bennyturns@xxxxxxxxx>
Cc: "gluster-users@xxxxxxxxxxx" <gluster-users@xxxxxxxxxxx>; "Gluster
Devel" <gluster-devel@xxxxxxxxxxx>
Sent: 2/4/2015 6:03:45 AM
Subject: Re: [Gluster-devel] missing files

On 02/04/2015 01:30 AM, David F. Robinson wrote:
Sorry. Thought about this a little more. I should have been clearer.
The files were on both bricks of the replica, not just one side. So,
both bricks had to have been up... The files/directories just don't show
up on the mount.
I was reading and saw a related bug
(https://bugzilla.redhat.com/show_bug.cgi?id=1159484). I saw it
suggested to run:
         find <mount> -d -exec getfattr -h -n trusted.ec.heal {} \;

This command is specific to dispersed volumes. It won't do anything
(aside from producing the error you are seeing) on a replicated volume.
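
Before running it, the volume type can be checked from the "Type:" line of 'gluster volume info'. A minimal sketch, where the helper names volume_type and is_dispersed are invented for illustration and the match on the word "Disperse" is an assumption about how dispersed types are reported:

```shell
# Print the value of the "Type:" line from `gluster volume info` output on stdin.
# (hypothetical helper, not part of GlusterFS)
volume_type() {
    awk -F': *' '$1 == "Type" { print $2; exit }'
}

# Succeed only if the given type string names a dispersed layout.
# Assumption: dispersed volumes report a type containing "Disperse".
is_dispersed() {
    case "$1" in
        *Disperse*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `type=$(gluster volume info <volume> | volume_type)` followed by `is_dispersed "$type" || echo "not a dispersed volume: $type"` would flag a replicated volume before the getfattr loop is started.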

I think you are using a replicated volume, right?

In this case I'm not sure what could be happening. Is your volume a pure
replicated one or a distributed-replicated one? On a pure replicated
volume it doesn't make sense that some entries do not show up in an 'ls'
when the file is in both replicas (at least not without an error message
in the logs). On a distributed-replicated volume it could be caused by
some problem while combining the contents of each replica set.

What's the configuration of your volume?

Xavi


I get a bunch of errors for operation not supported:
[root@gfs02a homegfs]# find wks_backup -d -exec getfattr -h -n trusted.ec.heal {} \;
find: warning: the -d option is deprecated; please use -depth instead, because the latter is a POSIX-compliant feature.
wks_backup/homer_backup/backup: trusted.ec.heal: Operation not supported
wks_backup/homer_backup/logs/2014_05_20.log: trusted.ec.heal: Operation not supported
wks_backup/homer_backup/logs/2014_05_21.log: trusted.ec.heal: Operation not supported
wks_backup/homer_backup/logs/2014_05_18.log: trusted.ec.heal: Operation not supported
wks_backup/homer_backup/logs/2014_05_19.log: trusted.ec.heal: Operation not supported
wks_backup/homer_backup/logs/2014_05_22.log: trusted.ec.heal: Operation not supported
wks_backup/homer_backup/logs: trusted.ec.heal: Operation not supported
wks_backup/homer_backup: trusted.ec.heal: Operation not supported
------ Original Message ------
From: "Benjamin Turner" <bennyturns@xxxxxxxxx>
To: "David F. Robinson" <david.robinson@xxxxxxxxxxxxx>
Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>;
"gluster-users@xxxxxxxxxxx" <gluster-users@xxxxxxxxxxx>
Sent: 2/3/2015 7:12:34 PM
Subject: Re: [Gluster-devel] missing files
It sounds to me like the files were only copied to one replica, weren't
there for the initial ls, which triggered a self-heal, and were there
for the last ls because they had been healed. Is there any chance that
one of the replicas was down during the rsync? It could be that you
lost a brick during the copy, or something like that. To confirm, I
would look for disconnects in the brick logs and check glustershd.log
to verify that the missing files were actually healed.
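
A rough way to scan the logs for this is sketched below. The function name count_disconnects is invented for illustration, and matching the bare word "disconnect" (case-insensitively) is an assumption, since the exact wording of the log messages varies between components and versions.

```shell
# Print "<file>:<count>" for each log file given, counting the lines
# that mention "disconnect" in any letter case.
# (hypothetical helper, not part of GlusterFS)
count_disconnects() {
    for log in "$@"; do
        printf '%s:%s\n' "$log" "$(grep -ci 'disconnect' "$log")"
    done
}
```

Assuming the default log directory, it could be run as `count_disconnects /var/log/glusterfs/bricks/*.log /var/log/glusterfs/glustershd.log` on each server. Any non-zero count would then need to be matched by timestamp against the period when the rsync was running.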

-b

On Tue, Feb 3, 2015 at 5:37 PM, David F. Robinson
<david.robinson@xxxxxxxxxxxxx> wrote:

    I rsync'd 20 TB over to my gluster system and noticed that I had
    some directories missing even though the rsync completed normally.
    The rsync logs showed that the missing files were transferred.
    I went to the bricks and did an 'ls -al
    /data/brick*/homegfs/dir/*', and the files were on the bricks. After
    I did this 'ls', the files then showed up on the FUSE mounts.
    1) Why are the files hidden on the fuse mount?
    2) Why does the ls make them show up on the FUSE mount?
    3) How can I prevent this from happening again?
    Note, I also mounted the gluster volume using NFS and saw the same
    behavior. The files/directories were not shown until I did the
    "ls" on the bricks.
    David
    ===============================
    David F. Robinson, Ph.D.
    President - Corvid Technologies
    704.799.6944 x101 [office]
    704.252.1310 [cell]
    704.799.7974 [fax]
    David.Robinson@xxxxxxxxxxxxx
    http://www.corvidtechnologies.com

    _______________________________________________
    Gluster-devel mailing list
    Gluster-devel@xxxxxxxxxxx
    http://www.gluster.org/mailman/listinfo/gluster-devel




_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users



