elbert@host1:~$ dpkg -l | grep glusterfs
ii  glusterfs-client   1.3.8-0pre2   GlusterFS fuse client
ii  glusterfs-server   1.3.8-0pre2   GlusterFS fuse server
ii  libglusterfs0      1.3.8-0pre2   GlusterFS libraries and translator modules

I have two hosts set up to use AFR with the package versions listed
above. I have been experiencing an issue where a file that is copied to
GlusterFS is readable/writable for a while, then at some point in time
ceases to be. Trying to access it only produces the error message
"cannot open `filename' for reading: Input/output error".

Files enter GlusterFS either via "cp" from a client or via "rsync". In
the cp case, the clients are all local and copy across a very fast
connection. In the rsync case, the one sending host is itself a
GlusterFS client; we are testing out a later version of GlusterFS on
it, and it rsyncs across a VPN:

elbert@host2:~$ dpkg -l | grep glusterfs
ii  glusterfs-client       2.0.1-1   clustered file-system
ii  glusterfs-server       2.0.1-1   clustered file-system
ii  libglusterfs0          2.0.1-1   GlusterFS libraries and translator modules
ii  libglusterfsclient0    2.0.1-1   GlusterFS client library

=========

What causes files to become inaccessible? I read that fstat() had a bug
in version 1.3.x whereas stat() did not, and that it was being worked
on. Could this be related?

When a file becomes inaccessible, I have been manually removing it from
the mount point and then copying it back in via scp, after which it
becomes accessible again. Below I've pasted a sample of what I'm seeing:

> elbert@tool3.:hourlogs$ cd myDir
> elbert@tool3.:myDir$ ls
> 1244682000.log
> elbert@tool3.:myDir$ ls 1244682000.log
> 1244682000.log
> elbert@tool3.:myDir$ stat 1244682000.log
>   File: `1244682000.log'
>   Size: 40265114   Blocks: 78744   IO Block: 4096   regular file
> Device: 15h/21d    Inode: 42205749    Links: 1
> Access: (0755/-rwxr-xr-x)  Uid: ( 1003/  elbert)   Gid: ( 6000/   ops)
> Access: 2009-06-11 02:25:10.000000000 +0000
> Modify: 2009-06-11 02:26:02.000000000 +0000
> Change: 2009-06-11 02:26:02.000000000 +0000
> elbert@tool3.:myDir$ tail 1244682000.log
> tail: cannot open `1244682000.log' for reading: Input/output error

At this point I am able to rm the file, and if I scp it back in I am
able to successfully tail it. I have also observed cases where the file
had a Size of 0 but was otherwise in the same state. I'm not totally
certain, but it looks like when a file gets into this state via rsync,
it is either deposited in this state immediately (before I try to read
it) or it enters this state very quickly. Speaking generally, file
sizes tend to be several MB up to 150 MB.
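For reference, the manual recovery I described boils down to something
like this (the mount point /mnt/glusterfs and the source of the good
copy are just examples, not my exact paths):

    # on a gluster client; /mnt/glusterfs is an assumed mount point
    cd /mnt/glusterfs/hourlogs/myDir
    rm 1244682000.log          # remove the unreadable copy

    # scp a good copy back in from wherever one still lives
    # (host and path here are illustrative)
    scp backuphost:/backup/hourlogs/myDir/1244682000.log .

    tail 1244682000.log        # now succeeds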
Here's my server config:

# Gluster Server configuration /etc/glusterfs/glusterfs-server.vol
# Configured for AFR & Unify features

volume brick
  type storage/posix
  option directory /var/gluster/data/
end-volume

volume brick-ns
  type storage/posix
  option directory /var/gluster/ns/
end-volume

volume server
  type protocol/server
  option transport-type tcp/server
  subvolumes brick brick-ns
  option auth.ip.brick.allow 165.193.245.*,10.11.*
  option auth.ip.brick-ns.allow 165.193.245.*,10.11.*
end-volume

Here's my client config:

# Gluster Client configuration /etc/glusterfs/glusterfs-client.vol
# Configured for AFR & Unify features

volume brick1
  type protocol/client
  option transport-type tcp/client   # for TCP/IP transport
  option remote-host 10.11.16.68     # IP address of the remote brick
  option remote-subvolume brick      # name of the remote volume
end-volume

volume brick2
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.11.16.71
  option remote-subvolume brick
end-volume

volume brick3
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.11.16.69
  option remote-subvolume brick
end-volume

volume brick4
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.11.16.70
  option remote-subvolume brick
end-volume

volume brick5
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.11.16.119
  option remote-subvolume brick
end-volume

volume brick6
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.11.16.120
  option remote-subvolume brick
end-volume

volume brick-ns1
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.11.16.68
  option remote-subvolume brick-ns   # Note the different remote volume name.
end-volume

volume brick-ns2
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.11.16.71
  option remote-subvolume brick-ns   # Note the different remote volume name.
end-volume

volume afr1
  type cluster/afr
  subvolumes brick1 brick2
end-volume

volume afr2
  type cluster/afr
  subvolumes brick3 brick4
end-volume

volume afr3
  type cluster/afr
  subvolumes brick5 brick6
end-volume

volume afr-ns
  type cluster/afr
  subvolumes brick-ns1 brick-ns2
end-volume

volume unify
  type cluster/unify
  subvolumes afr1 afr2 afr3
  option namespace afr-ns

  # use the ALU scheduler
  option scheduler alu

  # This option would make brick5 read-only, so that no new files get
  # created on it.
  ## option alu.read-only-subvolumes brick5

  # Don't create files on a volume with less than 10% free disk space
  option alu.limits.min-free-disk 10%

  # Don't create files on a volume with more than 10000 open files
  option alu.limits.max-open-files 10000

  # When deciding where to place a file, first look at disk-usage, then at
  # read-usage, write-usage, open-files-usage, and finally disk-speed-usage.
  option alu.order disk-usage:read-usage:write-usage:open-files-usage:disk-speed-usage

  # Kick in if the discrepancy in disk usage between volumes exceeds 2GB
  option alu.disk-usage.entry-threshold 2GB

  # Don't stop writing to the least-used volume until the discrepancy
  # is down to 1988MB (2GB - 60MB)
  option alu.disk-usage.exit-threshold 60MB

  # Kick in if the discrepancy in open files reaches 1024
  option alu.open-files-usage.entry-threshold 1024

  # Don't stop until 992 files have been written to the least-used volume
  option alu.open-files-usage.exit-threshold 32

  # Kick in when the read-usage discrepancy reaches 20%
  option alu.read-usage.entry-threshold 20%

  # Don't stop until the discrepancy has been reduced to 16% (20% - 4%)
  option alu.read-usage.exit-threshold 4%

  # Kick in when the write-usage discrepancy reaches 20%
  option alu.write-usage.entry-threshold 20%

  # Don't stop until the discrepancy has been reduced to 16%
  option alu.write-usage.exit-threshold 4%

  # Refresh the statistics used for decision-making every 10 seconds
  option alu.stat-refresh.interval 10sec

  # Refresh the statistics used for decision-making after creating 10 files
  # option alu.stat-refresh.num-file-create 10
end-volume

# write-behind improves write performance a lot
volume writebehind
  type performance/write-behind
  option aggregate-size 131072   # in bytes
  subvolumes unify
end-volume
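In case it matters, the clients mount this volfile in the usual way,
roughly like so (the mount point is just an example):

    # mount the top-level volume on a client
    glusterfs -f /etc/glusterfs/glusterfs-client.vol /mnt/glusterfs

    # sanity-check the mount
    df -h /mnt/glusterfs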
Has anyone seen this issue before? Any suggestions?

Thanks,

-elb-