Strange Problem

anlamarama at gmail.com (Serdar Sahin) · Thu, 28 Oct 2010 00:19:06 +1100

Hi,

I am having problems with glusterfs. I have been using glusterfs on another
cluster for the last year and have not encountered any problems so far. This
is a new setup with updated binaries, so it might be a bug, or a
configuration option I forgot to add.

This is the output from a glusterfs client:

[root@db example]# ls -la /home/data/web/f/f2/f3/4a1/
total 77492
drwxr-xr-x 2 root root     4096 Oct 25 15:27 .
drwxr-xr-x 3 root root     4096 Oct 25 15:20 ..
-rw-r--r-- 1 root root 20306615 Oct 25 15:27 14d-95.hts
-rw-r--r-- 1 root root 38611533 Oct 25 15:20 14d-95.pdf
-rwxr-xr-x 1 root root 20306613 Oct 25 15:27 14d-95.swf
?--------- ? ?    ?           ?            ? 14d-95.txt

So none of the clients can read the file "14d-95.txt", and many other files
show the same problem. Right after running the ls command, the same client
logs:

[2010-10-27 08:12:23.669044] E [client3_1-fops.c:1867:client3_1_lookup_cbk] : error
[2010-10-27 08:12:23.669113] E [client3_1-fops.c:1867:client3_1_lookup_cbk] : error
[2010-10-27 08:12:23.669140] W [fuse-bridge.c:180:fuse_entry_cbk] glusterfs-fuse: 413: LOOKUP() /web/f/f2/f3/4a1/14d-95.txt => -1 (Invalid argument)
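Since ls prints "?---------" whenever it can read a directory entry but
cannot stat() it, one way to gauge how many files are affected is to walk the
client mount and try lstat() on every entry. The following is a minimal
Python sketch, assuming the client mount point is /home/data (taken from the
ls output above); the function name find_unstatable is hypothetical.

```python
import os
import sys


def find_unstatable(root):
    """Walk `root` and return (path, reason) for every directory entry that
    readdir() lists but lstat() cannot stat, i.e. the entries that `ls -l`
    prints as '?---------'."""
    broken = []
    # os.walk puts entries whose type cannot be determined into `filenames`,
    # so failing entries are still visited here.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                os.lstat(path)
            except OSError as exc:
                broken.append((path, exc.strerror))
    return broken


if __name__ == "__main__":
    # Hypothetical default: the client mount point seen in the ls output.
    root = sys.argv[1] if len(sys.argv) > 1 else "/home/data"
    for path, reason in find_unstatable(root):
        print(f"{path}: {reason}")
```

Running it against the whole mount can take a while on a large volume;
passing a subdirectory such as /home/data/web/f narrows the scan.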

On the server, I get:

[2010-10-27 08:13:54.188205] E [server.c:67:gfs_serialize_reply] : Failed to encode message

I have two replicated servers, and both of them have the correct copy of the file.
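One way to double-check that both replicas really hold identical copies is to
run a small script on each server against its brick copy and compare the
output. This is a minimal Python sketch; the default path is an assumption
built from server1's "option directory /disk1" plus the volume-relative path
in the LOOKUP() log line (server2's export directory is not shown, so pass it
as an argument there). The replicate translator keeps its own metadata in
trusted.* extended attributes (for example the trusted.afr.* changelogs),
which are only visible when the script runs as root. The helper name
describe_copy is hypothetical.

```python
import hashlib
import os
import sys


def describe_copy(path):
    """Return size, mtime, SHA-256 and extended attributes of one brick copy."""
    st = os.stat(path)
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Hash in 1 MiB chunks so large files are not read into memory at once.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    xattrs = {}
    try:
        # os.listxattr/os.getxattr exist on Linux only; trusted.* needs root.
        for name in os.listxattr(path):
            xattrs[name] = os.getxattr(path, name).hex()
    except (AttributeError, OSError):
        pass  # no xattr support here; report what we have
    return {"size": st.st_size, "mtime": int(st.st_mtime),
            "sha256": digest.hexdigest(), "xattrs": xattrs}


if __name__ == "__main__":
    # Hypothetical default: server1's export dir joined with the LOOKUP() path.
    path = sys.argv[1] if len(sys.argv) > 1 else "/disk1/web/f/f2/f3/4a1/14d-95.txt"
    for key, value in describe_copy(path).items():
        print(f"{key}: {value}")
```

If the sizes and hashes match but the trusted.* attributes differ between
the two servers, that would point at replicate metadata rather than file
contents.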

My client configuration:

### Add client feature and attach to remote subvolume of server1
volume brick1
 type protocol/client
 option transport-type tcp
 option remote-host st1      # IP address of the remote brick
 option remote-port 24017
 option transport.socket.nodelay on        # undocumented option for speed
 option remote-subvolume brick1        # name of the remote volume
end-volume

### Add client feature and attach to remote subvolume of server2
volume brick2
 type protocol/client
 option transport-type tcp
 option remote-host st2      # IP address of the remote brick
 option remote-port 24017
 option transport.socket.nodelay on        # undocumented option for speed
 option remote-subvolume brick2        # name of the remote volume
end-volume

volume replicated
 type cluster/replicate
 subvolumes brick1 brick2
end-volume

volume readahead
  type performance/read-ahead
  option page-count 4
  option force-atime-update off
  subvolumes replicated
end-volume

volume writebehind
  type performance/write-behind
  option window-size 1MB
  subvolumes readahead
end-volume

volume cache
  type performance/io-cache
  option cache-size 512MB
  subvolumes writebehind
end-volume

volume quickread
  type performance/quick-read
  option cache-timeout 1
  option max-file-size 512KB
  subvolumes cache
end-volume

And my server configuration:

# **** server1 spec file ****
volume posix1
  type storage/posix                    # POSIX FS translator
  option directory /disk1        # Export this directory
end-volume

### Add POSIX record locking support to the storage brick
volume locks1
  type features/locks
  subvolumes posix1
end-volume

### Add a pool of I/O threads on top of the locks translator
volume brick1
  type performance/io-threads
  option thread-count 8
  subvolumes locks1
end-volume

### Add network serving capability to above brick.
volume server
  type protocol/server
  option transport-type tcp                  # For TCP/IP transport
  option bind-address 10.28.18.198
  option transport.socket.listen-port 24017
  subvolumes brick1
  option auth.addr.brick1.allow 10.28.18.* # Allow access to "brick1" volume
end-volume

What could be wrong, and how can I fix this problem? This error is currently
delaying our project; could you help me out?

Thanks,

Serdar Sahin