hello,

we use GlusterFS 3.2.0 (two glusterfs servers with SLES 11.1) and several clients which access the gfs volumes.

configuration:

info
----
type=2
count=2
status=1
sub_count=2
version=1
transport-type=0
volume-id=05168b54-6a5c-4aa3-91ee-63d16976c6cd
brick-0=10.0.1.xxx:-glusterstorage-macm03
brick-1=10.0.1.xxy:-glusterstorage-macm03

macm03-fuse.vol
---------------
volume macm03-client-0
    type protocol/client
    option remote-host 10.0.1.xxx
    option remote-subvolume /glusterstorage/macm03
    option transport-type tcp
end-volume

volume macm03-client-1
    type protocol/client
    option remote-host 10.0.1.xxy
    option remote-subvolume /glusterstorage/macm03
    option transport-type tcp
end-volume

volume macm03-replicate-0
    type cluster/replicate
    subvolumes macm03-client-0 macm03-client-1
end-volume

volume macm03-write-behind
    type performance/write-behind
    subvolumes macm03-replicate-0
end-volume

volume macm03-read-ahead
    type performance/read-ahead
    subvolumes macm03-write-behind
end-volume

volume macm03-io-cache
    type performance/io-cache
    subvolumes macm03-read-ahead
end-volume

volume macm03-quick-read
    type performance/quick-read
    subvolumes macm03-io-cache
end-volume

volume macm03-stat-prefetch
    type performance/stat-prefetch
    subvolumes macm03-quick-read
end-volume

volume macm03
    type debug/io-stats
    subvolumes macm03-stat-prefetch
end-volume

macm03.10.0.1.xxx.glusterstorage-macm03.vol
-------------------------------------------
volume macm03-posix
    type storage/posix
    option directory /glusterstorage/macm03
end-volume

volume macm03-access-control
    type features/access-control
    subvolumes macm03-posix
end-volume

volume macm03-locks
    type features/locks
    subvolumes macm03-access-control
end-volume

volume macm03-io-threads
    type performance/io-threads
    subvolumes macm03-locks
end-volume

volume /glusterstorage/macm03
    type debug/io-stats
    subvolumes macm03-io-threads
end-volume

volume macm03-server
    type protocol/server
    option transport-type tcp
    option auth.addr./glusterstorage/macm03.allow *
    subvolumes /glusterstorage/macm03
end-volume

macm03.10.0.1.xxy.glusterstorage-macm03.vol
-------------------------------------------
(identical to macm03.10.0.1.xxx.glusterstorage-macm03.vol above)

client
------
the client has mounted the volume via fstab like this:

server:/macm03  /srv/www/GFS  glusterfs  defaults,_netdev  0 0
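for completeness, the equivalent manual mount on a client (assuming the same server name and mount point as in the fstab entry) would be:

    # native glusterfs client mount, same as the fstab entry above
    mount -t glusterfs server:/macm03 /srv/www/GFS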
now we registered strange behavior, and i have some questions:

1) files with size 0

we find many files with size 0 (almost all files in one directory have size 0). in the server log we only find the following. what does this mean?

[2011-04-28 23:52:00.630869] I [server-resolve.c:580:server_resolve] 0-macm03-server: pure path resolution for /xxx/preview/4aa76fa541413.jpg (LOOKUP)
[2011-04-28 23:52:00.637384] I [server-resolve.c:580:server_resolve] 0-macm03-server: pure path resolution for /xxx/preview/4aa76fa541413.jpg (UNLINK)
[2011-04-28 23:52:00.693183] I [server-resolve.c:580:server_resolve] 0-macm03-server: pure path resolution for /xxx/preview/4aa76fa541413.jpg (LOOKUP)
[2011-04-28 23:52:00.711092] I [server-resolve.c:580:server_resolve] 0-macm03-server: pure path resolution for /xxx/preview/4aa76fa541413.jpg (MKNOD)
[2011-04-28 23:52:00.746289] I [server-resolve.c:580:server_resolve] 0-macm03-server: pure path resolution for /xxx/preview/4aa76fa541413.jpg (SETATTR)
[2011-04-28 23:52:16.373532] I [server-resolve.c:580:server_resolve] 0-macm03-server: pure path resolution for /xxx/preview/4aa76fa541413.jpg (LOOKUP)

2) the client is then self-healing meta-data all the time (because the file has size 0 on one of the servers?), but we have triggered self-healing several times as described here:

http://europe.gluster.org/community/documentation/index.php/Gluster_3.1:_Triggering_Self-Heal_on_Replicate

[2011-04-29 07:55:27.188743] I [afr-common.c:581:afr_lookup_collect_xattr] 0-macm03-replicate-0: data self-heal is pending for /videos12/29640/preview/4aadf4b757de6.jpg.
[2011-04-29 07:55:27.188829] I [afr-common.c:735:afr_lookup_done] 0-macm03-replicate-0: background meta-data data self-heal triggered. path: /videos12/29640/preview/4aadf4b757de6.jpg
[2011-04-29 07:55:27.194446] W [dict.c:437:dict_ref] (-->/opt/glusterfs/3.2.0/lib64/glusterfs/3.2.0/xlator/protocol/client.so(client3_1_fstat_cbk+0x2bb) [0x2aaaaafe833b] (-->/opt/glusterfs/3.2.0/lib64/glusterfs/3.2.0/xlator/cluster/replicate.so(afr_sh_data_fstat_cbk+0x17d) [0x2aaaab11c9ad] (-->/opt/glusterfs/3.2.0/lib64/glusterfs/3.2.0/xlator/cluster/replicate.so(afr_sh_data_fix+0x1fc) [0x2aaaab11c64c]))) 0-dict: dict is NULL

3) on some of the clients we then cannot access the whole directory:

# dir xxx/preview/
/bin/ls: reading directory xxx/preview/: File descriptor in bad state
total 0

in the logs we find this:

[2011-04-29 08:36:17.224301] W [afr-common.c:634:afr_lookup_self_heal_check] 0-macm03-replicate-0: /videos12/30181: gfid different on subvolume
[2011-04-29 08:36:17.241330] I [afr-common.c:680:afr_lookup_done] 0-macm03-replicate-0: entries are missing in lookup of /xxx/preview.
[2011-04-29 08:36:17.241373] I [afr-common.c:735:afr_lookup_done] 0-macm03-replicate-0: background meta-data data entry self-heal triggered. path: /xxx/preview
[2011-04-29 08:36:17.243160] I [afr-self-heal-metadata.c:595:afr_sh_metadata_lookup_cbk] 0-macm03-replicate-0: path /videos12/30181/preview on subvolume macm03-client-0 => -1 (No such file or directory)
[2011-04-29 08:36:17.302228] I [afr-dir-read.c:120:afr_examine_dir_readdir_cbk] 0-macm03-replicate-0: /videos12/30181/preview: failed to do opendir on macm03-client-0
[2011-04-29 08:36:17.303836] I [afr-dir-read.c:174:afr_examine_dir_readdir_cbk] 0-macm03-replicate-0: entry self-heal triggered. path: /xxx/preview, reason: checksums of directory differ, forced merge option set

4) sometimes when we unmount the glusterfs volume on a client and mount it again, we can access the directory that was in a bad state before, and then self-healing also works as it should. but sometimes a remount does not help either.

any help would be appreciated. thank you very much!

christopher
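PS: for reference, the self-heal trigger we run is roughly the crawl from the linked page, using the mount point from our fstab entry above:

    # stat every file on the client mount to force lookups,
    # which triggers self-heal on the replicate volume
    find /srv/www/GFS -noleaf -print0 | xargs --null stat >/dev/null

and regarding the "gfid different on subvolume" message in 3): the gfid of a path can be compared directly on the two bricks like this (example path; run on each server against the backend directory, not the client mount):

    # shows the trusted.gfid xattr of the directory on this brick
    getfattr -n trusted.gfid -e hex /glusterstorage/macm03/xxx/preview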