Hello,
I have tried the new version 3 of GlusterFS with Xen.
We have two GlusterFS servers and two Xen servers, and we use
client-side replication for the domUs.
I would like to know whether our configuration is suitable for Xen domUs:
Server volfile:
# export-domU-images-server_repl
# gfs-01-01 /GFS/domU-images
# gfs-01-02 /GFS/domU-images
volume posix
  type storage/posix
  option directory /GFS/domU-images
end-volume

volume locks
  type features/locks
  subvolumes posix
end-volume

volume domU-images
  type performance/io-threads
  option thread-count 16        # default is 16
  subvolumes locks
end-volume

volume server
  type protocol/server
  option transport-type tcp
  option auth.addr.domU-images.allow 192.168.11.*,127.0.0.1
  option transport.socket.listen-port 6997
  subvolumes domU-images
end-volume
And the client volfile:
volume gfs-01-01
  type protocol/client
  option transport-type tcp
  option remote-host gfs-01-01
  option transport.socket.nodelay on
  option remote-port 6997
  option remote-subvolume domU-images
  option ping-timeout 7
end-volume

volume gfs-01-02
  type protocol/client
  option transport-type tcp
  option remote-host gfs-01-02
  option transport.socket.nodelay on
  option remote-port 6997
  option remote-subvolume domU-images
  option ping-timeout 7
end-volume

volume gfs-replicate
  type cluster/replicate
  subvolumes gfs-01-01 gfs-01-02
end-volume

volume writebehind
  type performance/write-behind
  option cache-size 16MB
  subvolumes gfs-replicate
end-volume

volume readahead
  type performance/read-ahead
  option page-count 16          # cache per file = (page-count x page-size)
  subvolumes writebehind
end-volume

volume iocache
  type performance/io-cache
  option cache-size 1GB
  option cache-timeout 1
  subvolumes readahead
end-volume
I start a domU and simulate a crash on gfs-01-02 (rcnetwork stop; sleep
150; rcnetwork start). The domU keeps running without any problems.
Client Log:
[2009-12-10 16:34:16] E
[client-protocol.c:415:client_ping_timer_expired] gfs-01-02: Server
xxx.xxx.xxx.xxx:6997 has not responded in the last 7 seconds, disconnecting.
[2009-12-10 16:34:16] E [saved-frames.c:165:saved_frames_unwind]
gfs-01-02: forced unwinding frame type(1) op(GETXATTR)
[2009-12-10 16:34:16] E [saved-frames.c:165:saved_frames_unwind]
gfs-01-02: forced unwinding frame type(2) op(PING)
[2009-12-10 16:34:16] N [client-protocol.c:6972:notify] gfs-01-02:
disconnected
[2009-12-10 16:34:38] E [socket.c:760:socket_connect_finish] gfs-01-02:
connection to xxx.xxx.xxx.xxx:6997 failed (No route to host)
[2009-12-10 16:34:38] E [socket.c:760:socket_connect_finish] gfs-01-02:
connection to xxx.xxx.xxx.xxx:6997 failed (No route to host)
Then the network on gfs-01-02 is started again.
Client Log:
[2009-12-10 16:35:15] N [client-protocol.c:6224:client_setvolume_cbk]
gfs-01-02: Connected to xxx.xxx.xxx.xxx:6997, attached to remote volume
'domU-images'.
[2009-12-10 16:35:18] N [client-protocol.c:6224:client_setvolume_cbk]
gfs-01-02: Connected to xxx.xxx.xxx.xxx:6997, attached to remote volume
'domU-images'.
[2009-12-10 16:35:20] E
[afr-self-heal-common.c:1186:sh_missing_entries_create] gfs-replicate:
no missing files - /vm_disks/virt-template. proceeding to metadata check
Everything looks good: the sync from gfs-01-01 to gfs-01-02 starts, but
the whole image is transferred. If we run an ls -la in the domU during
the transfer, the domU's prompt hangs; once the sync has finished, the
prompt comes back and we can continue working.
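One workaround we are considering is triggering the self-heal ourselves right after gfs-01-02 is back, so the first access from inside the domU does not have to wait. This is only a sketch: the mount point /mnt/domU-images is an assumption (adjust it to the actual client mount), and it relies on the pre-3.x behaviour that a lookup/stat from a client triggers AFR self-heal for that file:

```shell
#!/bin/sh
# Stat every file on the GlusterFS client mount. A lookup/stat from a
# client triggers AFR self-heal for that file, so this rebuilds the
# replicas proactively instead of blocking the first ls in the domU.
MOUNT="${MOUNT:-/mnt/domU-images}"   # assumption: client-side mount point
find "$MOUNT" -noleaf -print0 | xargs -0 stat >/dev/null
```

Running this from the dom0 right after the failed server reconnects should start the heal of all images in one go.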
My question: is this normal behaviour? In
http://ftp.gluster.com/pub/gluster/glusterfs/3.0/LATEST/GlusterFS-3.0.0-Release-Notes.pdf
we read:
2.1) Choice of self-heal algorithms
During self-heal of file contents, GlusterFS will now dynamically choose
between two algorithms based on file size:
a) "Full" algorithm – this algorithm copies the entire file data in
order to heal the out-of-sync copy. This algorithm is used when a file
has to be created from scratch on a server.
b) "Diff" algorithm – this algorithm compares blocks present on both
servers and copies only those blocks that are different from the correct
copy to the out-of-sync copy. This algorithm is used when files have to
be re-built partially.
The "Diff" algorithm is especially beneficial for situations such as
running VM images, where self-heal of a recovering replicated copy of
the image will occur much faster because only the changed blocks need to
be synchronized.
Can we change the self-heal algorithm in the config file?
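If I understand the 3.0 replicate (AFR) translator correctly, there is a volume option for this on the client side. A sketch for our client volfile; the option name data-self-heal-algorithm and its values "full"/"diff" are my assumption from the 3.0 AFR translator, so please correct me if it is named differently:

```
volume gfs-replicate
  type cluster/replicate
  # assumption: force the block-wise heal instead of the size-based choice;
  # accepted values should be "full" and "diff"
  option data-self-heal-algorithm diff
  subvolumes gfs-01-01 gfs-01-02
end-volume
```

With "diff" forced, a recovering VM image should only transfer the changed blocks rather than the whole file.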
Thank you very much
Roland Fischer