Hi Marco,

What is the backend filesystem you are using? Does it support extended
attributes? You can check for extended attribute support using the
setfattr/getfattr commands.

Also, the transport-timeout value is too low. Can you test again with
transport-timeout set to the default value (42s)?

regards,

On Thu, Aug 7, 2008 at 9:19 PM, Marco Trevisan <marco.trevisan at cardinis.com> wrote:

> Hi all,
>
> I'm in the process of evaluating GlusterFS as a clustered file system, I
> like it very much because -among the other cool features- it's very easy
> to configure and it allows me to reuse the filesystems I already know as
> storage backends.
>
> Before trying it on expensive hardware, I decided to try it on a very
> low HW configuration:
>
> - 2 old PCs (one P4 class CPU, IDE drives, one 100 Mbps ethernet card)
> and a 100 Mbps switch.
>
> The OS is Debian 'lenny' in both nodes. 'Lenny' comes with FUSE v2.7.2.
>
> I then compiled glusterfs 1.3.10 on both nodes and setup a server-side,
> single-process AFR configuration (the file content is reported below).
> I did NOT use the glusterfs-patched FUSE library.
>
> On top of that I've put some VMWare Server virtual machines. Each
> virtual machine image is split into a few 2 GB "vmdk" files (not
> preallocated).
>
> I was successful in starting up and running my virtual machines (with
> only an additional line in their configuration files), so I was very
> happy with it.
>
> The problem now is, after putting a virtual machine under "intense"
> I/O, when I rebooted it today I found its root filesystem (=the vmdk)
> was corrupted. It lost some important directories (e.g. kernel modules
> directory under /lib/modules).
>
> Just to give you a little more detail of the behaviour under I/O, when
> the virtual machine is doing I/O to the VMDK file, iptraf shows the
> corresponding traffic on the ethernet link at about 15-50 Mbps, so it
> looks like only the modified portions of the file are being sent to the
> other AFR node. In fact, if I simulate a failure by powering off the other
> AFR node, at reboot I see 90 Mbps (link saturation) traffic as I try to
> open the VMDK file, and that operation blocks until full synchronization
> has finished.
>
> The glusterfs.log content is as follows:
> [...]
> 2008-08-06 12:20:24 E [posix-locks.c:1148:pl_lk] gfs-ds-locks: returning EAGAIN
> 2008-08-06 12:20:24 E [afr.c:3190:afr_lk_cbk] gfs-ds-afr: (path=/logs-127.0.1.1/.terracotta-logging.lock child=gfs-ds-locks) op_ret=-1 op_errno=11
> 2008-08-06 12:42:25 E [posix-locks.c:1148:pl_lk] gfs-ds-locks: returning EAGAIN
> 2008-08-06 12:42:25 E [afr.c:3190:afr_lk_cbk] gfs-ds-afr: (path=/logs-127.0.1.1/.terracotta-logging.lock child=gfs-ds-locks) op_ret=-1 op_errno=11
> 2008-08-07 11:42:17 E [posix-locks.c:1180:pl_forget] gfs-ds-locks: Active locks found!
>
> The above log does not seem to justify such file corruption... there is
> nothing related to "vmdk" files.
>
> Any suggestions?
> Is the HW configuration way too slow for afr to work reliably?
> Are there mistakes in the configuration file?
>
> Any help is really appreciated.
>
> Kind regards,
> Marco
>
>
> ----------GlusterFS config file ----------------
>
> # dataspace on storage1
> volume gfs-ds
>   type storage/posix
>   option directory /mnt/hda7/gfs-ds
> end-volume
>
> # posix locks
> volume gfs-ds-locks
>   type features/posix-locks
>   subvolumes gfs-ds
> end-volume
>
> volume gfs-ds-threads
>   type performance/io-threads
>   option thread-count 1
>   option cache-size 32MB
>   subvolumes gfs-ds-locks
> end-volume
>
> volume server
>   type protocol/server
>   option transport-type tcp/server
>   subvolumes gfs-ds-threads
>   # storage network access only
>   option auth.ip.gfs-ds-threads.allow *
>   option auth.ip.gfs-ds-afr.allow *
> end-volume
>
>
> # dataspace on storage2
> volume gfs-storage2-ds
>   type protocol/client
>   option transport-type tcp/client
>   option remote-host <the other node's IP> # storage network
>   option remote-subvolume gfs-ds-threads
>   option transport-timeout 10 # value in seconds; it should be set relatively low
> end-volume
>
> # automatic file replication translator for dataspace
> volume gfs-ds-afr
>   type cluster/afr
>   subvolumes gfs-ds-locks gfs-storage2-ds # local and remote dataspaces
> end-volume
>
> volume writebehind
>   type performance/write-behind
>   option aggregate-size 128kB
>   subvolumes gfs-ds-afr
> end-volume
>
> volume readahead
>   type performance/read-ahead
>   option page-size 64kB
>   option page-count 16
>   subvolumes writebehind
> end-volume
> -------------

--
Raghavendra G

A centipede was happy quite, until a toad in fun,
Said, "Pray, which leg comes after which?",
This raised his doubts to such a pitch,
He fell flat into the ditch,
Not knowing how to run.
-Anonymous
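
As a rough sketch of the extended-attribute check suggested at the top of this reply, the same test that setfattr/getfattr perform by hand can be scripted. This assumes Python 3 on Linux (os.setxattr and os.getxattr exist only there). The function name supports_xattrs and the attribute name gfs_probe are hypothetical, chosen for illustration only. The probe defaults to the user.* namespace; as far as I know, AFR keeps its bookkeeping in trusted.* attributes, which only root can write, so a run as root with "trusted.gfs_probe" is likely the more relevant test.

```python
import os
import tempfile


def supports_xattrs(directory, attr_name="user.gfs_probe"):
    """Return True if a file created in `directory` accepts and returns `attr_name`.

    `attr_name` is a hypothetical probe attribute, not one GlusterFS itself uses.
    Any OSError (unsupported filesystem, missing mount option, insufficient
    privileges for the namespace) is reported as False.
    """
    if not hasattr(os, "setxattr"):
        # The xattr functions are not available on this platform.
        return False
    # The probe file is deleted automatically when the block exits.
    with tempfile.NamedTemporaryFile(dir=directory) as probe:
        try:
            os.setxattr(probe.name, attr_name, b"1")
            return os.getxattr(probe.name, attr_name) == b"1"
        except OSError:
            return False
```

For the configuration quoted above, the call would be supports_xattrs("/mnt/hda7/gfs-ds") on each node. A False result for user.* on an ext3 backend may only mean the user_xattr mount option is missing, so it is worth repeating the check as root with the trusted.* name before drawing conclusions.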