----- Forwarded message from Sebastien LELIEVRE <slelievre@xxxxxxxxxxxxxxxx> -----

Delivery-date: Wed, 25 Apr 2007 02:18:38 -0700
From: Sebastien LELIEVRE <slelievre@xxxxxxxxxxxxxxxx>
To: avati@xxxxxxxxxxxxx
Subject: Presentation + [GlusterFS] My first feedback with questions

Hi,

First of all, thanks to the Gluster development team for providing such a tool! Your goals are exactly what we are looking for!

"Who are 'we'?", you might ask. I am Sebastien, a French student currently on a training period with TBS-Internet (that's the "we" ;)). TBS-Internet is a certification authority. I found this project because my task is to redesign the TBS datacenter architecture and give it fault tolerance and disaster recovery. One of my goals is therefore to find an easy way to replicate data across the servers, so that we can fail services over within a second. We were looking for an n-redundant, decentralized file system, and we have found you!

So much for my presentation to the list; now for my feedback and questions from my first tests with glusterfs. Here is the test platform:

- Storage Server 1, 192.168.121.5
- Storage Server 2, 192.168.121.6
- Client, 192.168.121.7

All systems run Ubuntu Server 6.10 (edgy). All machines are Pentium 4 HT 3 GHz boxes with 1 GB of RAM each, linked by 100 Mbit/s Ethernet (through a switch). At the very beginning, all machines had the stable version 1.2.3 of glusterfs.

A word about the client compilation: I spent some time understanding why it refused to configure, claiming that fuse 2.6.X was not present. The Ubuntu fuse package was installed (only on the client), but its version was 2.5.something. I downloaded the latest stable fuse sources from SourceForge, configured them (forcing the kernel module build) and installed them (into /usr/local/lib/). I deliberately did not remove the distribution package at that point, because I did not want to touch the system's integrity (I like to know that I still have the "stable" package - and its dev packages - of anything I compile myself). During configuration, glusterfs kept insisting that I did not have fuse 2.6.X, even though the loaded module and the libraries were that version. I found no way to point ./configure at "my" fuse; I even tried "--with-lib=/usr/local/lib/", but that did not work any better. I solved the problem by removing the distribution's fuse packages. Still, I am wondering whether there is a way to keep multiple fuse versions installed and tell the glusterfs build which one to use (that would be my first question).

So, I managed to install the whole thing, and I started some tests. I began with a typical clustered file system in "unify" mode. The client mounted it instantly, so I created a file to confirm it was working. Great, it works! A little "touch toto.tes" later, the file appeared in server2's shared directory.

I then wanted to modify that server's configuration - and when I tried to stop the server... I had to kill it :( Isn't there any other way? I found nothing about this in the User Documentation. Is kill [PID] really the only way to stop a server? (second question)
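For now, this is how I stop a brick - a blunt sketch, and it assumes glusterfsd is the only glusterfs process running on that machine:

# on the storage server: ask the brick process to terminate
kill $(pidof glusterfsd)          # sends SIGTERM
# kill -9 $(pidof glusterfsd)     # last resort, if it refuses to die

If there is a cleaner, documented shutdown procedure, that is exactly what my second question is after.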
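While I am at it, here is roughly the shape of the spec files behind these tests. I am rewriting them from memory, so take the volume names, the export path, the scheduler and the option spellings as illustrative rather than exact:

# server.vol, on each storage server (started with: glusterfsd -f server.vol)
volume brick
  type storage/posix
  option directory /home/export          # illustrative path
end-volume

volume server
  type protocol/server
  option transport-type tcp/server
  option auth.ip.brick.allow 192.168.121.*
  subvolumes brick
end-volume

# client.vol, "unify" flavour (mounted with: glusterfs -f client.vol /mnt/gluster)
volume remote1
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.121.5
  option remote-subvolume brick
end-volume

volume remote2
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.121.6
  option remote-subvolume brick
end-volume

volume unify0
  type cluster/unify
  option scheduler rr                    # simple round-robin
  subvolumes remote1 remote2
end-volume

# For the afr test below, only the last volume changes, roughly:
volume afr0
  type cluster/afr
  option replicate *:2                   # keep two copies of each file
  subvolumes remote1 remote2
end-volume

(The afr option syntax is the part I am least sure of - corrections welcome.)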
Anyway, I next wanted to test the afr translator; this time the configuration change is made on the client. I unmounted the share, changed the configuration, and remounted it... Errr... afr needs at least 1.3.0! I want to thank you here, because only my client is now on 1.3.0; the servers are still on 1.2.3, and the system works. That is a really good point!

So, I mounted my share again to see if it worked. It worked! The reported size was divided by two (I had only asked for one duplicate). However, my little toto.tes had disappeared. I looked for it on server2: it was still there. Quite funny. I tried "touch toto2.tes" - wonderful, toto2.tes appeared on both servers! I wrote "blahblahblah" into it from the client, and both copies were modified. That is great.

And then I tried something a little more sadistic. Remember that toto.tes was invisible in this new share on the client. From the client, I did "touch toto.tes"; that created a toto.tes on server1 (which until then had no instance of it), while server2 still kept its previous copy. OK. I then tried to write into toto.tes from the client, let's say "bliblibli". "bliblibli" appeared in the file on server1 but NOT in the file on server2. Replication failed here! Do you know why? (other than me being so clumsy with the system, of course ;)) That is the third question.

After all this, I tried to re-mount the share in "unify" mode (how dumb of me, but hey, these are tests after all!). The thing is, I found duplicate entries on my client, with the same i-node, the same filename and the same size:

# ls /mnt/gluster/ -li
total 16
3 -rw-r--r-- 1 root root 14 2007-04-23 17:50 toto2.tes
3 -rw-r--r-- 1 root root 14 2007-04-23 17:50 toto2.tes
2 -rw-r--r-- 1 root root 12 2007-04-23 17:51 toto.tes
2 -rw-r--r-- 1 root root 12 2007-04-23 17:51 toto.tes

Funny, is it not? If you have followed from the beginning, server2's toto.tes is a blank file, yet "cat toto.tes" on the client gives us "bliblibli". I presume the system uses the most recent file.

And here I am, writing you all these adventures! I think you will agree that this is not a very serious way to run tests (or to report them). Still, it leaves me wondering:

- whether there is a way to shut a brick down correctly;
- whether there is a way to "supervise" the replication;
- what happens when two clients with two different configs (say, one "unify" and one "afr"/replicate) access the same shared system.

I saw that fault tolerance is your priority for 1.4.0. For now, I will continue my silly little tests. This morning, I discovered that killing the "first" brick in order to troubleshoot it makes the whole thing inaccessible to the client (until the first brick comes back up). Someone on the IRC channel told me you are already aware of this SPOF, though. I am now going to test the other single- and multiple-node failure scenarios and see how the client reacts.

Your glusterfs is great, and we WILL use it in our upcoming cluster architecture.

Cheers, and see you on IRC

Seb / Enkahel

Sebastien LELIEVRE
slelievre@xxxxxxxxxxxxxxxx          Services to ISP
TBS-internet                        http://www.TBS-internet.com/

This email is secured by a certificate. Learn more:
http://www.tbs-certificats.com/email-securise.html

----- End forwarded message -----

--
ultimate_answer_t
deep_thought (void)
{
  sleep (years2secs (7500000));
  return 42;
}