> Hi,
>
> First, thanks to the Gluster development team for providing such a
> tool! Your goals are exactly what we are looking for!
>
> Who are "we", you might ask? First, I am Sebastien, a French student
> currently on a training period with TBS-Internet (that's the "we" ;)).
> TBS-Internet is a certification authority.
>
> I found this project because my goal is to redesign the TBS datacenter
> architecture and give it fault tolerance and disaster recovery.
>
> One of my goals is therefore to find an easy way to replicate data
> across the servers so that we can fail services over within a second.
>
> We were looking for an n-redundant, decentralized file system, and we
> have found you!
>
> That was my introduction to the list; now here is my feedback on my
> first tests with glusterfs.
>
> Here is the test platform:
> - Storage Server 1, 192.168.121.5
> - Storage Server 2, 192.168.121.6
> - Client, 192.168.121.7
>
> All systems run Ubuntu Server 6.10 (edgy).
> All machines are P4 HT 3 GHz with 1 GB of RAM each, connected by
> 100 Mbit Ethernet (switched).
>
> At the very beginning, all machines had the stable version 1.2.3 of
> glusterfs.
>
> Something about the client compilation: I spent some time trying to
> understand why it refused to configure, saying that fuse 2.6.X was not
> present.
>
> The Ubuntu fuse module package was installed (only on the client); its
> version was 2.5.something. I downloaded the latest stable fuse sources
> from SourceForge, configured them (forcing the kernel module build) and
> installed them (in /usr/local/lib/). I did not think to remove the
> package at that time because I did not want to touch the system's
> integrity (I like to know that I still keep the "stable" package - plus
> the dev packages - of anything I compile myself).
>
> During configuration, glusterfs kept saying that I did not have the
> 2.6.X version of fuse, even though the loaded module and libraries
> were 2.6.X.
>
> I did not find any way to tell ./configure where "my" fuse was. I even
> tried "--with-lib=/usr/local/lib/", but it did not work any better.
>
> I solved the problem by removing the distribution fuse packages, but I
> was wondering whether there is any way to keep several fuse versions
> installed and tell the glusterfs build which one to use (that would be
> my first question).

the problem you faced is not specific to glusterfs+fuse; it comes up
with any app that depends on a specific version of another library. (a
generic sketch of pointing ./configure at a locally built library is
further down in this mail.)

> So, I managed to install the whole thing and I started some tests.
> I began with a typical clustered file system in "unify" mode. The
> client mounted the share instantaneously, so I created a file to check
> that it was working. Great! It works! I created a file with a little
> "touch toto.tes" and the file appeared in server2's shared directory.
>
> I then wanted to modify that server's configuration, and when I wanted
> to stop the server... I had to kill it :( Isn't there any other way? I
> did not find anything about it in the User Documentation. Is
> "kill [PID]" currently the only way to stop a server? (second question)

glusterfsd recently got support for a pidfile. soon we should have an
init.d-style script to start and stop glusterfsd. (a rough sketch of a
clean stop using the pidfile is further down.)

> Anyway, I wanted to test the afr translator, and that configuration
> has to be made on the client. I unmounted the share, changed the
> configuration and remounted it... Errr... afr needs at least 1.3.0!
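right - afr needs 1.3.0 or later on the client. for reference, a client
spec for afr over your two servers could look roughly like the sketch
below. treat it as a sketch rather than a recipe: it is from memory of
the 1.3-era docs, the 'option replicate' syntax and the exported volume
name 'brick' are assumptions to check against your server spec and the
wiki, and the file paths are just examples.

#!/bin/sh
# sketch: a client-side spec that mirrors the share across both servers
# with cluster/afr, then mounts it. untested; option names are from
# memory of the 1.3-era docs, 'brick' stands for whatever volume name
# your server spec actually exports, and the paths are examples.
cat > /etc/glusterfs/client-afr.vol <<'EOF'
volume server1
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.121.5      # Storage Server 1
  option remote-subvolume brick         # volume exported by the server spec
end-volume

volume server2
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.121.6      # Storage Server 2
  option remote-subvolume brick
end-volume

volume afr
  type cluster/afr
  subvolumes server1 server2
  option replicate *:2                  # keep two copies of every file
end-volume
EOF

# mount it; -f points the client at the spec file
glusterfs -f /etc/glusterfs/client-afr.vol /mnt/gluster

and, as discussed further down, start afr on empty export directories on
both servers to avoid the inconsistencies you ran into.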
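about stopping a server more gracefully than a bare kill on a
hand-found pid: with the pidfile in place, a clean stop is just a TERM
signal to the recorded pid. a minimal sketch, assuming the pidfile ends
up in /var/run (the actual location depends on how you start
glusterfsd):

#!/bin/sh
# sketch: stop glusterfsd cleanly via its pidfile.
# /var/run/glusterfsd.pid is an assumed location - adjust it to wherever
# your glusterfsd writes (or is told to write) its pidfile.
PIDFILE=/var/run/glusterfsd.pid

if [ -f "$PIDFILE" ]; then
    kill -TERM "$(cat "$PIDFILE")" && rm -f "$PIDFILE"
else
    # fall back to matching the process by name
    kill -TERM "$(pidof glusterfsd)"
fi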
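and coming back to your fuse question: the usual way to point a
configure script at a library you built yourself is through the
standard autoconf and pkg-config environment variables, without
touching the distribution packages. a generic sketch, assuming your
fuse landed under /usr/local and that glusterfs' configure honours
these standard knobs (that part is an assumption, not a documented
glusterfs option):

#!/bin/sh
# generic sketch: make a locally built fuse (installed under /usr/local)
# visible to ./configure while the distro's older fuse packages stay
# installed. these are standard autoconf/pkg-config variables, not
# glusterfs-specific options.
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:$PKG_CONFIG_PATH
export CPPFLAGS="-I/usr/local/include $CPPFLAGS"
export LDFLAGS="-L/usr/local/lib -Wl,-rpath,/usr/local/lib $LDFLAGS"
./configure
make && make install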
>
> I want to thank you here, because only my client is now on 1.3.0; the
> servers are still on 1.2.3 and the system works. That is a really good
> point!
>
> So, I mounted my share again to see if it worked. It worked! The
> available size was halved (I only asked for one duplicate). However,
> my little toto.tes had disappeared. I looked for it on server2: it was
> still there. Quite funny.

you are expected to 'begin' a unify or afr from empty volumes. if you
use an existing share to initiate a unify or afr, they start from an
inconsistent state (equivalent to a corrupted volume). for now you have
to fix them manually - by copying over the missing copy for afr, or
deleting the extra copies for unify. (a rough sketch of doing this by
hand for your toto.tes is further down.) the 1.4 release will have
self-heal/fsck support, which will 'fix' such inconsistencies on the
fly and bring the system back to normal.

> I tried "touch toto2.tes" - wonderful, toto2.tes appeared on both
> servers! I tried to write blahblahblah into it from the client, and
> both copies were modified. That is great.
>
> And then I tried something a little more sadistic. Remember that
> toto.tes was invisible in this new share on the client. From the
> client, I did a "touch toto.tes"; it created a toto.tes on server1
> (which had no instance of it until now) while server2 still had its
> previous copy. OK, I then tried to write into toto.tes from the
> client, let's say "bliblibli". "bliblibli" appeared in the file on
> server1 but NOT in the file on server2. Replication failed here! Do
> you know why? (apart from me being so dumb with the system, of
> course ;)) That will be the third question.

expected behaviour for now. self-heal will fix this. you were expected
to begin with empty volumes.

> After all this, I tried to re-mount the share in "unify" mode (so dumb
> I am, but hey... they are tests after all!)
>
> The fact is, I found duplicate entries on my client, with the same
> inode, same filename and same size:
>
> # ls /mnt/gluster/ -li
> total 16
> 3 -rw-r--r-- 1 root root 14 2007-04-23 17:50 toto2.tes
> 3 -rw-r--r-- 1 root root 14 2007-04-23 17:50 toto2.tes
> 2 -rw-r--r-- 1 root root 12 2007-04-23 17:51 toto.tes
> 2 -rw-r--r-- 1 root root 12 2007-04-23 17:51 toto.tes
>
> Funny, is it not? If you have followed from the beginning, server2's
> toto.tes is a blank file, but "cat toto.tes" on the client gives us
> bliblibli. I presume that the system uses the most recent file.

same explanation :)

> And here I am, writing you all these adventures!
>
> I think you will agree: this is not a really serious way to run tests
> (and report them). Still, it makes me wonder:
>
> - if there is any way to correctly shut down a brick;
> - if there is any way to "supervise" the replication;
> - what happens if two clients with two different configurations (say,
>   one "unify", one "afr"/replicate) access the same shared system.
>
> I saw that fault tolerance is your priority for 1.4.0. For now, I will
> continue my silly little tests. This morning, I discovered that
> killing the "1st" brick to troubleshoot it makes the whole thing
> inaccessible to the client (until the 1st brick comes back up).
> However, someone on the IRC channel told me you were aware of this
> SPOF.

yes, the fix is ready and is about to be committed.

> I am going to test the other single- and multiple-node failure
> scenarios right now, and see how the client reacts.
>
> Your glusterfs is great, and we WILL use it in our upcoming cluster
> architecture.
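coming back to your toto.tes case: until self-heal lands, the manual
fix is simply to make the two backend copies identical again before any
client touches the file. a rough sketch, assuming the servers export
something like /export/brick (substitute your real export directories)
and that root ssh/scp between the machines works:

#!/bin/sh
# sketch: copy the good toto.tes (the one containing "bliblibli") from
# server1's backend directory over the stale, empty copy on server2.
# /export/brick is an assumed export path - use whatever your server
# spec's storage/posix volume points at, and do this while no client
# is writing to the file.
GOOD=192.168.121.5       # server1, holds the up-to-date copy
STALE=192.168.121.6      # server2, holds the empty copy
EXPORT=/export/brick

scp "root@$GOOD:$EXPORT/toto.tes" /tmp/toto.tes
scp /tmp/toto.tes "root@$STALE:$EXPORT/toto.tes"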
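as for "supervising" the replication: there is nothing built in for
that yet. until self-heal/fsck arrives in 1.4, a crude workaround is to
checksum the backend copies on both servers and diff the results - a
sketch under the same assumed export path, and only meaningful while
the volume is quiet:

#!/bin/sh
# crude replication check: compare md5sums of every file in the backend
# export directories of the two servers. /export/brick is an assumed
# path; run it while no client is writing.
EXPORT=/export/brick

ssh root@192.168.121.5 "cd $EXPORT && find . -type f -exec md5sum {} + | sort -k 2" > /tmp/server1.md5
ssh root@192.168.121.6 "cd $EXPORT && find . -type f -exec md5sum {} + | sort -k 2" > /tmp/server2.md5

diff -u /tmp/server1.md5 /tmp/server2.md5 && echo "backends are in sync"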
ask us for any support :)

avati

> Cheers, and see you on IRC
>
> Seb / Enkahel
>
> Sebastien LELIEVRE
> slelievre@xxxxxxxxxxxxxxxx          Services to ISP
> TBS-internet                        http://www.TBS-internet.com/
>
> This email is secured by a certificate. Learn more:
> http://www.tbs-certificats.com/email-securise.html
>
> ----- End forwarded message -----

--
ultimate_answer_t
deep_thought (void)
{
  sleep (years2secs (7500000));
  return 42;
}