> Hi,
>
> First, thanks to the Gluster development team for providing such a
> tool! Your goals are exactly what we are looking for!
>
> Who are "we", you might ask? First, I am Sebastien, a French student
> currently on a training period with TBS-Internet (that's the "we" ;)).
> TBS-Internet is a certification authority.
>
> I found this project because my goal is to redesign the TBS datacenter
> architecture and give it fault tolerance and disaster recovery.
>
> One of my goals is therefore to find an easy way to replicate data
> across the servers so that we can fail services over within a second.
>
> We were looking for an n-redundant, decentralized file system, and we
> have found you!
>
> That was my introduction to the list; now here is my feedback on my
> first tests with glusterfs.
>
> Here is the test platform:
> - Storage Server 1, 192.168.121.5
> - Storage Server 2, 192.168.121.6
> - Client, 192.168.121.7
>
> All systems run Ubuntu Server 6.10 (edgy).
> All machines are P4 HT 3 GHz with 1 GB of RAM each, connected by
> 100 Mbit Ethernet (switched).
>
> At the very beginning, all machines had the stable version 1.2.3 of
> glusterfs.
>
> Something about the client compilation: I spent some time trying to
> understand why it refused to configure, saying that fuse 2.6.X was not
> present.
>
> The Ubuntu fuse module package was installed (only on the client); its
> version was 2.5.something. I downloaded the latest stable fuse sources
> from SourceForge, configured them (forcing the kernel module build) and
> installed them (in /usr/local/lib/). I did not think to remove the
> package at that time because I did not want to touch the system's
> integrity (I like to know that I still keep the "stable" package - plus
> the dev packages - of anything I compile myself).
>
> During configuration, glusterfs kept saying that I did not have the
> 2.6.X version of fuse, even though the loaded module and libraries
> were 2.6.X.
>
> I did not find any way to tell ./configure where "my" fuse was. I even
> tried "--with-lib=/usr/local/lib/", but it did not work any better.
>
> I solved the problem by removing the distribution fuse packages, but I
> was wondering whether there is any way to keep several fuse versions
> installed and tell the glusterfs build which one to use (that would be
> my first question).

the problem you faced is not specific to glusterfs+fuse; it comes up
with any app that depends on a specific version of another library. (a
generic sketch of pointing ./configure at a locally built library is
further down in this mail.)

> So, I managed to install the whole thing and I started some tests.
> I began with a typical clustered file system in "unify" mode. The
> client mounted the share instantaneously, so I created a file to check
> that it was working. Great! It works! I created a file with a little
> "touch toto.tes" and the file appeared in server2's shared directory.
>
> I then wanted to modify that server's configuration, and when I wanted
> to stop the server... I had to kill it :( Isn't there any other way? I
> did not find anything about it in the User Documentation. Is
> "kill [PID]" currently the only way to stop a server? (second question)

glusterfsd recently got support for a pidfile. soon we should have an
init.d-style script to start and stop glusterfsd. (a rough sketch of a
clean stop using the pidfile is further down.)

> Anyway, I wanted to test the afr translator, and that configuration
> has to be made on the client. I unmounted the share, changed the
> configuration and remounted it... Errr... afr needs at least 1.3.0!
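right - afr needs 1.3.0 or later on the client. for reference, a client
spec for afr over your two servers could look roughly like the sketch
below. treat it as a sketch rather than a recipe: it is from memory of
the 1.3-era docs, the 'option replicate' syntax and the exported volume
name 'brick' are assumptions to check against your server spec and the
wiki, and the file paths are just examples.

#!/bin/sh
# sketch: a client-side spec that mirrors the share across both servers
# with cluster/afr, then mounts it. untested; option names are from
# memory of the 1.3-era docs, 'brick' stands for whatever volume name
# your server spec actually exports, and the paths are examples.
cat > /etc/glusterfs/client-afr.vol <<'EOF'
volume server1
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.121.5      # Storage Server 1
  option remote-subvolume brick         # volume exported by the server spec
end-volume

volume server2
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.121.6      # Storage Server 2
  option remote-subvolume brick
end-volume

volume afr
  type cluster/afr
  subvolumes server1 server2
  option replicate *:2                  # keep two copies of every file
end-volume
EOF

# mount it; -f points the client at the spec file
glusterfs -f /etc/glusterfs/client-afr.vol /mnt/gluster

and, as discussed further down, start afr on empty export directories on
both servers to avoid the inconsistencies you ran into.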
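about stopping a server more gracefully than a bare kill on a
hand-found pid: with the pidfile in place, a clean stop is just a TERM
signal to the recorded pid. a minimal sketch, assuming the pidfile ends
up in /var/run (the actual location depends on how you start
glusterfsd):

#!/bin/sh
# sketch: stop glusterfsd cleanly via its pidfile.
# /var/run/glusterfsd.pid is an assumed location - adjust it to wherever
# your glusterfsd writes (or is told to write) its pidfile.
PIDFILE=/var/run/glusterfsd.pid

if [ -f "$PIDFILE" ]; then
    kill -TERM "$(cat "$PIDFILE")" && rm -f "$PIDFILE"
else
    # fall back to matching the process by name
    kill -TERM "$(pidof glusterfsd)"
fi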
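and coming back to your fuse question: the usual way to point a
configure script at a library you built yourself is through the
standard autoconf and pkg-config environment variables, without
touching the distribution packages. a generic sketch, assuming your
fuse landed under /usr/local and that glusterfs' configure honours
these standard knobs (that part is an assumption, not a documented
glusterfs option):

#!/bin/sh
# generic sketch: make a locally built fuse (installed under /usr/local)
# visible to ./configure while the distro's older fuse packages stay
# installed. these are standard autoconf/pkg-config variables, not
# glusterfs-specific options.
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:$PKG_CONFIG_PATH
export CPPFLAGS="-I/usr/local/include $CPPFLAGS"
export LDFLAGS="-L/usr/local/lib -Wl,-rpath,/usr/local/lib $LDFLAGS"
./configure
make && make install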
>
> I want to thank you here, because only my client is now on 1.3.0; the
> servers are still on 1.2.3 and the system works. That is a really good
> point!
>
> So, I mounted my share again to see if it worked. It worked! The
> available size was halved (I only asked for one duplicate). However,
> my little toto.tes had disappeared. I looked for it on server2: it was
> still there. Quite funny.

you are expected to 'begin' a unify or afr from empty volumes. if you
use an existing share to initiate a unify or afr, they start from an
inconsistent state (equivalent to a corrupted volume). for now you have
to fix them manually - by copying over the missing copy for afr, or
deleting the extra copies for unify. (a rough sketch of doing this by
hand for your toto.tes is further down.) the 1.4 release will have
self-heal/fsck support, which will 'fix' such inconsistencies on the
fly and bring the system back to normal.

> I tried "touch toto2.tes" - wonderful, toto2.tes appeared on both
> servers! I tried to write blahblahblah into it from the client, and
> both copies were modified. That is great.
>
> And then I tried something a little more sadistic. Remember that
> toto.tes was invisible in this new share on the client. From the
> client, I did a "touch toto.tes"; it created a toto.tes on server1
> (which had no instance of it until now) while server2 still had its
> previous copy. OK, I then tried to write into toto.tes from the
> client, let's say "bliblibli". "bliblibli" appeared in the file on
> server1 but NOT in the file on server2. Replication failed here! Do
> you know why? (apart from me being so dumb with the system, of
> course ;)) That will be the third question.

expected behaviour for now. self-heal will fix this. you were expected
to begin with empty volumes.

> After all this, I tried to re-mount the share in "unify" mode (so dumb
> I am, but hey... they are tests after all!)
>
> The fact is, I found duplicate entries on my client, with the same
> inode, same filename and same size:
>
> # ls /mnt/gluster/ -li
> total 16
> 3 -rw-r--r-- 1 root root 14 2007-04-23 17:50 toto2.tes
> 3 -rw-r--r-- 1 root root 14 2007-04-23 17:50 toto2.tes
> 2 -rw-r--r-- 1 root root 12 2007-04-23 17:51 toto.tes
> 2 -rw-r--r-- 1 root root 12 2007-04-23 17:51 toto.tes
>
> Funny, is it not? If you have followed from the beginning, server2's
> toto.tes is a blank file, but "cat toto.tes" on the client gives us
> bliblibli. I presume that the system uses the most recent file.

same explanation :)

> And here I am, writing you all these adventures!
>
> I think you will agree: this is not a really serious way to run tests
> (and report them). Still, it makes me wonder:
>
> - if there is any way to correctly shut down a brick;
> - if there is any way to "supervise" the replication;
> - what happens if two clients with two different configurations (say,
>   one "unify", one "afr"/replicate) access the same shared system.
>
> I saw that fault tolerance is your priority for 1.4.0. For now, I will
> continue my silly little tests. This morning, I discovered that
> killing the "1st" brick to troubleshoot it makes the whole thing
> inaccessible to the client (until the 1st brick comes back up).
> However, someone on the IRC channel told me you were aware of this
> SPOF.

yes, the fix is ready and is about to be committed.

> I am going to test the other single- and multiple-node failure
> scenarios right now, and see how the client reacts.
>
> Your glusterfs is great, and we WILL use it in our upcoming cluster
> architecture.
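coming back to your toto.tes case: until self-heal lands, the manual
fix is simply to make the two backend copies identical again before any
client touches the file. a rough sketch, assuming the servers export
something like /export/brick (substitute your real export directories)
and that root ssh/scp between the machines works:

#!/bin/sh
# sketch: copy the good toto.tes (the one containing "bliblibli") from
# server1's backend directory over the stale, empty copy on server2.
# /export/brick is an assumed export path - use whatever your server
# spec's storage/posix volume points at, and do this while no client
# is writing to the file.
GOOD=192.168.121.5       # server1, holds the up-to-date copy
STALE=192.168.121.6      # server2, holds the empty copy
EXPORT=/export/brick

scp "root@$GOOD:$EXPORT/toto.tes" /tmp/toto.tes
scp /tmp/toto.tes "root@$STALE:$EXPORT/toto.tes"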
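as for "supervising" the replication: there is nothing built in for
that yet. until self-heal/fsck arrives in 1.4, a crude workaround is to
checksum the backend copies on both servers and diff the results - a
sketch under the same assumed export path, and only meaningful while
the volume is quiet:

#!/bin/sh
# crude replication check: compare md5sums of every file in the backend
# export directories of the two servers. /export/brick is an assumed
# path; run it while no client is writing.
EXPORT=/export/brick

ssh root@192.168.121.5 "cd $EXPORT && find . -type f -exec md5sum {} + | sort -k 2" > /tmp/server1.md5
ssh root@192.168.121.6 "cd $EXPORT && find . -type f -exec md5sum {} + | sort -k 2" > /tmp/server2.md5

diff -u /tmp/server1.md5 /tmp/server2.md5 && echo "backends are in sync"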
ask us for any support :)

avati

> Cheers, and see you on IRC
>
> Seb / Enkahel
>
> Sebastien LELIEVRE
> slelievre@xxxxxxxxxxxxxxxx          Services to ISP
> TBS-internet                        http://www.TBS-internet.com/
>
> This email is secured by a certificate. Learn more:
> http://www.tbs-certificats.com/email-securise.html
>
> ----- End forwarded message -----

--
ultimate_answer_t
deep_thought (void)
{
  sleep (years2secs (7500000));
  return 42;
}