Thanks, Samuel.

Also, as mentioned earlier, please provide us with details of the Linux kernel version and FUSE kernel module version on both the servers and the clients used, apart from the output of the 'option trace on' in the locks translator.

Regards,
Tejas.

----- Original Message -----
From: "Samuel Hassine" <samuel.hassine@xxxxxxxxx>
To: "Pavan Vilas Sondur" <pavan@xxxxxxxxxxx>
Cc: avati@xxxxxxxxxxx, "Yann Autissier" <yann.autissier@xxxxxxxxxxxxxxxxxx>, "Gluster List" <gluster-devel@xxxxxxxxxx>
Sent: Friday, February 5, 2010 5:43:46 PM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi
Subject: Re: Feedback - Problem with the locks feature

Hi all,

Just before I test this patch, I have another bug to report with/without the locks translator.

As I said in my first email, I have just changed from NFS to GlusterFS for my websites' storage partition (about 15,000 websites). I thought that only PHP sessions didn't "like" the POSIX locks, but that is not the case. The other simple distributed partition for website files is impacted:

With the POSIX locks, I get 30% web server internal errors 500 (premature end of script headers), but without locks (I just changed the configuration), no 500 and no premature end of script headers.

So I think there is a link. (We have huge traffic; maybe that could be another reason.)

I'm applying the patch right now and will give you feedback as soon as possible.

Regards.

On Friday, 5 February 2010 at 15:13 +0530, Pavan Vilas Sondur wrote:
> Hi Samuel,
> Looking at log messages such as these:
> > > [2010-02-04 21:11:22] W [posix.c:246:posix_lstat_with_gen] posix1:
> > > Access to /data//.. (on dev 2049) is crossing device (64768)
> > > [2010-02-04 21:11:24] W [posix.c:246:posix_lstat_with_gen] posix1:
> > > Access to /data//.. (on dev 2049) is crossing device (64768)
>
> It seems you are also running into bug 571 (http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=576).
> Can you apply this patch: http://patches.gluster.com/patch/2716 and let us know how it goes. Also, can you provide
> us with details of the Linux kernel version and FUSE kernel module version on both the servers and the clients used,
> apart from the output of the 'option trace on' in the locks translator.
>
> Pavan
>
> On 04/02/10 21:42 -0600, Anand Avati wrote:
> >
> > ----- "Samuel Hassine" <samuel.hassine@xxxxxxxxx> wrote:
> >
> > > Hi all,
> > >
> > > For the PHP script with little write/read accesses, I will try to find
> > > it (I don't remember the exact syntax), but for PHP sessions, the bug
> > > can be easily reproduced.
> > >
> > > I just tested it on a new, very simple GlusterFS partition with no traffic
> > > (just me), and I reproduced it immediately.
> > >
> > > Explanation:
> > > - 2 servers, Debian Lenny stable
> > > - GlusterFS 3.0.0 in distributed mode (one server and multiple
> > >   clients)
> > > - Lighttpd / PHP5 Fast-CGI
> > >
> > > I just mount the GlusterFS partition on the /var/www directory.
> > >
> > > First of all, the PHP script you can execute:
> > >
> > > <?php
> > > session_save_path('.');
> > > //if you want to verify if it worked
> > > //echo session_save_path();
> > > session_start();
> > > ?>
> > >
> > > Secondly, there are 2 configurations of GlusterFS and, of course, one
> > > works and one does not.
> > > The client configuration is the same in both cases:
> > >
> > > glusterfs.vol
> > > volume test-1
> > >   type protocol/client
> > >   option transport-type tcp
> > >   option remote-host test
> > >   option transport.socket.nodelay on
> > >   option transport.remote-port 6996
> > >   option remote-subvolume brick1
> > > end-volume
> > >
> > > volume writebehind
> > >   type performance/write-behind
> > >   option cache-size 4MB
> > >   subvolumes test-1
> > > end-volume
> > >
> > > volume readahead
> > >   type performance/read-ahead
> > >   option page-count 4
> > >   subvolumes writebehind
> > > end-volume
> > >
> > > volume iocache
> > >   type performance/io-cache
> > >   option cache-size 1GB
> > >   option cache-timeout 1
> > >   subvolumes readahead
> > > end-volume
> > >
> > > volume quickread
> > >   type performance/quick-read
> > >   option cache-timeout 1
> > >   option max-file-size 64kB
> > >   subvolumes iocache
> > > end-volume
> > >
> > > volume statprefetch
> > >   type performance/stat-prefetch
> > >   subvolumes quickread
> > > end-volume
> > >
> > > Now the server configuration:
> > >
> > > glusterfsd.vol (this doesn't work)
> > > volume posix1
> > >   type storage/posix
> > >   option directory /data
> > > end-volume
> > >
> > > volume locks1
> > >   type features/locks
> > >   subvolumes posix1
> > > end-volume
> > >
> > > volume brick1
> > >   type performance/io-threads
> > >   option thread-count 8
> > >   subvolumes locks1
> > > end-volume
> > >
> > > volume server-tcp
> > >   type protocol/server
> > >   option transport-type tcp
> > >   option auth.addr.brick1.allow *
> > >   option transport.socket.listen-port 6996
> > >   option transport.socket.nodelay on
> > >   subvolumes brick1
> > > end-volume
> > >
> > > glusterfsd.vol (this works)
> > > volume posix1
> > >   type storage/posix
> > >   option directory /data
> > > end-volume
> > >
> > > #volume locks1
> > > #  type features/locks
> > > #  subvolumes posix1
> > > #end-volume
> > >
> > > volume brick1
> > >   type performance/io-threads
> > >   option thread-count 8
> > >   subvolumes posix1
> > > end-volume
> > >
> > > volume server-tcp
> > >   type protocol/server
> > >   option transport-type tcp
> > >   option auth.addr.brick1.allow *
> > >   option transport.socket.listen-port 6996
> > >   option transport.socket.nodelay on
> > >   subvolumes brick1
> > > end-volume
> > >
> > > So, with the locks translator, you can execute the script one time (it
> > > will be OK), but the second time the session file is on the file system
> > > but locked, and nobody can access it. PHP freezes and the processes
> > > cannot be killed.
> > >
> > > When this happens, I have nothing in the client-side logs, but I have 2
> > > kinds of messages in the server-side logs:
> > > When I execute the script:
> > > [2010-02-04 21:11:22] W [posix.c:246:posix_lstat_with_gen] posix1:
> > > Access to /data//.. (on dev 2049) is crossing device (64768)
> > > [2010-02-04 21:11:24] W [posix.c:246:posix_lstat_with_gen] posix1:
> > > Access to /data//.. (on dev 2049) is crossing device (64768)
> > >
> > > When I try to umount -f (disconnect GlusterFS):
> > > [2010-02-04 21:13:45] E [server-protocol.c:339:protocol_server_reply]
> > > protocol/server: frame 20: failed to submit. op= 26, type= 4
> > >
> > > As I said, I will try to find the other PHP script.
> > >
> > > I hope this will help you.
> >
> > I tried to reproduce the problem with your exact configuration (only changing 'option remote-host') from 1 server and 2 clients. I was not able to hit the problem with the configuration which is breaking for you. I used v3.0.0 as well.
> >
> > Can you please turn 'option trace on' in the locks translator and give us the server log when the PHP session hangs?
> >
> > Thanks,
> > Avati
> >
> >
> > _______________________________________________
> > Gluster-devel mailing list
> > Gluster-devel@xxxxxxxxxx
> > http://lists.nongnu.org/mailman/listinfo/gluster-devel
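
For reference, here is a minimal sketch of the 'option trace on' change requested in this thread. It assumes the failing glusterfsd.vol posted above and that the option is spelled exactly as quoted in the thread. Only the locks1 volume is touched; the other volumes stay as posted.

```
# Assumption: same locks1 volume as in the failing glusterfsd.vol above,
# with the trace option added as quoted in this thread.
volume locks1
  type features/locks
  option trace on
  subvolumes posix1
end-volume
```

The server would presumably need to be restarted to pick up the edited volfile; the server log captured while the PHP session hangs is the output being asked for.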