Re: Re: Feedback - Problem with the locks feature

Hi,

I added the 'option trace on' directive to the locks translator. The
results are in the attached files.

There are two log files. The first one contains the first lines, logged
when the session is created and read. The second one contains the later
accesses when I reload the page (3 reloads, separated by an empty line).
It is the same thing every time: PHP freezes on the posix BLOCKED.

I hope this helps.

I will apply the patch now.

Regards.

On Friday, 5 February 2010 at 06:21 -0600, Tejas N. Bhise wrote:
> Thanks, Samuel.
> 
> Also, as mentioned earlier, please provide details of the Linux kernel
> version / FUSE kernel module versions on both the servers and the clients used,
> in addition to the output of 'option trace on' in the locks translator.
> 
> Regards,
> Tejas.
> 
> ----- Original Message -----
> From: "Samuel Hassine" <samuel.hassine@xxxxxxxxx>
> To: "Pavan Vilas Sondur" <pavan@xxxxxxxxxxx>
> Cc: avati@xxxxxxxxxxx, "Yann Autissier" <yann.autissier@xxxxxxxxxxxxxxxxxx>, "Gluster List" <gluster-devel@xxxxxxxxxx>
> Sent: Friday, February 5, 2010 5:43:46 PM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi
> Subject: Re: Feedback - Problem with the locks feature
> 
> Hi all,
> 
> Just before I test this patch, I have another bug to report, which
> appears with the locks translator and disappears without it. As I said in
> my first email, I just switched from NFS to GlusterFS for my websites'
> storage partition (about 15,000 websites).
> 
> I thought that only PHP sessions didn't "like" the posix locks, but that
> is not the case. The other simple distributed partition, which holds the
> website files, is also impacted:
> 
> With the posix locks, about 30% of requests fail with web server internal
> errors (HTTP 500, "premature end of script headers"), but without locks (I
> only changed the configuration), there are no 500 errors and no "premature
> end of script headers". So I think there is a link. (We have a huge amount
> of traffic, so that could be another reason.)
> 
> I'm applying the patch right now and will give you feedback as soon
> as possible.
> 
> Regards.
> 
> On Friday, 5 February 2010 at 15:13 +0530, Pavan Vilas Sondur wrote:
> > Hi Samuel,
> > Looking at log messages such as these:
> > > > [2010-02-04 21:11:22] W [posix.c:246:posix_lstat_with_gen] posix1:
> > > > Access to /data//.. (on dev 2049) is crossing device (64768)
> > > > [2010-02-04 21:11:24] W [posix.c:246:posix_lstat_with_gen] posix1:
> > > > Access to /data//.. (on dev 2049) is crossing device (64768)
> > 
> > It seems you are also running into bug 571 (http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=576). Can
> > you apply this patch: http://patches.gluster.com/patch/2716 and let us know how it goes? Also, can you provide
> > us details of the Linux kernel version / FUSE kernel module versions on both the servers and the clients used,
> > in addition to the output of 'option trace on' in the locks translator?
> > 
> > Pavan
> > 
> > On 04/02/10 21:42 -0600, Anand Avati wrote:
> > > 
> > > ----- "Samuel Hassine" <samuel.hassine@xxxxxxxxx> wrote:
> > > 
> > > > Hi all,
> > > > 
> > > > As for the PHP script with a few small write/read accesses, I will try
> > > > to find it (I don't remember the exact syntax), but the PHP sessions bug
> > > > can be reproduced easily.
> > > > 
> > > > I just tested it on a new, very simple GlusterFS partition with no traffic
> > > > (just me), and I reproduced it immediately.
> > > > 
> > > > Details of the setup:
> > > > - 2 servers running Debian Lenny (stable)
> > > > - GlusterFS 3.0.0 in distributed mode (one server and multiple
> > > > clients)
> > > > - Lighttpd / PHP5 Fast-CGI
> > > > 
> > > > I just mounted the GlusterFS partition on the /var/www directory.
> > > > 
> > > > First, here is the PHP script to execute:
> > > > 
> > > > <?php
> > > > session_save_path('.');
> > > > //if you want to verify if it worked
> > > > //echo session_save_path();
> > > > session_start();
> > > > ?>
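The hang can also be reproduced without PHP. In the attached trace, every run of the script sends a blocking whole-file write lock request (cmd=SETLKW, type=WRITE, start=0, len=0) for the session file. The Python sketch below issues the same kind of request from two successive worker processes and reports whether each one finishes within a timeout, instead of freezing the shell. It only mimics the lock pattern visible in the trace; it is not how PHP's session handler is implemented, and the script name and session path in the usage comment are hypothetical.

```python
import fcntl
import multiprocessing
import os
import sys


def session_cycle(path: str, payload: bytes) -> None:
    """One simulated request: open the session file, take a blocking
    whole-file write lock (F_SETLKW with F_WRLCK, start=0, len=0, the same
    shape as the requests in the trace), rewrite the file, then close it."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX)  # blocks until the lock is granted
        os.ftruncate(fd, 0)
        os.write(fd, payload)
    finally:
        os.close(fd)  # closing the descriptor drops this process's POSIX locks


def run_cycles(path: str, count: int = 2, timeout: float = 10.0) -> list:
    """Run `count` cycles one after another, each in its own process (like
    separate PHP workers). Each result is "ok", "error" or "hung"."""
    ctx = multiprocessing.get_context("fork")  # Unix only, like fcntl itself
    results = []
    for i in range(count):
        proc = ctx.Process(target=session_cycle, args=(path, b"request %d\n" % i))
        proc.start()
        proc.join(timeout)
        if proc.is_alive():
            results.append("hung")
            # Still waiting for the lock. On the broken mount the worker may
            # not exit even when signalled, so stop instead of adding waiters.
            proc.terminate()
            break
        results.append("ok" if proc.exitcode == 0 else "error")
    return results


if __name__ == "__main__":
    # Usage: python lockcycle.py /var/www/sessions/sess_test  (hypothetical path)
    if len(sys.argv) > 1:
        print(run_cycles(sys.argv[1]))
```

On a healthy mount both cycles should report "ok", because each worker's lock is dropped when it closes the file. A "hung" result on the second cycle with the locks-enabled server configuration would match the behaviour reported in this thread.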
> > > > 
> > > > Second, there are 2 GlusterFS configurations, and of course, one
> > > > works and one does not.
> > > > The client configuration is the same in both cases:
> > > > 
> > > > glusterfs.vol
> > > > volume test-1
> > > > type protocol/client
> > > > option transport-type tcp
> > > > option remote-host test
> > > > option transport.socket.nodelay on
> > > > option transport.remote-port 6996
> > > > option remote-subvolume brick1
> > > > end-volume
> > > > 
> > > > volume writebehind
> > > > type performance/write-behind
> > > > option cache-size 4MB
> > > > subvolumes test-1
> > > > end-volume
> > > > 
> > > > volume readahead
> > > > type performance/read-ahead
> > > > option page-count 4
> > > > subvolumes writebehind
> > > > end-volume
> > > > 
> > > > volume iocache
> > > > type performance/io-cache
> > > > option cache-size 1GB
> > > > option cache-timeout 1
> > > > subvolumes readahead
> > > > end-volume
> > > > 
> > > > volume quickread
> > > > type performance/quick-read
> > > > option cache-timeout 1
> > > > option max-file-size 64kB
> > > > subvolumes iocache
> > > > end-volume
> > > > 
> > > > volume statprefetch
> > > > type performance/stat-prefetch
> > > > subvolumes quickread
> > > > end-volume
> > > > 
> > > > Now the server configuration:
> > > > 
> > > > glusterfsd.vol (this doesn't work)
> > > > volume posix1
> > > > type storage/posix
> > > > option directory /data
> > > > end-volume
> > > > 
> > > > volume locks1
> > > > type features/locks
> > > > subvolumes posix1
> > > > end-volume
> > > > 
> > > > volume brick1
> > > > type performance/io-threads
> > > > option thread-count 8
> > > > subvolumes locks1
> > > > end-volume
> > > > 
> > > > volume server-tcp
> > > > type protocol/server
> > > > option transport-type tcp
> > > > option auth.addr.brick1.allow *
> > > > option transport.socket.listen-port 6996
> > > > option transport.socket.nodelay on
> > > > subvolumes brick1
> > > > end-volume
> > > > 
> > > > glusterfsd.vol (this works)
> > > > volume posix1
> > > > type storage/posix
> > > > option directory /data
> > > > end-volume
> > > > 
> > > > #volume locks1
> > > > # type features/locks
> > > > # subvolumes posix1
> > > > #end-volume
> > > > 
> > > > volume brick1
> > > > type performance/io-threads
> > > > option thread-count 8
> > > > subvolumes posix1
> > > > end-volume
> > > > 
> > > > volume server-tcp
> > > > type protocol/server
> > > > option transport-type tcp
> > > > option auth.addr.brick1.allow *
> > > > option transport.socket.listen-port 6996
> > > > option transport.socket.nodelay on
> > > > subvolumes brick1
> > > > end-volume
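To compare the two server files mechanically, here is a small Python sketch (a hypothetical helper, not part of GlusterFS) that reads a volfile in the volume/type/option/subvolumes/end-volume syntax shown above and lists the translator stack from a top volume down. It follows only the first subvolume of each volume, which is enough for these linear stacks, and skips '#' lines, so the commented-out locks1 block is ignored. On the two files above it should show that the only difference is the features/locks layer between brick1 and posix1.

```python
import sys


def parse_volfile(text: str) -> dict:
    """Map each volume name to {"type", "options", "subvolumes"}.
    Blank lines, '#' comments and text outside volume blocks are skipped."""
    volumes, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        key, rest = parts[0], (parts[1] if len(parts) > 1 else "")
        if key == "volume":
            current = {"type": None, "options": {}, "subvolumes": []}
            volumes[rest.strip()] = current
        elif key == "end-volume":
            current = None
        elif current is None:
            continue  # e.g. a file-name heading between blocks
        elif key == "type":
            current["type"] = rest.strip()
        elif key == "option":
            name, *value = rest.split(None, 1)
            current["options"][name] = value[0] if value else ""
        elif key == "subvolumes":
            current["subvolumes"] = rest.split()
    return volumes


def stack(volumes: dict, top: str) -> list:
    """Walk from `top` down through the first subvolume of each volume."""
    chain, name = [], top
    while name is not None:
        vol = volumes[name]
        chain.append(f"{name} ({vol['type']})")
        name = vol["subvolumes"][0] if vol["subvolumes"] else None
    return chain


if __name__ == "__main__":
    # Usage: python volstack.py glusterfsd.vol server-tcp  (hypothetical script name)
    if len(sys.argv) > 2:
        with open(sys.argv[1]) as f:
            print(" -> ".join(stack(parse_volfile(f.read()), sys.argv[2])))
```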
> > > > 
> > > > So, with the locks translator, you can execute the script once (it
> > > > works), but the second time, the session file exists on the file system
> > > > but is locked, and nobody can access it. PHP freezes and the processes
> > > > cannot be killed.
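While a worker is stuck, a non-blocking probe can confirm that the session file is still locked without adding one more frozen process. This Python sketch (a hypothetical helper, assuming the mount itself still answers requests) tries a whole-file write lock with F_SETLK semantics, so a conflict is reported immediately, and releases the probe lock at once if it is granted.

```python
import errno
import fcntl
import os
import sys


def lock_is_held(path: str) -> bool:
    """Return True if another process holds a lock on `path` that conflicts
    with a whole-file write lock.

    Uses a non-blocking request (F_SETLK instead of the F_SETLKW the PHP
    workers send), so it returns at once instead of queueing behind the
    holder. Locks held by the calling process never conflict with its own
    requests, so run this from a different process than any suspected holder.
    """
    fd = os.open(path, os.O_RDWR)  # a write lock needs a writable descriptor
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as exc:
        if exc.errno in (errno.EACCES, errno.EAGAIN):
            return True  # a conflicting lock exists
        raise
    else:
        fcntl.lockf(fd, fcntl.LOCK_UN)  # the probe got the lock: give it back
        return False
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Usage: python lockprobe.py /var/www/sessions/sess_...  (hypothetical name)
    if len(sys.argv) > 1:
        print(lock_is_held(sys.argv[1]))
```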
> > > > 
> > > > When this happens, there is nothing in the client-side logs, but there
> > > > are 2 kinds of messages in the server-side logs.
> > > > When I execute the script:
> > > > [2010-02-04 21:11:22] W [posix.c:246:posix_lstat_with_gen] posix1:
> > > > Access to /data//.. (on dev 2049) is crossing device (64768)
> > > > [2010-02-04 21:11:24] W [posix.c:246:posix_lstat_with_gen] posix1:
> > > > Access to /data//.. (on dev 2049) is crossing device (64768)
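One reading of these warnings (an interpretation, not checked against posix.c): /data//.. is the parent of the export directory, and if /data is a separate mount point, that parent lives on another filesystem, so its st_dev differs. With the usual Linux encoding, 2049 decodes to device 8:1 and 64768 to 253:0. The Python sketch below, assuming the export is /data as in 'option directory /data', compares the two device numbers so this can be confirmed on the server.

```python
import os
import sys


def crosses_device(export_dir: str) -> tuple:
    """Return (crosses, dev_of_export, dev_of_parent) for export_dir/..
    True means the export is the root of its own filesystem, so looking
    up '..' from it leaves that device."""
    dev_export = os.stat(export_dir).st_dev
    dev_parent = os.stat(os.path.join(export_dir, "..")).st_dev
    return dev_export != dev_parent, dev_export, dev_parent


def split_dev(dev: int) -> str:
    """Format a device number as major:minor using the host's encoding."""
    return f"{os.major(dev)}:{os.minor(dev)}"


if __name__ == "__main__":
    # Usage: python devcheck.py /data  (the export from 'option directory')
    if len(sys.argv) > 1:
        crosses, dev_export, dev_parent = crosses_device(sys.argv[1])
        print(crosses, split_dev(dev_export), split_dev(dev_parent))
```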
> > > > 
> > > > When I try to umount -f (to disconnect the GlusterFS mount):
> > > > [2010-02-04 21:13:45] E [server-protocol.c:339:protocol_server_reply]
> > > > protocol/server: frame 20: failed to submit. op= 26, type= 4
> > > > 
> > > > As I said, I will try to find the other PHP script.
> > > > 
> > > > I hope this will help you.
> > > 
> > > I tried to reproduce the problem with your exact configuration (only changing 'option remote-host'), with 1 server and 2 clients. I was not able to hit the problem with the configuration that is breaking for you. I used v3.0.0 as well.
> > > 
> > > Can you please turn 'option trace on' in the locks translator and give us the server log when the php session hangs?
> > > 
> > > Thanks,
> > > Avati
> > > 
> > > 
> > > _______________________________________________
> > > Gluster-devel mailing list
> > > Gluster-devel@xxxxxxxxxx
> > > http://lists.nongnu.org/mailman/listinfo/gluster-devel
> 

[2010-02-05 13:49:44] N [common.c:251:pl_trace_in] locks1: [REQUEST] Locker = {Pid=4735, lk-owner=468347034653121380, Transport=0x625770, Frame=64} Lockee = {ino=135, fd=0x62bf40, path=/sessions/sess_799677b063c76102f6a339228e7735b2} Lock = {lock=FCNTL, cmd=SETLKW, type=WRITE, start=0, len=0, pid=4735, lk-owner=468347034653121380}
[2010-02-05 13:49:44] N [common.c:303:pl_trace_out] locks1: [GRANTED] Locker = {Pid=4735, lk-owner=468347034653121380, Transport=0x625770, Frame=64} Lockee = {ino=135, fd=0x62bf40, path=/sessions/sess_799677b063c76102f6a339228e7735b2} Lock = {lock=FCNTL, cmd=SETLKW, type=WRITE, start=0, len=0, pid=4735, lk-owner=468347034653121380}
[2010-02-05 13:49:44] N [common.c:358:pl_trace_flush] locks1: [FLUSH] Locker = {Pid=4735, lk-owner=4572061777502758492, Transport=0x625770, Frame=67} Lockee = {ino=135, fd=0x62bf40, path=/sessions/sess_799677b063c76102f6a339228e7735b2}
[2010-02-05 13:49:44] N [common.c:251:pl_trace_in] locks1: [REQUEST] Locker = {Pid=4736, lk-owner=12870044932969840022, Transport=0x625770, Frame=69} Lockee = {ino=135, fd=0x62bf40, path=/sessions/sess_799677b063c76102f6a339228e7735b2} Lock = {lock=FCNTL, cmd=SETLKW, type=WRITE, start=0, len=0, pid=4736, lk-owner=12870044932969840022}
[2010-02-05 13:49:44] N [common.c:331:pl_trace_block] locks1: [BLOCKED] Locker = {Pid=4736, lk-owner=12870044932969840022, Transport=0x625770, Frame=69} Lockee = {ino=135, fd=0x62bf40, path=/sessions/sess_799677b063c76102f6a339228e7735b2} Lock = {lock=FCNTL, cmd=SETLKW, type=WRITE, start=0, len=0, pid=4736, lk-owner=12870044932969840022}
[2010-02-05 13:49:45] N [common.c:251:pl_trace_in] locks1: [REQUEST] Locker = {Pid=4737, lk-owner=4842419406141294702, Transport=0x625770, Frame=70} Lockee = {ino=135, fd=0x62b760, path=/sessions/sess_799677b063c76102f6a339228e7735b2} Lock = {lock=FCNTL, cmd=SETLKW, type=WRITE, start=0, len=0, pid=4737, lk-owner=4842419406141294702}
[2010-02-05 13:49:45] N [common.c:331:pl_trace_block] locks1: [BLOCKED] Locker = {Pid=4737, lk-owner=4842419406141294702, Transport=0x625770, Frame=70} Lockee = {ino=135, fd=0x62b760, path=/sessions/sess_799677b063c76102f6a339228e7735b2} Lock = {lock=FCNTL, cmd=SETLKW, type=WRITE, start=0, len=0, pid=4737, lk-owner=4842419406141294702}
[2010-02-05 13:50:06] N [common.c:251:pl_trace_in] locks1: [REQUEST] Locker = {Pid=4738, lk-owner=16020424428201149064, Transport=0x625770, Frame=71} Lockee = {ino=135, fd=0x62dcc0, path=/sessions/sess_799677b063c76102f6a339228e7735b2} Lock = {lock=FCNTL, cmd=SETLKW, type=WRITE, start=0, len=0, pid=4738, lk-owner=16020424428201149064}
[2010-02-05 13:50:06] N [common.c:331:pl_trace_block] locks1: [BLOCKED] Locker = {Pid=4738, lk-owner=16020424428201149064, Transport=0x625770, Frame=71} Lockee = {ino=135, fd=0x62dcc0, path=/sessions/sess_799677b063c76102f6a339228e7735b2} Lock = {lock=FCNTL, cmd=SETLKW, type=WRITE, start=0, len=0, pid=4738, lk-owner=16020424428201149064}

[2010-02-05 13:52:06] N [common.c:251:pl_trace_in] locks1: [REQUEST] Locker = {Pid=4740, lk-owner=3321301037397210633, Transport=0x625770, Frame=75} Lockee = {ino=135, fd=0x62da60, path=/sessions/sess_799677b063c76102f6a339228e7735b2} Lock = {lock=FCNTL, cmd=SETLKW, type=WRITE, start=0, len=0, pid=4740, lk-owner=3321301037397210633}
[2010-02-05 13:52:06] N [common.c:331:pl_trace_block] locks1: [BLOCKED] Locker = {Pid=4740, lk-owner=3321301037397210633, Transport=0x625770, Frame=75} Lockee = {ino=135, fd=0x62da60, path=/sessions/sess_799677b063c76102f6a339228e7735b2} Lock = {lock=FCNTL, cmd=SETLKW, type=WRITE, start=0, len=0, pid=4740, lk-owner=3321301037397210633}

[2010-02-05 14:01:42] N [common.c:251:pl_trace_in] locks1: [REQUEST] Locker = {Pid=4741, lk-owner=9808792510454481294, Transport=0x625770, Frame=89} Lockee = {ino=135, fd=0x62e090, path=/sessions/sess_799677b063c76102f6a339228e7735b2} Lock = {lock=FCNTL, cmd=SETLKW, type=WRITE, start=0, len=0, pid=4741, lk-owner=9808792510454481294}
[2010-02-05 14:01:42] N [common.c:331:pl_trace_block] locks1: [BLOCKED] Locker = {Pid=4741, lk-owner=9808792510454481294, Transport=0x625770, Frame=89} Lockee = {ino=135, fd=0x62e090, path=/sessions/sess_799677b063c76102f6a339228e7735b2} Lock = {lock=FCNTL, cmd=SETLKW, type=WRITE, start=0, len=0, pid=4741, lk-owner=9808792510454481294}
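To read the trace, here is a small Python sketch (a hypothetical helper, not a GlusterFS tool) that replays the [REQUEST]/[GRANTED]/[BLOCKED]/[FLUSH] events. Run on the lines above, it reports that the WRITE lock granted to Pid=4735 (lk-owner=468347034653121380) is never released, that the FLUSH from the same pid carries a different lk-owner (4572061777502758492), and that every later request stays BLOCKED. Whether that owner mismatch is the cause is for the developers to confirm; the sketch assumes that a FLUSH releases only locks whose lk-owner matches and that unlocks would be traced as type=UNLOCK (none appear here).

```python
import re
import sys

EVENT_RE = re.compile(
    r"\[(?P<event>REQUEST|GRANTED|BLOCKED|FLUSH)\] "
    r"Locker = \{Pid=(?P<pid>\d+), lk-owner=(?P<owner>\d+)"
)
INO_RE = re.compile(r"Lockee = \{ino=(?P<ino>\d+)")
TYPE_RE = re.compile(r"Lock = \{.*?type=(?P<type>\w+)")


def analyse(lines):
    """Return (held, mismatches, blocked) for locks-translator trace lines."""
    held = {}        # (ino, lk-owner) -> pid, for granted locks not yet released
    mismatches = []  # (pid, ino, lk-owner) of FLUSHes that released nothing
    last = {}        # pid -> last event seen for that pid
    for line in lines:
        m, ino = EVENT_RE.search(line), INO_RE.search(line)
        if not m or not ino:
            continue
        event, pid, owner = m["event"], int(m["pid"]), m["owner"]
        key = (int(ino["ino"]), owner)
        last[pid] = event
        if event == "GRANTED":
            t = TYPE_RE.search(line)
            if t and t["type"] == "UNLOCK":  # assumed spelling of an unlock
                held.pop(key, None)
            else:
                held[key] = pid
        elif event == "FLUSH":
            # Assumption: a FLUSH drops only the locks with a matching lk-owner.
            if held.pop(key, None) is None:
                mismatches.append((pid, key[0], owner))
    blocked = sorted(p for p, e in last.items() if e == "BLOCKED")
    return held, mismatches, blocked


if __name__ == "__main__":
    # Usage: python locktrace.py glusterfsd.log  (hypothetical names)
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as log:
            held, mismatches, blocked = analyse(log)
        print("still held:", held)
        print("flush without matching lk-owner:", mismatches)
        print("last seen BLOCKED:", blocked)
```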

