Hi Daniel,

I see you are using the brick volume from the server side. Did you try splitting it up so that the client and server run in different processes? That could possibly be causing a problem.

Thanks,
Jasper

On 18 Jun 2009, at 14:18, Daniel Jordan Bambach wrote:

> Well, one of the servers just locked up again (completely).
>
> All accesses were occurring on the other machine at the time. We had a moment when a directory on the still-running server went to 'Device or Resource Busy'. I restarted Gluster on that machine to clear the issue, then noticed the second had died (not sure if it happened at the same time or not).
>
> I'm trying to update the dump_caches value to 3, but it isn't letting me for some reason (permission denied as root?).
>
> Will adding DEBUG to the glusterfs command line give me more information across the whole process, rather than the trace (below), which isn't giving anything away?
>
> [2009-06-18 12:42:08] N [trace.c:1245:trace_lookup] trace: 162: (loc {path=/www/site/rebuild2008/faber, ino=0})
> [2009-06-18 12:42:08] N [trace.c:513:trace_lookup_cbk] trace: 162: (op_ret=0, ino=0, *buf {st_dev=64768, st_ino=37883182, st_mode=40775, st_nlink=24, st_uid=504, st_gid=501, st_rdev=0, st_size=4096, st_blksize=4096, st_blocks=16})
> [2009-06-18 12:42:08] N [trace.c:1245:trace_lookup] trace: 163: (loc {path=/www/site/rebuild2008/faber/site-media, ino=0})
> [2009-06-18 12:42:08] N [trace.c:513:trace_lookup_cbk] trace: 163: (op_ret=0, ino=0, *buf {st_dev=64768, st_ino=19238048, st_mode=40777, st_nlink=21, st_uid=504, st_gid=501, st_rdev=0, st_size=4096, st_blksize=4096, st_blocks=16})
> [2009-06-18 12:42:08] N [trace.c:1245:trace_lookup] trace: 164: (loc {path=/www/site/rebuild2008/faber/site-media/onix-images, ino=0})
> [2009-06-18 12:42:08] N [trace.c:513:trace_lookup_cbk] trace: 164: (op_ret=0, ino=0, *buf {st_dev=64768, st_ino=37884374, st_mode=40777, st_nlink=4, st_uid=504, st_gid=501, st_rdev=0, st_size=114688, st_blksize=4096, st_blocks=240})
> [2009-06-18 12:42:08] N [trace.c:1245:trace_lookup] trace: 165: (loc {path=/www/site/rebuild2008/faber/site-media/onix-images/thumbs, ino=0})
> [2009-06-18 12:42:08] N [trace.c:513:trace_lookup_cbk] trace: 165: (op_ret=0, ino=0, *buf {st_dev=64768, st_ino=19238105, st_mode=40777, st_nlink=3, st_uid=504, st_gid=501, st_rdev=0, st_size=479232, st_blksize=4096, st_blocks=952})
> [2009-06-18 12:42:08] N [trace.c:1245:trace_lookup] trace: 166: (loc {path=/www/site/rebuild2008/faber/site-media/onix-images/thumbs/185_jpg_130x400_q85.jpg, ino=0})
> [2009-06-18 12:42:08] N [trace.c:513:trace_lookup_cbk] trace: 166: (op_ret=0, ino=0, *buf {st_dev=64768, st_ino=7089866, st_mode=100644, st_nlink=1, st_uid=504, st_gid=501, st_rdev=0, st_size=10919, st_blksize=4096, st_blocks=32})
> ---ends--
>
> On 18 Jun 2009, at 11:53, Daniel Jordan Bambach wrote:
>
>> Will do, though I recently added those lines to be explicit about the behaviour (I had no options set before at all, leaving it at the default of 16 threads). I will remove them and specify the default of 16 to see if that helps.
>>
>> I'm adding:
>>
>> volume trace
>>   type debug/trace
>>   subvolumes cache
>> end-volume
>>
>> to both sides now as well, so next time (if any) it locks up perhaps there will be some more info.
>>
>> Thanks, Shehjar
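A note on the two debugging questions quoted above: the "value to 3" being set is presumably /proc/sys/vm/drop_caches, and a common cause of a "permission denied" there is running the write under sudo, where the shell redirection itself is performed by the unprivileged shell. Below is a minimal sketch of both the cache drop and running glusterfs at DEBUG verbosity, assuming the 2.0-era command-line options; the volfile path, log path and mount point are placeholders, not Daniel's actual ones:

  # Drop page cache, dentries and inodes; tee performs the privileged write,
  # so this works where "sudo echo 3 > ..." does not.
  echo 3 | sudo tee /proc/sys/vm/drop_caches

  # Raise the log level for the whole process instead of (or in addition to)
  # the trace translator. Paths here are placeholders.
  glusterfs --volfile=/etc/glusterfs/glusterfs.vol \
            --log-level=DEBUG \
            --log-file=/var/log/glusterfs/mnt-gluster.log \
            /mnt/gluster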
>>
>> On 18 Jun 2009, at 11:26, Shehjar Tikoo wrote:
>>
>>> Daniel Jordan Bambach wrote:
>>>> I'm experiencing various locking-up issues, ranging from Gluster locking up ('ls'-ing the mount hangs) to the whole machine locking up under load.
>>>> My current config is below (two servers, AFR'ing).
>>>> I would love to get to the bottom of this, because it seems very strange that we should see erratic behaviour on such a simple setup.
>>>> There is approx. 12GB of files, and to stress-test (and heal) I run ls -alR on the mount. This will run for a while and eventually lock up Gluster, and occasionally the machine. I have found that in some cases killing Gluster and re-mounting does not solve the problem (in that perhaps both servers have entered a locked state in some way).
>>>> I'm finding it very hard to collect debug information of any use, as there is no crash log and no errors in the volume log.
>>>> Can anyone suggest what I might be able to do to extract more information as to what is occurring at lock-up time?
>>>>
>>>> volume posix
>>>>   type storage/posix
>>>>   option directory /home/export
>>>> end-volume
>>>>
>>>> volume locks
>>>>   type features/locks
>>>>   subvolumes posix
>>>> end-volume
>>>>
>>>> volume brick
>>>>   type performance/io-threads
>>>>   subvolumes locks
>>>>   option autoscaling on
>>>>   option min-threads 8
>>>>   option max-threads 32
>>>> end-volume
>>>
>>> I see that max-threads will never exceed 32, which is a reasonable value and should work fine in most cases, but considering some of the other reports we've been getting, could you please try again without autoscaling turned on?
>>>
>>> It is off by default, so you can simply set the number of threads you need with:
>>>
>>> option thread-count <COUNT>
>>>
>>> ...instead of the three "option" lines above.
>>>
>>> Thanks
>>> Shehjar
>>>
>>>> volume server
>>>>   type protocol/server
>>>>   option transport-type tcp
>>>>   option auth.addr.brick.allow *
>>>>   subvolumes brick
>>>> end-volume
>>>>
>>>> volume latsrv2
>>>>   type protocol/client
>>>>   option transport-type tcp
>>>>   option remote-host latsrv2
>>>>   option remote-subvolume brick
>>>> end-volume
>>>>
>>>> volume afr
>>>>   type cluster/replicate
>>>>   subvolumes brick latsrv2
>>>>   option read-subvolume brick
>>>> end-volume
>>>>
>>>> volume writebehind
>>>>   type performance/write-behind
>>>>   option cache-size 2MB
>>>>   subvolumes afr
>>>> end-volume
>>>>
>>>> volume cache
>>>>   type performance/io-cache
>>>>   option cache-size 32MB
>>>>   option priority *.pyc:4,*.html:3,*.php:2,*:1
>>>>   option cache-timeout 5
>>>>   subvolumes writebehind
>>>> end-volume
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://zresearch.com/cgi-bin/mailman/listinfo/gluster-users
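For reference, Shehjar's suggestion applied to the io-threads volume in the config above would look roughly like this; 16 matches the default that Daniel mentions, and the exact count is something to tune rather than a prescribed value:

  volume brick
    type performance/io-threads
    # fixed thread pool instead of autoscaling/min-threads/max-threads
    option thread-count 16
    subvolumes locks
  end-volume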
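And a rough sketch of Jasper's suggestion to run the server and client sides in separate processes, derived from the single volfile above. The file names and the local hostname "latsrv1" are assumptions; the point is that the replicate translator then reaches the local brick through protocol/client instead of referencing it in the same process:

  # server.vol -- exported by its own glusterfsd process on each machine
  volume posix
    type storage/posix
    option directory /home/export
  end-volume

  volume locks
    type features/locks
    subvolumes posix
  end-volume

  volume brick
    type performance/io-threads
    option thread-count 16
    subvolumes locks
  end-volume

  volume server
    type protocol/server
    option transport-type tcp
    option auth.addr.brick.allow *
    subvolumes brick
  end-volume

  # client.vol -- mounted by a separate glusterfs process
  # "latsrv1" is assumed to be this machine's own hostname
  volume latsrv1
    type protocol/client
    option transport-type tcp
    option remote-host latsrv1
    option remote-subvolume brick
  end-volume

  volume latsrv2
    type protocol/client
    option transport-type tcp
    option remote-host latsrv2
    option remote-subvolume brick
  end-volume

  volume afr
    type cluster/replicate
    subvolumes latsrv1 latsrv2
    option read-subvolume latsrv1
  end-volume

  # writebehind and io-cache would stack on top of afr exactly as in the
  # original config

With the two halves separated, a hang can be narrowed down to either the export side (glusterfsd) or the mount side (glusterfs) without taking both down at once.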