Hi Daniel,

I see you are using the brick volume from the server side. Did you try splitting it up so that the client and server run in different processes? That could possibly be causing a problem.

Thanks,
Jasper

On 18 Jun 2009, at 14:18, Daniel Jordan Bambach wrote:

> Well, one of the servers just locked up again (completely).
>
> All accesses were occurring on the other machine at the time. We had a moment when a directory on the still-running server went to 'Device or Resource Busy'. I restarted Gluster on that machine to clear the issue, then noticed the second had died (not sure if it happened at the same time or not).
>
> I'm trying to update the dump_caches value to 3, but it isn't letting me for some reason (permission denied as root?).
>
> Will adding DEBUG to the glusterfs command line give me more information across the whole process, rather than the trace (below), which isn't giving anything away?
>
> [2009-06-18 12:42:08] N [trace.c:1245:trace_lookup] trace: 162: (loc {path=/www/site/rebuild2008/faber, ino=0})
> [2009-06-18 12:42:08] N [trace.c:513:trace_lookup_cbk] trace: 162: (op_ret=0, ino=0, *buf {st_dev=64768, st_ino=37883182, st_mode=40775, st_nlink=24, st_uid=504, st_gid=501, st_rdev=0, st_size=4096, st_blksize=4096, st_blocks=16})
> [2009-06-18 12:42:08] N [trace.c:1245:trace_lookup] trace: 163: (loc {path=/www/site/rebuild2008/faber/site-media, ino=0})
> [2009-06-18 12:42:08] N [trace.c:513:trace_lookup_cbk] trace: 163: (op_ret=0, ino=0, *buf {st_dev=64768, st_ino=19238048, st_mode=40777, st_nlink=21, st_uid=504, st_gid=501, st_rdev=0, st_size=4096, st_blksize=4096, st_blocks=16})
> [2009-06-18 12:42:08] N [trace.c:1245:trace_lookup] trace: 164: (loc {path=/www/site/rebuild2008/faber/site-media/onix-images, ino=0})
> [2009-06-18 12:42:08] N [trace.c:513:trace_lookup_cbk] trace: 164: (op_ret=0, ino=0, *buf {st_dev=64768, st_ino=37884374, st_mode=40777, st_nlink=4, st_uid=504, st_gid=501, st_rdev=0, st_size=114688, st_blksize=4096, st_blocks=240})
> [2009-06-18 12:42:08] N [trace.c:1245:trace_lookup] trace: 165: (loc {path=/www/site/rebuild2008/faber/site-media/onix-images/thumbs, ino=0})
> [2009-06-18 12:42:08] N [trace.c:513:trace_lookup_cbk] trace: 165: (op_ret=0, ino=0, *buf {st_dev=64768, st_ino=19238105, st_mode=40777, st_nlink=3, st_uid=504, st_gid=501, st_rdev=0, st_size=479232, st_blksize=4096, st_blocks=952})
> [2009-06-18 12:42:08] N [trace.c:1245:trace_lookup] trace: 166: (loc {path=/www/site/rebuild2008/faber/site-media/onix-images/thumbs/185_jpg_130x400_q85.jpg, ino=0})
> [2009-06-18 12:42:08] N [trace.c:513:trace_lookup_cbk] trace: 166: (op_ret=0, ino=0, *buf {st_dev=64768, st_ino=7089866, st_mode=100644, st_nlink=1, st_uid=504, st_gid=501, st_rdev=0, st_size=10919, st_blksize=4096, st_blocks=32})
> ---ends--
>
> On 18 Jun 2009, at 11:53, Daniel Jordan Bambach wrote:
>
>> Will do, though I recently added those lines to be explicit about the behaviour (I had no options set before at all, leaving it at the default of 16 threads). I will remove them and specify the default of 16 to see if that helps.
>>
>> I'm adding:
>>
>> volume trace
>>   type debug/trace
>>   subvolumes cache
>> end-volume
>>
>> to both sides now as well, so next time (if any) it locks up perhaps there will be some more info.
>>
>> Thanks, Shehjar
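A note on the two debugging questions quoted above: the "value to 3" being set is presumably /proc/sys/vm/drop_caches, and a common cause of a "permission denied" there is running the write under sudo, where the shell redirection itself is performed by the unprivileged shell. Below is a minimal sketch of both the cache drop and running glusterfs at DEBUG verbosity, assuming the 2.0-era command-line options; the volfile path, log path and mount point are placeholders, not Daniel's actual ones:

  # Drop page cache, dentries and inodes; tee performs the privileged write,
  # so this works where "sudo echo 3 > ..." does not.
  echo 3 | sudo tee /proc/sys/vm/drop_caches

  # Raise the log level for the whole process instead of (or in addition to)
  # the trace translator. Paths here are placeholders.
  glusterfs --volfile=/etc/glusterfs/glusterfs.vol \
            --log-level=DEBUG \
            --log-file=/var/log/glusterfs/mnt-gluster.log \
            /mnt/gluster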
>>
>> On 18 Jun 2009, at 11:26, Shehjar Tikoo wrote:
>>
>>> Daniel Jordan Bambach wrote:
>>>> I'm experiencing various locking-up issues, ranging from Gluster locking up ('ls'-ing the mount hangs) to the whole machine locking up under load.
>>>> My current config is below (two servers, AFR'ing).
>>>> I would love to get to the bottom of this, because it seems very strange that we should see erratic behaviour on such a simple setup.
>>>> There is approx. 12GB of files, and to stress-test (and heal) I run ls -alR on the mount. This will run for a while and eventually lock up Gluster, and occasionally the machine. I have found that in some cases killing Gluster and re-mounting does not solve the problem (in that perhaps both servers have entered a locked state in some way).
>>>> I'm finding it very hard to collect debug information of any use, as there is no crash log and no errors in the volume log.
>>>> Can anyone suggest what I might be able to do to extract more information as to what is occurring at lock-up time?
>>>>
>>>> volume posix
>>>>   type storage/posix
>>>>   option directory /home/export
>>>> end-volume
>>>>
>>>> volume locks
>>>>   type features/locks
>>>>   subvolumes posix
>>>> end-volume
>>>>
>>>> volume brick
>>>>   type performance/io-threads
>>>>   subvolumes locks
>>>>   option autoscaling on
>>>>   option min-threads 8
>>>>   option max-threads 32
>>>> end-volume
>>>
>>> I see that max-threads will never exceed 32, which is a reasonable value and should work fine in most cases, but considering some of the other reports we've been getting, could you please try again without autoscaling turned on?
>>>
>>> It is off by default, so you can simply set the number of threads you need with:
>>>
>>> option thread-count <COUNT>
>>>
>>> ...instead of the three "option" lines above.
>>>
>>> Thanks
>>> Shehjar
>>>
>>>> volume server
>>>>   type protocol/server
>>>>   option transport-type tcp
>>>>   option auth.addr.brick.allow *
>>>>   subvolumes brick
>>>> end-volume
>>>>
>>>> volume latsrv2
>>>>   type protocol/client
>>>>   option transport-type tcp
>>>>   option remote-host latsrv2
>>>>   option remote-subvolume brick
>>>> end-volume
>>>>
>>>> volume afr
>>>>   type cluster/replicate
>>>>   subvolumes brick latsrv2
>>>>   option read-subvolume brick
>>>> end-volume
>>>>
>>>> volume writebehind
>>>>   type performance/write-behind
>>>>   option cache-size 2MB
>>>>   subvolumes afr
>>>> end-volume
>>>>
>>>> volume cache
>>>>   type performance/io-cache
>>>>   option cache-size 32MB
>>>>   option priority *.pyc:4,*.html:3,*.php:2,*:1
>>>>   option cache-timeout 5
>>>>   subvolumes writebehind
>>>> end-volume
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://zresearch.com/cgi-bin/mailman/listinfo/gluster-users
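For reference, Shehjar's suggestion applied to the io-threads volume in the config above would look roughly like this; 16 matches the default that Daniel mentions, and the exact count is something to tune rather than a prescribed value:

  volume brick
    type performance/io-threads
    # fixed thread pool instead of autoscaling/min-threads/max-threads
    option thread-count 16
    subvolumes locks
  end-volume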
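And a rough sketch of Jasper's suggestion to run the server and client sides in separate processes, derived from the single volfile above. The file names and the local hostname "latsrv1" are assumptions; the point is that the replicate translator then reaches the local brick through protocol/client instead of referencing it in the same process:

  # server.vol -- exported by its own glusterfsd process on each machine
  volume posix
    type storage/posix
    option directory /home/export
  end-volume

  volume locks
    type features/locks
    subvolumes posix
  end-volume

  volume brick
    type performance/io-threads
    option thread-count 16
    subvolumes locks
  end-volume

  volume server
    type protocol/server
    option transport-type tcp
    option auth.addr.brick.allow *
    subvolumes brick
  end-volume

  # client.vol -- mounted by a separate glusterfs process
  # "latsrv1" is assumed to be this machine's own hostname
  volume latsrv1
    type protocol/client
    option transport-type tcp
    option remote-host latsrv1
    option remote-subvolume brick
  end-volume

  volume latsrv2
    type protocol/client
    option transport-type tcp
    option remote-host latsrv2
    option remote-subvolume brick
  end-volume

  volume afr
    type cluster/replicate
    subvolumes latsrv1 latsrv2
    option read-subvolume latsrv1
  end-volume

  # writebehind and io-cache would stack on top of afr exactly as in the
  # original config

With the two halves separated, a hang can be narrowed down to either the export side (glusterfsd) or the mount side (glusterfs) without taking both down at once.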