Got a chance to run your suggested test:

##############GLUSTER SINGLE DISK##############
[root@ac33 gjenos]# dd bs=4096 count=32768 if=/dev/zero of=./filename.test
32768+0 records in
32768+0 records out
134217728 bytes (134 MB) copied, 8.60486 s, 15.6 MB/s
[root@ac33 gjenos]#
[root@ac33 gjenos]# cd /export/jenos/

##############DIRECT SINGLE DISK##############
[root@ac33 jenos]# dd bs=4096 count=32768 if=/dev/zero of=./filename.test
32768+0 records in
32768+0 records out
134217728 bytes (134 MB) copied, 0.21915 s, 612 MB/s
[root@ac33 jenos]#

For any workload that can take advantage of the page cache, Gluster's
performance can't compare with direct disk.  Is it even using the cache?

This is the client vol file I used for that test:

[root@ac33 jenos]# cat /etc/glusterfs/ghome.vol
#-----------IB remotes------------------
volume ghome
  type protocol/client
  option transport-type tcp/client
  option remote-host ac33
  option remote-subvolume ibstripe
end-volume

#------------Performance Options-------------------
volume readahead
  type performance/read-ahead
  option page-count 4             # 2 is default option
  option force-atime-update off   # default is off
  subvolumes ghome
end-volume

volume writebehind
  type performance/write-behind
  option cache-size 1MB
  subvolumes readahead
end-volume

volume cache
  type performance/io-cache
  option cache-size 2GB
  subvolumes writebehind
end-volume

Any suggestions appreciated.
thx-

Jeremy
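
P.S.  A rough way to separate Gluster overhead from the page-cache effect
(just stock GNU dd flags and the Linux drop_caches knob -- untested on this
particular box) would be to repeat the dd comparison with the cache taken
out of the picture:

  # force the direct-disk write to actually reach the disk before dd reports
  dd bs=4096 count=32768 if=/dev/zero of=./filename.test conv=fdatasync

  # or bypass the page cache on the write path entirely
  dd bs=4096 count=32768 if=/dev/zero of=./filename.test oflag=direct

  # and drop clean caches between runs so reads start cold
  sync; echo 3 > /proc/sys/vm/drop_caches

If the direct-disk number falls toward the Gluster number, most of that
612 MB/s is write-back cache rather than disk speed.  The flip side would be
letting the Gluster client use the kernel cache at all; if the FUSE client
is mounted in direct-I/O mode, disabling that (newer clients have a
direct-io-mode option, if I'm reading the docs right) may be worth a try.
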
On 3/26/2010 6:09 PM, Bryan Whitehead wrote:
> One more thought, looks like (from your emails) you are always running
> the gluster test first.  Maybe the tar file is being read from disk
> when you do the gluster test, then being read from cache when you run
> for the disk.
>
> What if you just pull a chunk of 0's off /dev/zero?
>
> dd bs=4096 count=32768 if=/dev/zero of=./filename.test
>
> or stick the tar in a ramdisk?
>
> (or run the benchmark 10 times for each, drop the best and the worst,
> and average the remaining 8)
>
> Would also be curious if you add another node if the time would be
> halved, then add another 2... then it would be halved again?  I guess
> that depends on if striping or just replicating is being used.
> (unfortunately I don't have access to more than 1 test box right now).
>
> On Wed, Mar 24, 2010 at 11:06 PM, Jeremy Enos <jenos at ncsa.uiuc.edu> wrote:
>
>> For completeness:
>>
>> ##############GLUSTER SINGLE DISK NO PERFORMANCE OPTIONS##############
>> [root@ac33 gjenos]# time (tar xzf
>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz && sync )
>>
>> real    0m41.052s
>> user    0m7.705s
>> sys     0m3.122s
>>
>> ##############DIRECT SINGLE DISK##############
>> [root@ac33 gjenos]# cd /export/jenos
>> [root@ac33 jenos]# time (tar xzf
>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz && sync )
>>
>> real    0m22.093s
>> user    0m6.932s
>> sys     0m2.459s
>> [root@ac33 jenos]#
>>
>> The performance options don't appear to be the problem.  So the question
>> stands- how do I get the disk cache advantage through the Gluster mounted
>> filesystem?  It seems to be key in the large performance difference.
>>
>> Jeremy
>>
>> On 3/24/2010 4:47 PM, Jeremy Enos wrote:
>>
>>> Good suggestion- I hadn't tried that yet.  It brings them much closer.
>>>
>>> ##############GLUSTER SINGLE DISK##############
>>> [root@ac33 gjenos]# time (tar xzf
>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz && sync )
>>>
>>> real    0m32.089s
>>> user    0m6.516s
>>> sys     0m3.177s
>>>
>>> ##############DIRECT SINGLE DISK##############
>>> [root@ac33 gjenos]# cd /export/jenos/
>>> [root@ac33 jenos]# time (tar xzf
>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz && sync )
>>>
>>> real    0m25.089s
>>> user    0m6.850s
>>> sys     0m2.058s
>>>
>>> ##############DIRECT SINGLE DISK CACHED##############
>>> [root@ac33 jenos]# time (tar xzf
>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz )
>>>
>>> real    0m8.955s
>>> user    0m6.785s
>>> sys     0m1.848s
>>>
>>> Oddly, I'm seeing better performance on the gluster system than previous
>>> tests too (used to be ~39 s).  The direct disk time is obviously
>>> benefiting from cache.  There is still a difference, but most of it
>>> disappears with the cache advantage removed.  That said, the relative
>>> performance issue still exists with Gluster.  What can be done to make
>>> it benefit from cache the same way direct disk does?
>>> thx-
>>>
>>> Jeremy
>>>
>>> P.S.
>>> I'll be posting results w/ performance options completely removed from
>>> gluster as soon as I get a chance.
>>>
>>> Jeremy
>>>
>>> On 3/24/2010 4:23 PM, Bryan Whitehead wrote:
>>>
>>>> I'd like to see results with this:
>>>>
>>>> time ( tar xzf /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz && sync )
>>>>
>>>> I've found local filesystems seem to use cache very heavily.  The
>>>> untarred file could mostly be sitting in ram with local fs vs going
>>>> through fuse (which might do many more sync'ed flushes to disk?).
>>>>
>>>> On Wed, Mar 24, 2010 at 2:25 AM, Jeremy Enos <jenos at ncsa.uiuc.edu> wrote:
>>>>
>>>>> I also neglected to mention that the underlying filesystem is ext3.
>>>>>
>>>>> On 3/24/2010 3:44 AM, Jeremy Enos wrote:
>>>>>
>>>>>> I haven't tried all performance options disabled yet- I can try that
>>>>>> tomorrow when the resource frees up.  I was actually asking first
>>>>>> before blindly trying different configuration matrices in case there's
>>>>>> a clear direction I should take with it.  I'll let you know.
>>>>>>
>>>>>> Jeremy
>>>>>>
>>>>>> On 3/24/2010 2:54 AM, Stephan von Krawczynski wrote:
>>>>>>
>>>>>>> Hi Jeremy,
>>>>>>>
>>>>>>> have you tried to reproduce with all performance options disabled?
>>>>>>> They are possibly no good idea on a local system.
>>>>>>> What local fs do you use?
>>>>>>>
>>>>>>> --
>>>>>>> Regards,
>>>>>>> Stephan
>>>>>>>
>>>>>>>
>>>>>>> On Tue, 23 Mar 2010 19:11:28 -0500
>>>>>>> Jeremy Enos <jenos at ncsa.uiuc.edu> wrote:
>>>>>>>
>>>>>>>> Stephan is correct- I primarily did this test to show a demonstrable
>>>>>>>> overhead example that I'm trying to eliminate.  It's pronounced
>>>>>>>> enough that it can be seen on a single disk / single node
>>>>>>>> configuration, which is good in a way (so anyone can easily repro).
>>>>>>>>
>>>>>>>> My distributed/clustered solution would be ideal if it were fast
>>>>>>>> enough for small block i/o as well as large block- I was hoping that
>>>>>>>> single node systems would achieve that, hence the single node test.
>>>>>>>> Because the single node test performed poorly, I eventually reduced
>>>>>>>> down to single disk to see if it could still be seen, and it clearly
>>>>>>>> can be.  Perhaps it's something in my configuration?
>>>>>>>> I've pasted my config files below.
>>>>>>>> thx-
>>>>>>>>
>>>>>>>> Jeremy
>>>>>>>>
>>>>>>>> ######################glusterfsd.vol######################
>>>>>>>> volume posix
>>>>>>>>   type storage/posix
>>>>>>>>   option directory /export
>>>>>>>> end-volume
>>>>>>>>
>>>>>>>> volume locks
>>>>>>>>   type features/locks
>>>>>>>>   subvolumes posix
>>>>>>>> end-volume
>>>>>>>>
>>>>>>>> volume disk
>>>>>>>>   type performance/io-threads
>>>>>>>>   option thread-count 4
>>>>>>>>   subvolumes locks
>>>>>>>> end-volume
>>>>>>>>
>>>>>>>> volume server-ib
>>>>>>>>   type protocol/server
>>>>>>>>   option transport-type ib-verbs/server
>>>>>>>>   option auth.addr.disk.allow *
>>>>>>>>   subvolumes disk
>>>>>>>> end-volume
>>>>>>>>
>>>>>>>> volume server-tcp
>>>>>>>>   type protocol/server
>>>>>>>>   option transport-type tcp/server
>>>>>>>>   option auth.addr.disk.allow *
>>>>>>>>   subvolumes disk
>>>>>>>> end-volume
>>>>>>>>
>>>>>>>> ######################ghome.vol######################
>>>>>>>>
>>>>>>>> #-----------IB remotes------------------
>>>>>>>> volume ghome
>>>>>>>>   type protocol/client
>>>>>>>>   option transport-type ib-verbs/client
>>>>>>>>   # option transport-type tcp/client
>>>>>>>>   option remote-host acfs
>>>>>>>>   option remote-subvolume raid
>>>>>>>> end-volume
>>>>>>>>
>>>>>>>> #------------Performance Options-------------------
>>>>>>>>
>>>>>>>> volume readahead
>>>>>>>>   type performance/read-ahead
>>>>>>>>   option page-count 4             # 2 is default option
>>>>>>>>   option force-atime-update off   # default is off
>>>>>>>>   subvolumes ghome
>>>>>>>> end-volume
>>>>>>>>
>>>>>>>> volume writebehind
>>>>>>>>   type performance/write-behind
>>>>>>>>   option cache-size 1MB
>>>>>>>>   subvolumes readahead
>>>>>>>> end-volume
>>>>>>>>
>>>>>>>> volume cache
>>>>>>>>   type performance/io-cache
>>>>>>>>   option cache-size 1GB
>>>>>>>>   subvolumes writebehind
>>>>>>>> end-volume
>>>>>>>>
>>>>>>>> ######################END######################
>>>>>>>>
>>>>>>>>
>>>>>>>> On 3/23/2010 6:02 AM, Stephan von Krawczynski wrote:
>>>>>>>>
>>>>>>>>> On Tue, 23 Mar 2010 02:59:35 -0600 (CST)
>>>>>>>>> "Tejas N. Bhise" <tejas at gluster.com> wrote:
>>>>>>>>>
>>>>>>>>>> Out of curiosity, if you want to do stuff only on one machine,
>>>>>>>>>> why do you want to use a distributed, multi node, clustered,
>>>>>>>>>> file system ?
>>>>>>>>>>
>>>>>>>>> Because what he does is a very good way to show the overhead
>>>>>>>>> produced only by glusterfs and nothing else (i.e. no network
>>>>>>>>> involved).  A pretty relevant test scenario I would say.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Regards,
>>>>>>>>> Stephan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Am I missing something here ?
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Tejas.
>>>>>>>>>>
>>>>>>>>>> ----- Original Message -----
>>>>>>>>>> From: "Jeremy Enos" <jenos at ncsa.uiuc.edu>
>>>>>>>>>> To: gluster-users at gluster.org
>>>>>>>>>> Sent: Tuesday, March 23, 2010 2:07:06 PM GMT +05:30 Chennai,
>>>>>>>>>> Kolkata, Mumbai, New Delhi
>>>>>>>>>> Subject: gluster local vs local = gluster x4 slower
>>>>>>>>>>
>>>>>>>>>> This test is pretty easy to replicate anywhere- only takes 1 disk,
>>>>>>>>>> one machine, one tarball.  Untarring to local disk directly vs thru
>>>>>>>>>> gluster is about 4.5x faster.  At first I thought this may be due
>>>>>>>>>> to a slow host (Opteron 2.4ghz).
>>>>>>>>>> But it's not- same configuration, on a much faster machine
>>>>>>>>>> (dual 3.33ghz Xeon) yields the performance below.
>>>>>>>>>>
>>>>>>>>>> ####THIS TEST WAS TO A LOCAL DISK THRU GLUSTER####
>>>>>>>>>> [root@ac33 jenos]# time tar xzf
>>>>>>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz
>>>>>>>>>>
>>>>>>>>>> real    0m41.290s
>>>>>>>>>> user    0m14.246s
>>>>>>>>>> sys     0m2.957s
>>>>>>>>>>
>>>>>>>>>> ####THIS TEST WAS TO A LOCAL DISK (BYPASS GLUSTER)####
>>>>>>>>>> [root@ac33 jenos]# cd /export/jenos/
>>>>>>>>>> [root@ac33 jenos]# time tar xzf
>>>>>>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz
>>>>>>>>>>
>>>>>>>>>> real    0m8.983s
>>>>>>>>>> user    0m6.857s
>>>>>>>>>> sys     0m1.844s
>>>>>>>>>>
>>>>>>>>>> ####THESE ARE TEST FILE DETAILS####
>>>>>>>>>> [root@ac33 jenos]# tar tzvf
>>>>>>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz |wc -l
>>>>>>>>>> 109
>>>>>>>>>> [root@ac33 jenos]# ls -l
>>>>>>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz
>>>>>>>>>> -rw-r--r-- 1 jenos ac 804385203 2010-02-07 06:32
>>>>>>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz
>>>>>>>>>> [root@ac33 jenos]#
>>>>>>>>>>
>>>>>>>>>> These are the relevant performance options I'm using in my .vol file:
>>>>>>>>>>
>>>>>>>>>> #------------Performance Options-------------------
>>>>>>>>>>
>>>>>>>>>> volume readahead
>>>>>>>>>>   type performance/read-ahead
>>>>>>>>>>   option page-count 4             # 2 is default option
>>>>>>>>>>   option force-atime-update off   # default is off
>>>>>>>>>>   subvolumes ghome
>>>>>>>>>> end-volume
>>>>>>>>>>
>>>>>>>>>> volume writebehind
>>>>>>>>>>   type performance/write-behind
>>>>>>>>>>   option cache-size 1MB
>>>>>>>>>>   subvolumes readahead
>>>>>>>>>> end-volume
>>>>>>>>>>
>>>>>>>>>> volume cache
>>>>>>>>>>   type performance/io-cache
>>>>>>>>>>   option cache-size 1GB
>>>>>>>>>>   subvolumes writebehind
>>>>>>>>>> end-volume
>>>>>>>>>>
>>>>>>>>>> What can I do to improve gluster's performance?
>>>>>>>>>>
>>>>>>>>>> Jeremy
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Gluster-users mailing list
>>>>>>>>>> Gluster-users at gluster.org
>>>>>>>>>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users