heh, don't forget the && sync :)

On Mon, Mar 29, 2010 at 11:21 AM, Jeremy Enos <jenos at ncsa.uiuc.edu> wrote:
> Got a chance to run your suggested test:
>
> ##############GLUSTER SINGLE DISK##############
>
> [root at ac33 gjenos]# dd bs=4096 count=32768 if=/dev/zero of=./filename.test
> 32768+0 records in
> 32768+0 records out
> 134217728 bytes (134 MB) copied, 8.60486 s, 15.6 MB/s
> [root at ac33 gjenos]#
> [root at ac33 gjenos]# cd /export/jenos/
>
> ##############DIRECT SINGLE DISK##############
>
> [root at ac33 jenos]# dd bs=4096 count=32768 if=/dev/zero of=./filename.test
> 32768+0 records in
> 32768+0 records out
> 134217728 bytes (134 MB) copied, 0.21915 s, 612 MB/s
> [root at ac33 jenos]#
>
> For anything that can see a cache benefit, Gluster's performance can't
> compare.  Is it even using cache?
>
> This is the client vol file I used for that test:
>
> [root at ac33 jenos]# cat /etc/glusterfs/ghome.vol
> #-----------IB remotes------------------
> volume ghome
>   type protocol/client
>   option transport-type tcp/client
>   option remote-host ac33
>   option remote-subvolume ibstripe
> end-volume
>
> #------------Performance Options-------------------
>
> volume readahead
>   type performance/read-ahead
>   option page-count 4           # 2 is default option
>   option force-atime-update off # default is off
>   subvolumes ghome
> end-volume
>
> volume writebehind
>   type performance/write-behind
>   option cache-size 1MB
>   subvolumes readahead
> end-volume
>
> volume cache
>   type performance/io-cache
>   option cache-size 2GB
>   subvolumes writebehind
> end-volume
>
> Any suggestions appreciated.  thx-
>
>     Jeremy
>
> On 3/26/2010 6:09 PM, Bryan Whitehead wrote:
>>
>> One more thought: it looks like (from your emails) you are always
>> running the gluster test first.  Maybe the tar file is being read from
>> disk when you do the gluster test, then being read from cache when you
>> run the direct-disk test.
>>
>> What if you just pull a chunk of 0's off /dev/zero?
>>
>> dd bs=4096 count=32768 if=/dev/zero of=./filename.test
>>
>> or stick the tar in a ramdisk?
>>
>> (Or run the benchmark 10 times for each, drop the best and the worst,
>> and average the remaining 8.)
>>
>> I would also be curious whether the time is halved if you add another
>> node, and halved again if you then add another two.  I guess that
>> depends on whether striping or just replicating is being used.
>> (Unfortunately I don't have access to more than one test box right now.)
>>
>> On Wed, Mar 24, 2010 at 11:06 PM, Jeremy Enos <jenos at ncsa.uiuc.edu> wrote:
>>
>>> For completeness:
>>>
>>> ##############GLUSTER SINGLE DISK NO PERFORMANCE OPTIONS##############
>>> [root at ac33 gjenos]# time (tar xzf
>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz && sync )
>>>
>>> real    0m41.052s
>>> user    0m7.705s
>>> sys     0m3.122s
>>> ##############DIRECT SINGLE DISK##############
>>> [root at ac33 gjenos]# cd /export/jenos
>>> [root at ac33 jenos]# time (tar xzf
>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz && sync )
>>>
>>> real    0m22.093s
>>> user    0m6.932s
>>> sys     0m2.459s
>>> [root at ac33 jenos]#
>>>
>>> The performance options don't appear to be the problem.  So the question
>>> stands: how do I get the disk cache advantage through the Gluster-mounted
>>> filesystem?  It seems to be key in the large performance difference.
>>>
>>>     Jeremy
>>>
>>> On 3/24/2010 4:47 PM, Jeremy Enos wrote:
>>>
>>>> Good suggestion- I hadn't tried that yet.  It brings them much closer.
>>>>
>>>> ##############GLUSTER SINGLE DISK##############
>>>> [root at ac33 gjenos]# time (tar xzf
>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz && sync )
>>>>
>>>> real    0m32.089s
>>>> user    0m6.516s
>>>> sys     0m3.177s
>>>> ##############DIRECT SINGLE DISK##############
>>>> [root at ac33 gjenos]# cd /export/jenos/
>>>> [root at ac33 jenos]# time (tar xzf
>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz && sync )
>>>>
>>>> real    0m25.089s
>>>> user    0m6.850s
>>>> sys     0m2.058s
>>>> ##############DIRECT SINGLE DISK CACHED##############
>>>> [root at ac33 jenos]# time (tar xzf
>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz )
>>>>
>>>> real    0m8.955s
>>>> user    0m6.785s
>>>> sys     0m1.848s
>>>>
>>>> Oddly, I'm also seeing better performance on the gluster system than in
>>>> previous tests (it used to be ~39 s).  The direct-disk time is obviously
>>>> benefiting from cache.  There is still a difference, but most of it
>>>> disappears once the cache advantage is removed.  That said, the relative
>>>> performance issue still exists with Gluster.  What can be done to make it
>>>> benefit from cache the same way direct disk does?
>>>> thx-
>>>>
>>>>     Jeremy
>>>>
>>>> P.S.
>>>> I'll be posting results with the performance options completely removed
>>>> from gluster as soon as I get a chance.
>>>>
>>>>     Jeremy
>>>>
>>>> On 3/24/2010 4:23 PM, Bryan Whitehead wrote:
>>>>
>>>>> I'd like to see results with this:
>>>>>
>>>>> time ( tar xzf /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz && sync )
>>>>>
>>>>> I've found local filesystems seem to use cache very heavily.  The
>>>>> untarred files could mostly be sitting in RAM with a local fs, versus
>>>>> going through fuse (which might do many more synced flushes to disk?).
>>>>>
>>>>> On Wed, Mar 24, 2010 at 2:25 AM, Jeremy Enos <jenos at ncsa.uiuc.edu> wrote:
>>>>>
>>>>>> I also neglected to mention that the underlying filesystem is ext3.
>>>>>>
>>>>>> On 3/24/2010 3:44 AM, Jeremy Enos wrote:
>>>>>>
>>>>>>> I haven't tried all performance options disabled yet- I can try that
>>>>>>> tomorrow when the resource frees up.  I was actually asking first
>>>>>>> before blindly trying different configuration matrices in case there's
>>>>>>> a clear direction I should take with it.  I'll let you know.
>>>>>>>
>>>>>>>     Jeremy
>>>>>>>
>>>>>>> On 3/24/2010 2:54 AM, Stephan von Krawczynski wrote:
>>>>>>>
>>>>>>>> Hi Jeremy,
>>>>>>>>
>>>>>>>> have you tried to reproduce this with all performance options
>>>>>>>> disabled?  They are possibly not a good idea on a local system.
>>>>>>>> What local fs do you use?
>>>>>>>>
>>>>>>>> --
>>>>>>>> Regards,
>>>>>>>> Stephan
>>>>>>>>
>>>>>>>> On Tue, 23 Mar 2010 19:11:28 -0500
>>>>>>>> Jeremy Enos <jenos at ncsa.uiuc.edu> wrote:
>>>>>>>>
>>>>>>>>> Stephan is correct- I primarily did this test to show a demonstrable
>>>>>>>>> overhead example that I'm trying to eliminate.  It's pronounced
>>>>>>>>> enough that it can be seen on a single disk / single node
>>>>>>>>> configuration, which is good in a way (so anyone can easily repro).
>>>>>>>>>
>>>>>>>>> My distributed/clustered solution would be ideal if it were fast
>>>>>>>>> enough for small-block I/O as well as large-block- I was hoping that
>>>>>>>>> single node systems would achieve that, hence the single node test.
>>>>>>>>> Because the single node test performed poorly, I eventually reduced
>>>>>>>>> down to a single disk to see if it could still be seen, and it
>>>>>>>>> clearly can be.  Perhaps it's something in my configuration?  I've
>>>>>>>>> pasted my config files below.
>>>>>>>>> thx-
>>>>>>>>>
>>>>>>>>>     Jeremy
>>>>>>>>>
>>>>>>>>> ######################glusterfsd.vol######################
>>>>>>>>> volume posix
>>>>>>>>>   type storage/posix
>>>>>>>>>   option directory /export
>>>>>>>>> end-volume
>>>>>>>>>
>>>>>>>>> volume locks
>>>>>>>>>   type features/locks
>>>>>>>>>   subvolumes posix
>>>>>>>>> end-volume
>>>>>>>>>
>>>>>>>>> volume disk
>>>>>>>>>   type performance/io-threads
>>>>>>>>>   option thread-count 4
>>>>>>>>>   subvolumes locks
>>>>>>>>> end-volume
>>>>>>>>>
>>>>>>>>> volume server-ib
>>>>>>>>>   type protocol/server
>>>>>>>>>   option transport-type ib-verbs/server
>>>>>>>>>   option auth.addr.disk.allow *
>>>>>>>>>   subvolumes disk
>>>>>>>>> end-volume
>>>>>>>>>
>>>>>>>>> volume server-tcp
>>>>>>>>>   type protocol/server
>>>>>>>>>   option transport-type tcp/server
>>>>>>>>>   option auth.addr.disk.allow *
>>>>>>>>>   subvolumes disk
>>>>>>>>> end-volume
>>>>>>>>>
>>>>>>>>> ######################ghome.vol######################
>>>>>>>>>
>>>>>>>>> #-----------IB remotes------------------
>>>>>>>>> volume ghome
>>>>>>>>>   type protocol/client
>>>>>>>>>   option transport-type ib-verbs/client
>>>>>>>>> #  option transport-type tcp/client
>>>>>>>>>   option remote-host acfs
>>>>>>>>>   option remote-subvolume raid
>>>>>>>>> end-volume
>>>>>>>>>
>>>>>>>>> #------------Performance Options-------------------
>>>>>>>>>
>>>>>>>>> volume readahead
>>>>>>>>>   type performance/read-ahead
>>>>>>>>>   option page-count 4           # 2 is default option
>>>>>>>>>   option force-atime-update off # default is off
>>>>>>>>>   subvolumes ghome
>>>>>>>>> end-volume
>>>>>>>>>
>>>>>>>>> volume writebehind
>>>>>>>>>   type performance/write-behind
>>>>>>>>>   option cache-size 1MB
>>>>>>>>>   subvolumes readahead
>>>>>>>>> end-volume
>>>>>>>>>
>>>>>>>>> volume cache
>>>>>>>>>   type performance/io-cache
>>>>>>>>>   option cache-size 1GB
>>>>>>>>>   subvolumes writebehind
>>>>>>>>> end-volume
>>>>>>>>>
>>>>>>>>> ######################END######################
>>>>>>>>>
>>>>>>>>> On 3/23/2010 6:02 AM, Stephan von Krawczynski wrote:
>>>>>>>>>
>>>>>>>>>> On Tue, 23 Mar 2010 02:59:35 -0600 (CST)
>>>>>>>>>> "Tejas N. Bhise" <tejas at gluster.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Out of curiosity, if you want to do stuff only on one machine,
>>>>>>>>>>> why do you want to use a distributed, multi-node, clustered
>>>>>>>>>>> file system?
>>>>>>>>>>
>>>>>>>>>> Because what he does is a very good way to show the overhead
>>>>>>>>>> produced only by glusterfs and nothing else (i.e. no network
>>>>>>>>>> involved).  A pretty relevant test scenario, I would say.
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Regards,
>>>>>>>>>> Stephan
>>>>>>>>>>
>>>>>>>>>>> Am I missing something here?
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Tejas.
>>>>>>>>>>>
>>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>> From: "Jeremy Enos" <jenos at ncsa.uiuc.edu>
>>>>>>>>>>> To: gluster-users at gluster.org
>>>>>>>>>>> Sent: Tuesday, March 23, 2010 2:07:06 PM GMT +05:30 Chennai,
>>>>>>>>>>> Kolkata, Mumbai, New Delhi
>>>>>>>>>>> Subject: gluster local vs local = gluster x4 slower
>>>>>>>>>>>
>>>>>>>>>>> This test is pretty easy to replicate anywhere- it only takes one
>>>>>>>>>>> disk, one machine, one tarball.  Untarring directly to local disk
>>>>>>>>>>> is about 4.5x faster than going through gluster.  At first I
>>>>>>>>>>> thought this may be due to a slow host (Opteron 2.4 GHz).  But
>>>>>>>>>>> it's not- the same configuration on a much faster machine (dual
>>>>>>>>>>> 3.33 GHz Xeon) yields the performance below.
>>>>>>>>>>>
>>>>>>>>>>> ####THIS TEST WAS TO A LOCAL DISK THRU GLUSTER####
>>>>>>>>>>> [root at ac33 jenos]# time tar xzf
>>>>>>>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz
>>>>>>>>>>>
>>>>>>>>>>> real    0m41.290s
>>>>>>>>>>> user    0m14.246s
>>>>>>>>>>> sys     0m2.957s
>>>>>>>>>>>
>>>>>>>>>>> ####THIS TEST WAS TO A LOCAL DISK (BYPASS GLUSTER)####
>>>>>>>>>>> [root at ac33 jenos]# cd /export/jenos/
>>>>>>>>>>> [root at ac33 jenos]# time tar xzf
>>>>>>>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz
>>>>>>>>>>>
>>>>>>>>>>> real    0m8.983s
>>>>>>>>>>> user    0m6.857s
>>>>>>>>>>> sys     0m1.844s
>>>>>>>>>>>
>>>>>>>>>>> ####THESE ARE TEST FILE DETAILS####
>>>>>>>>>>> [root at ac33 jenos]# tar tzvf
>>>>>>>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz |wc -l
>>>>>>>>>>> 109
>>>>>>>>>>> [root at ac33 jenos]# ls -l
>>>>>>>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz
>>>>>>>>>>> -rw-r--r-- 1 jenos ac 804385203 2010-02-07 06:32
>>>>>>>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz
>>>>>>>>>>> [root at ac33 jenos]#
>>>>>>>>>>>
>>>>>>>>>>> These are the relevant performance options I'm using in my .vol
>>>>>>>>>>> file:
>>>>>>>>>>>
>>>>>>>>>>> #------------Performance Options-------------------
>>>>>>>>>>>
>>>>>>>>>>> volume readahead
>>>>>>>>>>>   type performance/read-ahead
>>>>>>>>>>>   option page-count 4           # 2 is default option
>>>>>>>>>>>   option force-atime-update off # default is off
>>>>>>>>>>>   subvolumes ghome
>>>>>>>>>>> end-volume
>>>>>>>>>>>
>>>>>>>>>>> volume writebehind
>>>>>>>>>>>   type performance/write-behind
>>>>>>>>>>>   option cache-size 1MB
>>>>>>>>>>>   subvolumes readahead
>>>>>>>>>>> end-volume
>>>>>>>>>>>
>>>>>>>>>>> volume cache
>>>>>>>>>>>   type performance/io-cache
>>>>>>>>>>>   option cache-size 1GB
>>>>>>>>>>>   subvolumes writebehind
>>>>>>>>>>> end-volume
>>>>>>>>>>>
>>>>>>>>>>> What can I do to improve gluster's performance?
>>>>>>>>>>>
>>>>>>>>>>>       Jeremy
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Gluster-users mailing list
>>>>>>>>>>> Gluster-users at gluster.org
>>>>>>>>>>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Gluster-users mailing list
>>>>>>>>>>> Gluster-users at gluster.org
>>>>>>>>>>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>>>>>>
>>>>>> _______________________________________________
>>>>>> Gluster-users mailing list
>>>>>> Gluster-users at gluster.org
>>>>>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>>>>
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-users at gluster.org
>>>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
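
For reference, a minimal shell sketch of the repeat-and-average benchmark Bryan
suggests above (run the timed untar+sync 10 times, drop the best and worst
samples, average the remaining 8).  This is an illustration only, not something
posted in the thread: the tarball path and target directory are placeholders
taken from the commands quoted above, and the untar_test scratch directory is a
hypothetical name.

#!/bin/bash
# Sketch of the repeat-and-average benchmark discussed in the thread.
# TARBALL and TESTDIR are placeholders from the quoted commands; point
# TESTDIR at either the gluster mount or the direct export to compare.
TARBALL=/scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz
TESTDIR=/export/jenos

cd "$TESTDIR" || exit 1
samples=()
for i in $(seq 1 10); do
    rm -rf ./untar_test && mkdir ./untar_test       # hypothetical scratch dir
    start=$(date +%s.%N)
    ( cd ./untar_test && tar xzf "$TARBALL" && sync )
    end=$(date +%s.%N)
    samples+=("$(echo "$end - $start" | bc -l)")
done

# Sort the 10 wall-clock samples, drop the first (best) and last (worst),
# and print the mean of the remaining 8.
printf '%s\n' "${samples[@]}" | sort -n | sed '1d;$d' \
    | awk '{ sum += $1 } END { printf "average of %d runs: %.2f s\n", NR, sum / NR }'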