That, too, is what I'm guessing is happening. Beyond official
confirmation of what's going on, I'm mainly after an answer as to
whether there is a way to solve it and make a locally mounted,
single-disk Gluster filesystem perform even close to as well as a
single local disk accessed directly, including for cached transactions.
So far, the performance translators have had little impact in making
small-block I/O performance competitive.
thx-

Jeremy

On 3/30/2010 11:33 AM, Steven Truelove wrote:
> What you are likely seeing is the OS saving dirty pages in the disk
> cache before writing them. If you were untarring a file that was
> significantly larger than available memory on the server, the server
> would be forced to write to disk and you would likely see performance
> fall more into line with the results you get when you call sync.
>
> Gluster is probably flushing data to disk more aggressively than the
> OS would on its own. This may be intended for reducing the loss of
> data in server failure scenarios. Someone on the Gluster team can
> probably comment on any settings that may exist for controlling
> Gluster's data flushing behaviour.
>
> Steven Truelove
>
>
> On 29/03/2010 5:09 PM, Jeremy Enos wrote:
>> I've already determined that && sync brings the values at least to
>> the same order (gluster is about 75% of direct disk there). I could
>> accept that for the benefit of having a parallel filesystem.
>> What I'm actually trying to achieve now is exactly what leaving out
>> the && sync yields in perceived performance, which translates to real
>> performance if the user can continue on to another task instead of
>> blocking because Gluster isn't utilizing cache. How, with Gluster,
>> can I achieve the same cache benefit that direct disk gets? Will a
>> user ever be able to untar a moderately sized (below physical memory)
>> file onto a Gluster filesystem as fast as to a single disk? (as I
>> did in my initial comparison) Is there something fundamentally
>> preventing that in Gluster's design, or am I misconfiguring it?
>> thx-
>>
>> Jeremy
>>
>> On 3/29/2010 2:00 PM, Bryan Whitehead wrote:
>>> heh, don't forget the && sync
>>>
>>> :)
>>>
>>> On Mon, Mar 29, 2010 at 11:21 AM, Jeremy Enos <jenos at ncsa.uiuc.edu>
>>> wrote:
>>>> Got a chance to run your suggested test:
>>>>
>>>> ##############GLUSTER SINGLE DISK##############
>>>>
>>>> [root at ac33 gjenos]# dd bs=4096 count=32768 if=/dev/zero
>>>> of=./filename.test
>>>> 32768+0 records in
>>>> 32768+0 records out
>>>> 134217728 bytes (134 MB) copied, 8.60486 s, 15.6 MB/s
>>>> [root at ac33 gjenos]#
>>>> [root at ac33 gjenos]# cd /export/jenos/
>>>>
>>>> ##############DIRECT SINGLE DISK##############
>>>>
>>>> [root at ac33 jenos]# dd bs=4096 count=32768 if=/dev/zero
>>>> of=./filename.test
>>>> 32768+0 records in
>>>> 32768+0 records out
>>>> 134217728 bytes (134 MB) copied, 0.21915 s, 612 MB/s
>>>> [root at ac33 jenos]#
>>>>
>>>> If doing anything that can see a cache benefit, the performance of
>>>> Gluster can't compare. Is it even using cache?
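A note on the dd numbers above: the 612 MB/s direct-disk figure is the
page cache absorbing the writes rather than sustained disk throughput,
so the two runs are not measuring the same path. dd itself can take the
cache out of the comparison; a minimal sketch, run from whichever
directory is under test (Gluster mount or backend export):

    # include the final flush to disk in the reported time
    dd bs=4096 count=32768 if=/dev/zero of=./filename.test conv=fdatasync

    # or bypass the page cache entirely for this write
    dd bs=4096 count=32768 if=/dev/zero of=./filename.test oflag=direct

    # before read tests, drop clean caches so both paths start cold (as root)
    echo 3 > /proc/sys/vm/drop_caches

Running the same command against both locations compares the two write
paths on equal footing.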
>>>> >>>> This is the client vol file I used for that test: >>>> >>>> [root at ac33 jenos]# cat /etc/glusterfs/ghome.vol >>>> #-----------IB remotes------------------ >>>> volume ghome >>>> type protocol/client >>>> option transport-type tcp/client >>>> option remote-host ac33 >>>> option remote-subvolume ibstripe >>>> end-volume >>>> >>>> #------------Performance Options------------------- >>>> >>>> volume readahead >>>> type performance/read-ahead >>>> option page-count 4 # 2 is default option >>>> option force-atime-update off # default is off >>>> subvolumes ghome >>>> end-volume >>>> >>>> volume writebehind >>>> type performance/write-behind >>>> option cache-size 1MB >>>> subvolumes readahead >>>> end-volume >>>> >>>> volume cache >>>> type performance/io-cache >>>> option cache-size 2GB >>>> subvolumes writebehind >>>> end-volume >>>> >>>> >>>> Any suggestions appreciated. thx- >>>> >>>> Jeremy >>>> >>>> On 3/26/2010 6:09 PM, Bryan Whitehead wrote: >>>>> One more thought, looks like (from your emails) you are always >>>>> running >>>>> the gluster test first. Maybe the tar file is being read from disk >>>>> when you do the gluster test, then being read from cache when you run >>>>> for the disk. >>>>> >>>>> What if you just pull a chunk of 0's off /dev/zero? >>>>> >>>>> dd bs=4096 count=32768 if=/dev/zero of=./filename.test >>>>> >>>>> or stick the tar in a ramdisk? >>>>> >>>>> (or run the benchmark 10 times for each, drop the best and the worse, >>>>> and average the remaining 8) >>>>> >>>>> Would also be curious if you add another node if the time would be >>>>> halved, then add another 2... then it would be halved again? I guess >>>>> that depends on if striping or just replicating is being used. >>>>> (unfortunately I don't have access to more than 1 test box right >>>>> now). >>>>> >>>>> On Wed, Mar 24, 2010 at 11:06 PM, Jeremy >>>>> Enos<jenos at ncsa.uiuc.edu> wrote: >>>>> >>>>>> For completeness: >>>>>> >>>>>> ##############GLUSTER SINGLE DISK NO PERFORMANCE >>>>>> OPTIONS############## >>>>>> [root at ac33 gjenos]# time (tar xzf >>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz&& sync ) >>>>>> >>>>>> real 0m41.052s >>>>>> user 0m7.705s >>>>>> sys 0m3.122s >>>>>> ##############DIRECT SINGLE DISK############## >>>>>> [root at ac33 gjenos]# cd /export/jenos >>>>>> [root at ac33 jenos]# time (tar xzf >>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz&& sync ) >>>>>> >>>>>> real 0m22.093s >>>>>> user 0m6.932s >>>>>> sys 0m2.459s >>>>>> [root at ac33 jenos]# >>>>>> >>>>>> The performance options don't appear to be the problem. So the >>>>>> question >>>>>> stands- how do I get the disk cache advantage through the Gluster >>>>>> mounted >>>>>> filesystem? It seems to be key in the large performance difference. >>>>>> >>>>>> Jeremy >>>>>> >>>>>> On 3/24/2010 4:47 PM, Jeremy Enos wrote: >>>>>> >>>>>>> Good suggestion- I hadn't tried that yet. It brings them much >>>>>>> closer. 
>>>>>>> >>>>>>> ##############GLUSTER SINGLE DISK############## >>>>>>> [root at ac33 gjenos]# time (tar xzf >>>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz&& sync ) >>>>>>> >>>>>>> real 0m32.089s >>>>>>> user 0m6.516s >>>>>>> sys 0m3.177s >>>>>>> ##############DIRECT SINGLE DISK############## >>>>>>> [root at ac33 gjenos]# cd /export/jenos/ >>>>>>> [root at ac33 jenos]# time (tar xzf >>>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz&& sync ) >>>>>>> >>>>>>> real 0m25.089s >>>>>>> user 0m6.850s >>>>>>> sys 0m2.058s >>>>>>> ##############DIRECT SINGLE DISK CACHED############## >>>>>>> [root at ac33 jenos]# time (tar xzf >>>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz ) >>>>>>> >>>>>>> real 0m8.955s >>>>>>> user 0m6.785s >>>>>>> sys 0m1.848s >>>>>>> >>>>>>> >>>>>>> Oddly, I'm seeing better performance on the gluster system than >>>>>>> previous >>>>>>> tests too (used to be ~39 s). The direct disk time is obviously >>>>>>> benefiting >>>>>>> from cache. There is still a difference, but it appears most of >>>>>>> the >>>>>>> difference disappears w/ the cache advantage removed. That >>>>>>> said- the >>>>>>> relative performance issue then still exists with Gluster. What >>>>>>> can be >>>>>>> done >>>>>>> to make it benefit from cache the same way direct disk does? >>>>>>> thx- >>>>>>> >>>>>>> Jeremy >>>>>>> >>>>>>> P.S. >>>>>>> I'll be posting results w/ performance options completely >>>>>>> removed from >>>>>>> gluster as soon as I get a chance. >>>>>>> >>>>>>> Jeremy >>>>>>> >>>>>>> On 3/24/2010 4:23 PM, Bryan Whitehead wrote: >>>>>>> >>>>>>>> I'd like to see results with this: >>>>>>>> >>>>>>>> time ( tar xzf >>>>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz&& >>>>>>>> sync ) >>>>>>>> >>>>>>>> I've found local filesystems seem to use cache very heavily. The >>>>>>>> untarred file could mostly be sitting in ram with local fs vs >>>>>>>> going >>>>>>>> though fuse (which might do many more sync'ed flushes to disk?). >>>>>>>> >>>>>>>> On Wed, Mar 24, 2010 at 2:25 AM, Jeremy Enos<jenos at ncsa.uiuc.edu> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> I also neglected to mention that the underlying filesystem is >>>>>>>>> ext3. >>>>>>>>> >>>>>>>>> On 3/24/2010 3:44 AM, Jeremy Enos wrote: >>>>>>>>> >>>>>>>>>> I haven't tried all performance options disabled yet- I can >>>>>>>>>> try that >>>>>>>>>> tomorrow when the resource frees up. I was actually asking >>>>>>>>>> first >>>>>>>>>> before >>>>>>>>>> blindly trying different configuration matrices in case >>>>>>>>>> there's a >>>>>>>>>> clear >>>>>>>>>> direction I should take with it. I'll let you know. >>>>>>>>>> >>>>>>>>>> Jeremy >>>>>>>>>> >>>>>>>>>> On 3/24/2010 2:54 AM, Stephan von Krawczynski wrote: >>>>>>>>>> >>>>>>>>>>> Hi Jeremy, >>>>>>>>>>> >>>>>>>>>>> have you tried to reproduce with all performance options >>>>>>>>>>> disabled? >>>>>>>>>>> They >>>>>>>>>>> are >>>>>>>>>>> possibly no good idea on a local system. >>>>>>>>>>> What local fs do you use? >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Regards, >>>>>>>>>>> Stephan >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Tue, 23 Mar 2010 19:11:28 -0500 >>>>>>>>>>> Jeremy Enos<jenos at ncsa.uiuc.edu> wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> Stephan is correct- I primarily did this test to show a >>>>>>>>>>>> demonstrable >>>>>>>>>>>> overhead example that I'm trying to eliminate. 
It's >>>>>>>>>>>> pronounced >>>>>>>>>>>> enough >>>>>>>>>>>> that it can be seen on a single disk / single node >>>>>>>>>>>> configuration, >>>>>>>>>>>> which >>>>>>>>>>>> is good in a way (so anyone can easily repro). >>>>>>>>>>>> >>>>>>>>>>>> My distributed/clustered solution would be ideal if it were >>>>>>>>>>>> fast >>>>>>>>>>>> enough >>>>>>>>>>>> for small block i/o as well as large block- I was hoping that >>>>>>>>>>>> single >>>>>>>>>>>> node systems would achieve that, hence the single node test. >>>>>>>>>>>> Because >>>>>>>>>>>> the single node test performed poorly, I eventually reduced >>>>>>>>>>>> down to >>>>>>>>>>>> single disk to see if it could still be seen, and it >>>>>>>>>>>> clearly can >>>>>>>>>>>> be. >>>>>>>>>>>> Perhaps it's something in my configuration? I've pasted my >>>>>>>>>>>> config >>>>>>>>>>>> files >>>>>>>>>>>> below. >>>>>>>>>>>> thx- >>>>>>>>>>>> >>>>>>>>>>>> Jeremy >>>>>>>>>>>> >>>>>>>>>>>> ######################glusterfsd.vol###################### >>>>>>>>>>>> volume posix >>>>>>>>>>>> type storage/posix >>>>>>>>>>>> option directory /export >>>>>>>>>>>> end-volume >>>>>>>>>>>> >>>>>>>>>>>> volume locks >>>>>>>>>>>> type features/locks >>>>>>>>>>>> subvolumes posix >>>>>>>>>>>> end-volume >>>>>>>>>>>> >>>>>>>>>>>> volume disk >>>>>>>>>>>> type performance/io-threads >>>>>>>>>>>> option thread-count 4 >>>>>>>>>>>> subvolumes locks >>>>>>>>>>>> end-volume >>>>>>>>>>>> >>>>>>>>>>>> volume server-ib >>>>>>>>>>>> type protocol/server >>>>>>>>>>>> option transport-type ib-verbs/server >>>>>>>>>>>> option auth.addr.disk.allow * >>>>>>>>>>>> subvolumes disk >>>>>>>>>>>> end-volume >>>>>>>>>>>> >>>>>>>>>>>> volume server-tcp >>>>>>>>>>>> type protocol/server >>>>>>>>>>>> option transport-type tcp/server >>>>>>>>>>>> option auth.addr.disk.allow * >>>>>>>>>>>> subvolumes disk >>>>>>>>>>>> end-volume >>>>>>>>>>>> >>>>>>>>>>>> ######################ghome.vol###################### >>>>>>>>>>>> >>>>>>>>>>>> #-----------IB remotes------------------ >>>>>>>>>>>> volume ghome >>>>>>>>>>>> type protocol/client >>>>>>>>>>>> option transport-type ib-verbs/client >>>>>>>>>>>> # option transport-type tcp/client >>>>>>>>>>>> option remote-host acfs >>>>>>>>>>>> option remote-subvolume raid >>>>>>>>>>>> end-volume >>>>>>>>>>>> >>>>>>>>>>>> #------------Performance Options------------------- >>>>>>>>>>>> >>>>>>>>>>>> volume readahead >>>>>>>>>>>> type performance/read-ahead >>>>>>>>>>>> option page-count 4 # 2 is default option >>>>>>>>>>>> option force-atime-update off # default is off >>>>>>>>>>>> subvolumes ghome >>>>>>>>>>>> end-volume >>>>>>>>>>>> >>>>>>>>>>>> volume writebehind >>>>>>>>>>>> type performance/write-behind >>>>>>>>>>>> option cache-size 1MB >>>>>>>>>>>> subvolumes readahead >>>>>>>>>>>> end-volume >>>>>>>>>>>> >>>>>>>>>>>> volume cache >>>>>>>>>>>> type performance/io-cache >>>>>>>>>>>> option cache-size 1GB >>>>>>>>>>>> subvolumes writebehind >>>>>>>>>>>> end-volume >>>>>>>>>>>> >>>>>>>>>>>> ######################END###################### >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On 3/23/2010 6:02 AM, Stephan von Krawczynski wrote: >>>>>>>>>>>> >>>>>>>>>>>>> On Tue, 23 Mar 2010 02:59:35 -0600 (CST) >>>>>>>>>>>>> "Tejas N. Bhise"<tejas at gluster.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> Out of curiosity, if you want to do stuff only on one >>>>>>>>>>>>>> machine, >>>>>>>>>>>>>> why do you want to use a distributed, multi node, clustered, >>>>>>>>>>>>>> file system ? 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> Because what he does is a very good way to show the overhead >>>>>>>>>>>>> produced >>>>>>>>>>>>> only by >>>>>>>>>>>>> glusterfs and nothing else (i.e. no network involved). >>>>>>>>>>>>> A pretty relevant test scenario I would say. >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> Regards, >>>>>>>>>>>>> Stephan >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> Am I missing something here ? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>> Tejas. >>>>>>>>>>>>>> >>>>>>>>>>>>>> ----- Original Message ----- >>>>>>>>>>>>>> From: "Jeremy Enos"<jenos at ncsa.uiuc.edu> >>>>>>>>>>>>>> To: gluster-users at gluster.org >>>>>>>>>>>>>> Sent: Tuesday, March 23, 2010 2:07:06 PM GMT +05:30 Chennai, >>>>>>>>>>>>>> Kolkata, >>>>>>>>>>>>>> Mumbai, New Delhi >>>>>>>>>>>>>> Subject: gluster local vs local = gluster x4 >>>>>>>>>>>>>> slower >>>>>>>>>>>>>> >>>>>>>>>>>>>> This test is pretty easy to replicate anywhere- only takes 1 >>>>>>>>>>>>>> disk, >>>>>>>>>>>>>> one >>>>>>>>>>>>>> machine, one tarball. Untarring to local disk directly >>>>>>>>>>>>>> vs thru >>>>>>>>>>>>>> gluster >>>>>>>>>>>>>> is about 4.5x faster. At first I thought this may be due >>>>>>>>>>>>>> to a >>>>>>>>>>>>>> slow >>>>>>>>>>>>>> host >>>>>>>>>>>>>> (Opteron 2.4ghz). But it's not- same configuration, on a >>>>>>>>>>>>>> much >>>>>>>>>>>>>> faster >>>>>>>>>>>>>> machine (dual 3.33ghz Xeon) yields the performance below. >>>>>>>>>>>>>> >>>>>>>>>>>>>> ####THIS TEST WAS TO A LOCAL DISK THRU GLUSTER#### >>>>>>>>>>>>>> [root at ac33 jenos]# time tar xzf >>>>>>>>>>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz >>>>>>>>>>>>>> >>>>>>>>>>>>>> real 0m41.290s >>>>>>>>>>>>>> user 0m14.246s >>>>>>>>>>>>>> sys 0m2.957s >>>>>>>>>>>>>> >>>>>>>>>>>>>> ####THIS TEST WAS TO A LOCAL DISK (BYPASS GLUSTER)#### >>>>>>>>>>>>>> [root at ac33 jenos]# cd /export/jenos/ >>>>>>>>>>>>>> [root at ac33 jenos]# time tar xzf >>>>>>>>>>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz >>>>>>>>>>>>>> >>>>>>>>>>>>>> real 0m8.983s >>>>>>>>>>>>>> user 0m6.857s >>>>>>>>>>>>>> sys 0m1.844s >>>>>>>>>>>>>> >>>>>>>>>>>>>> ####THESE ARE TEST FILE DETAILS#### >>>>>>>>>>>>>> [root at ac33 jenos]# tar tzvf >>>>>>>>>>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz |wc -l >>>>>>>>>>>>>> 109 >>>>>>>>>>>>>> [root at ac33 jenos]# ls -l >>>>>>>>>>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz >>>>>>>>>>>>>> -rw-r--r-- 1 jenos ac 804385203 2010-02-07 06:32 >>>>>>>>>>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz >>>>>>>>>>>>>> [root at ac33 jenos]# >>>>>>>>>>>>>> >>>>>>>>>>>>>> These are the relevant performance options I'm using in >>>>>>>>>>>>>> my .vol >>>>>>>>>>>>>> file: >>>>>>>>>>>>>> >>>>>>>>>>>>>> #------------Performance Options------------------- >>>>>>>>>>>>>> >>>>>>>>>>>>>> volume readahead >>>>>>>>>>>>>> type performance/read-ahead >>>>>>>>>>>>>> option page-count 4 # 2 is default option >>>>>>>>>>>>>> option force-atime-update off # default is off >>>>>>>>>>>>>> subvolumes ghome >>>>>>>>>>>>>> end-volume >>>>>>>>>>>>>> >>>>>>>>>>>>>> volume writebehind >>>>>>>>>>>>>> type performance/write-behind >>>>>>>>>>>>>> option cache-size 1MB >>>>>>>>>>>>>> subvolumes readahead >>>>>>>>>>>>>> end-volume >>>>>>>>>>>>>> >>>>>>>>>>>>>> volume cache >>>>>>>>>>>>>> type performance/io-cache >>>>>>>>>>>>>> option cache-size 1GB >>>>>>>>>>>>>> subvolumes writebehind >>>>>>>>>>>>>> end-volume >>>>>>>>>>>>>> >>>>>>>>>>>>>> What can I do to improve gluster's performance? 
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Jeremy
>>>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>>
>
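For readers hitting the same wall later: in the GlusterFS releases of
this era the io-cache translator caches reads only, so it does not help
the untar/write case; the write path is governed by write-behind. One
knob worth experimenting with, if the release in use supports it, is
flush-behind, which lets flush/close calls return before buffered
writes have reached the server and so recovers some of the perceived
performance that the local page cache provides. A sketch of the
client-side stanza, with illustrative (not recommended) values:

volume writebehind
  type performance/write-behind
  option cache-size 4MB      # write-behind window per file
  option flush-behind on     # if supported: don't block flush/close on pending writes
  subvolumes readahead
end-volume

As Steven notes above, this trades durability for latency: data already
acknowledged to the application may not yet be on disk if a node fails,
which is the same trade-off the kernel makes with its own dirty-page
cache.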