Hi Mohit,

Since I handed over the distributed FIO-based test tool, a colleague has taken over tuning these parameters. It seems we cannot base Gluster on such low-performance EBS drives, so we are going to investigate other options - RAM drives on extra-large instances. We don't need terabytes, just a couple dozen gigabytes of storage.

Thanks for the help - I guess this whole thread will save some time for others wanting to experiment with Gluster on EBS.

Karol

On Sat, Mar 26, 2011 at 1:00 AM, Mohit Anchlia <mohitanchlia at gmail.com> wrote:
> I ran some dd tests locally and see a big difference when using
> O_DIRECT. I have a 10k rpm SAS drive that claims a seek time of 3 ms,
> so I don't understand this behaviour.
>
> dd if=/dev/zero of=/root/junk bs=128k count=8000 oflag=direct
> 8000+0 records in
> 8000+0 records out
> 1048576000 bytes (1.0 GB) copied, 58.8426 seconds, 17.8 MB/s
>
> dd if=/dev/zero of=/root/junk bs=128k count=8000
> 8000+0 records in
> 8000+0 records out
> 1048576000 bytes (1.0 GB) copied, 1.22749 seconds, 854 MB/s
>
> Can the dev team look at my numbers with the given config, and also at
> Karol's data? I expected a much higher rate.
>
> Gluster seriously lacks documentation and leaves everyone in a
> confused state :) Not sure how to deal with that, since there is no
> commercial support unless you use VMware 4.1. So I guess if it doesn't
> work, then look for some other technology. But all the claims I see
> about performance make me feel that slightly better information on
> performance tuning would help tremendously.
>
> On Fri, Mar 25, 2011 at 2:16 PM, Mohit Anchlia <mohitanchlia at gmail.com> wrote:
>> It would be good for the dev team to look at it in parallel. It will help others too.
>>
>> The first thing I see is that your network bandwidth is poor. Is it
>> 1GigE? When you run tools like iperf you would at least expect to see
>> close to 800 Mbit/s.
>> For example, in my environment, if I run iperf I get something like:
>>
>> ------------------------------------------------------------
>> TCP window size: 16.0 KByte (default)
>> ------------------------------------------------------------
>> [  6] local 10.1.101.193 port 49503 connected with 10.1.101.149 port 5001
>> [  4]  0.0-10.0 sec    975 MBytes    815 Mbits/sec
>> [  5] local 10.1.101.193 port 5001 connected with 10.1.101.149 port 41642
>>
>> Can you also try another dd test directly on the gluster server where
>> the volume is and post the results?
>>
>> Regarding the other perf-related questions, I haven't tried those myself
>> yet, so I think you will need to change one at a time and experiment
>> with it. But if there is an inherent perf problem with the server and
>> underlying storage, then those may not be that helpful.
>>
>> On Thu, Mar 24, 2011 at 3:55 AM, karol skocik <karol.skocik at gmail.com> wrote:
>>> Hi Vikas, Mohit,
>>>  I should disclose our typical use cases:
>>> We need to read and write files several hundred MB in size - the
>>> ratio of read : write is about 1:1.
>>>
>>>> What did you use to calculate latency?
>>>
>>> I used lmbench (http://www.bitmover.com/lmbench), which has a tool "lat_tcp".
>>>
>>> The numbers below are from the lmbench tool "bw_tcp":
>>>
>>>> Network bandwidths:
>>>> dfs01: 54 MB/s
>>>> dfs02: 62.5 MB/s
>>>> dfs03: 64 MB/s
>>>> dfs04: 91.5 MB/s
>>>
>>> The setup is Gluster native, no NFS.
>>>
>>> About the "Optimizing Gluster" link - I have seen it before, but there
>>> are several things I don't understand:
>>>
>>> 1.) Tuning FUSE to use a larger blocksize - when testing PVFS, we
>>> achieved the best performance with bs = 4MB.
>>> It's hard to understand why it's hardcoded to 128 KB.
>>> Also, I have read somewhere else (referencing FUSE) that a larger
>>> blocksize doesn't yield more performance.
>>> I would guess that when transferring larger amounts of data over a
>>> network with significant latency,
>>> far fewer IO requests should result in higher throughput.
>>> (It's also cheaper on EBS.)
>>>
>>> Are the listed adjustments to the FUSE kernel module still applicable?
>>>
>>> 2.) Enabling direct-io mode
>>> Does this work on the current 3.1.2?
>>>
>>> glusterfs --direct-io-mode=write-only -f <spec-file> <mount-point>
>>>
>>> And also with --direct-io-mode=read-write?
>>>
>>> Of the parameters in "Setting Volume Options", could this one help:
>>> - performance.write-behind-window-size - increasing it 10-20 times?
>>>
>>> Now, the raw block device throughput (dd if=/dev/zero
>>> of=/path/to/ebs/mount bs=128k count=4096 oflag=direct),
>>> 3 measurements on each of the server machines dfs0[1-4]:
>>>
>>> dfs01: 9.0 MB/s, 16.4 MB/s, 18.4 MB/s
>>> dfs02: 26.0 MB/s, 28.5 MB/s, 13.0 MB/s
>>> dfs03: 14.4 MB/s, 11.8 MB/s, 32.6 MB/s
>>> dfs04: 35.5 MB/s, 33.1 MB/s, 31.9 MB/s
>>>
>>> This, indeed, varies considerably!
>>>
>>> Thanks for the help.
>>> Karol
>>>
>>>
>>> On Wed, Mar 23, 2011 at 7:06 PM, Vikas Gorur <vikas at gluster.com> wrote:
>>>> Karol,
>>>>
>>>> A few general pointers about EBS performance:
>>>>
>>>> We've seen throughput to an EBS volume vary considerably. Since EBS is iSCSI underneath, throughput to a volume can fluctuate, and it is also possible that your instance is on degraded hardware that gets very low throughput to the volume.
>>>>
>>>> So I would advise you to first gather some data about all your EBS volumes. You can measure throughput to them by doing something like:
>>>>
>>>> dd if=/dev/zero of=/path/to/ebs/mount bs=128k count=4096 oflag=direct
>>>>
>>>> The "oflag=direct" will give us the raw block device throughput, without the kernel cache in the way.
>>>>
>>>> The performance you see on the Gluster mountpoint will be a function of the EBS performance. You might also want to spin up a couple more instances and see their EBS throughput to get an idea of the range of EBS performance.
>>>>
>>>> Doing a RAID0 of 4 or 8 EBS volumes using mdadm will also help you increase performance.
>>>>
>>>> ------------------------------
>>>> Vikas Gorur
>>>> Engineer - Gluster, Inc.
>>>> ------------------------------
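
For anyone repeating these measurements, here is a rough Python sketch of the arithmetic used in the thread. It is only an illustration: the function names (summarize, dd_rate_mbps, mbit_to_mbyte) are made up for this note, the dd figures are the ones Karol posted for dfs01-dfs04, and the Mbit/s to MB/s conversion ignores the difference between decimal and binary megabytes.

```python
from statistics import mean

# dd oflag=direct throughput in MB/s, three runs per server, as posted by Karol.
DD_RUNS_MBPS = {
    "dfs01": [9.0, 16.4, 18.4],
    "dfs02": [26.0, 28.5, 13.0],
    "dfs03": [14.4, 11.8, 32.6],
    "dfs04": [35.5, 33.1, 31.9],
}


def summarize(runs):
    """Return (min, mean, max, max/min ratio) for one server's runs."""
    lo, hi = min(runs), max(runs)
    return lo, mean(runs), hi, hi / lo


def dd_rate_mbps(num_bytes, seconds):
    """Throughput as dd prints it: decimal megabytes per second."""
    return num_bytes / seconds / 1e6


def mbit_to_mbyte(mbit_per_s):
    """Convert an iperf-style Mbit/s figure to MB/s (8 bits per byte)."""
    return mbit_per_s / 8.0


if __name__ == "__main__":
    for host, runs in sorted(DD_RUNS_MBPS.items()):
        lo, avg, hi, ratio = summarize(runs)
        print(f"{host}: min {lo:.1f}  mean {avg:.1f}  max {hi:.1f}  max/min {ratio:.2f}")
    # Mohit's direct-IO dd run: 1048576000 bytes in 58.8426 seconds.
    print(f"dd direct: {dd_rate_mbps(1048576000, 58.8426):.1f} MB/s")
    # Mohit's iperf figure, put on the same scale as the dd and bw_tcp numbers.
    print(f"iperf 815 Mbit/s is about {mbit_to_mbyte(815):.0f} MB/s")
```

The max/min ratio per server is a quick way to see how much a single EBS volume fluctuates between runs, and dividing the iperf figure by 8 makes it comparable with the MB/s numbers from dd and bw_tcp.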