I ran some dd tests locally and see a big difference when using O_DIRECT. I have a 10k rpm SAS drive with a claimed seek time of 3 ms, so I don't understand this behaviour:

dd if=/dev/zero of=/root/junk bs=128k count=8000 oflag=direct
8000+0 records in
8000+0 records out
1048576000 bytes (1.0 GB) copied, 58.8426 seconds, 17.8 MB/s

dd if=/dev/zero of=/root/junk bs=128k count=8000
8000+0 records in
8000+0 records out
1048576000 bytes (1.0 GB) copied, 1.22749 seconds, 854 MB/s

Can the dev team look at my numbers with the given config, and also at Karol's data? I expect a much higher rate.

Gluster seriously lacks documentation and leaves everyone in a confused state :) Not sure how to deal with that, since there is no commercial support unless you use VMware 4.1. So I guess if it doesn't work, look for some other technology. But given all the claims I see about performance, I feel that a little better info on performance tuning would help tremendously.

On Fri, Mar 25, 2011 at 2:16 PM, Mohit Anchlia <mohitanchlia at gmail.com> wrote:
> It will be good for the dev team to look at it in parallel. It will help others too.
>
> The first thing I see is that your network bandwidth sucks. Is it
> 1 GigE? When you run tools like iperf you'd at least expect to see close
> to 800 Mbits/s. For example, in my env, if I run iperf I get something like:
>
> ------------------------------------------------------------
> TCP window size: 16.0 KByte (default)
> ------------------------------------------------------------
> [  6] local 10.1.101.193 port 49503 connected with 10.1.101.149 port 5001
> [  4]  0.0-10.0 sec   975 MBytes   815 Mbits/sec
> [  5] local 10.1.101.193 port 5001 connected with 10.1.101.149 port 41642
>
> Can you also try another dd test directly on the gluster server where
> the volume is and post the results?
>
> Regarding the other perf-related questions, I haven't tried those
> myself yet, so I think you will need to change one at a time and experiment
> with them. But if there is an inherent perf problem with the server and
> underlying storage then those may not be that helpful.
>
> On Thu, Mar 24, 2011 at 3:55 AM, karol skocik <karol.skocik at gmail.com> wrote:
>> Hi Vikas, Mohit,
>> I should disclose our typical use cases:
>> we need to read and write files several hundred MB in size - the
>> read : write ratio is about 1:1.
>>
>>> What did you use to calculate latency?
>>
>> I used http://www.bitmover.com/lmbench - they have a tool "lat_tcp".
>>
>> The numbers below are from the lmbench tool "bw_tcp":
>>
>>> Network bandwidths:
>>> dfs01: 54 MB/s
>>> dfs02: 62.5 MB/s
>>> dfs03: 64 MB/s
>>> dfs04: 91.5 MB/s
>>
>> The setup is Gluster native, no NFS.
>>
>> About the "Optimizing Gluster" link - I have seen it before, but there
>> are several things I don't understand:
>>
>> 1.) Tuning FUSE to use a larger blocksize - when testing PVFS, we
>> achieved the best performance with bs = 4 MB.
>> It's hard to understand why it's hardcoded to 128 KB.
>> I have also read elsewhere (referencing FUSE) that a larger
>> blocksize doesn't yield more performance.
>> I would guess that when transferring a larger amount of data over a network with
>> significant latency, far fewer IO requests should result in higher throughput.
>> (And it's also cheaper on EBS.)
>>
>> Are the listed adjustments to the FUSE kernel module still applicable?
>>
>> 2.) Enabling direct-io mode
>> Does this work on the current 3.1.2?:
>>
>> glusterfs --direct-io-mode=write-only -f <spec-file> <mount-point>
>>
>> also with --direct-io-mode=read-write ?
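I haven't tried direct-io-mode on 3.1.2 myself, so I can't say whether the
write-only/read-write values still apply there. If you want to experiment, my
understanding is that it can also be passed as a mount option - the volume name,
server and mount point below are just placeholders for the example, not from
your setup:

# example only: assumes a volume "testvol" served from dfs01, mounted at /mnt/gluster
mount -t glusterfs -o direct-io-mode=enable dfs01:/testvol /mnt/gluster

If that option isn't accepted on your version, falling back to the glusterfs
binary with --direct-io-mode as you quoted is the other thing to try.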
>>
>> Of those parameters in "Setting Volume Options", could this one help:
>> - performance.write-behind-window-size - increasing it 10-20 times?
>>
>> Now, the raw block device throughput (dd if=/dev/zero
>> of=/path/to/ebs/mount bs=128k count=4096 oflag=direct),
>> 3 measurements on the server machines dfs0[1-4]:
>>
>> dfs01: 9.0 MB/s, 16.4 MB/s, 18.4 MB/s
>> dfs02: 26.0 MB/s, 28.5 MB/s, 13.0 MB/s
>> dfs03: 14.4 MB/s, 11.8 MB/s, 32.6 MB/s
>> dfs04: 35.5 MB/s, 33.1 MB/s, 31.9 MB/s
>>
>> This, indeed, varies considerably!
>>
>> Thanks for the help.
>> Karol
>>
>>
>> On Wed, Mar 23, 2011 at 7:06 PM, Vikas Gorur <vikas at gluster.com> wrote:
>>> Karol,
>>>
>>> A few general pointers about EBS performance:
>>>
>>> We've seen throughput to an EBS volume vary considerably. Since EBS is iSCSI underneath, throughput to a volume can fluctuate, and it is also possible that your instance is on degraded hardware that gets very low throughput to the volume.
>>>
>>> So I would advise you to first gather some data about all your EBS volumes. You can measure throughput to them by doing something like:
>>>
>>> dd if=/dev/zero of=/path/to/ebs/mount bs=128k count=4096 oflag=direct
>>>
>>> The "oflag=direct" will give us the raw block device throughput, without the kernel cache in the way.
>>>
>>> The performance you see on the Gluster mountpoint will be a function of the EBS performance. You might also want to spin up a couple more instances and check their EBS throughput to get an idea of the range of EBS performance.
>>>
>>> Doing a RAID0 of 4 or 8 EBS volumes using mdadm will also help you increase performance.
>>>
>>> ------------------------------
>>> Vikas Gorur
>>> Engineer - Gluster, Inc.
>>> ------------------------------
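P.S. If anyone wants to try the RAID0 suggestion from Vikas, a rough sketch with
mdadm would look something like the lines below. The device names and mount point
are made up for the example - use whatever your EBS volumes actually show up as.
I haven't benchmarked RAID0 on EBS myself, so treat this only as a starting point:

# stripe 4 EBS volumes into one md device (example device names)
mdadm --create /dev/md0 --level=0 --raid-devices=4 /dev/sdf /dev/sdg /dev/sdh /dev/sdi
# create a filesystem and mount it where the gluster export/brick lives (example path)
mkfs.ext3 /dev/md0
mkdir -p /export/gluster
mount /dev/md0 /export/gluster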