small write speed problem on EBS, distributed replica

karol.skocik at gmail.com (karol skocik) · Mon, 28 Mar 2011 09:14:39 +0200

Hi Mohit,
  since I handed over the distributed FIO-based test tool, a colleague
has taken over tuning these parameters. It seems we cannot base
Gluster on such low-performance EBS drives, so we are going to
investigate other options, such as RAM drives on extra-large instances. We
don't need terabytes, just a couple of dozen gigabytes of storage.
Thanks for the help - I guess this whole thread will save some time for
others wanting to experiment with Gluster on EBS.
Karol

On Sat, Mar 26, 2011 at 1:00 AM, Mohit Anchlia <mohitanchlia at gmail.com> wrote:
> I ran some dd tests locally and see a big difference when using
> oflag=direct. I have a 10k rpm SAS drive that claims a seek time of 3 ms,
> so I don't understand this behaviour.
>
> dd if=/dev/zero of=/root/junk bs=128k count=8000 oflag=direct
> 8000+0 records in
> 8000+0 records out
> 1048576000 bytes (1.0 GB) copied, 58.8426 seconds, 17.8 MB/s
>
> dd if=/dev/zero of=/root/junk bs=128k count=8000
> 8000+0 records in
> 8000+0 records out
> 1048576000 bytes (1.0 GB) copied, 1.22749 seconds, 854 MB/s
>
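The two dd runs above can be put side by side with a little arithmetic. The sketch below only re-derives the rates dd printed and adds the average time per 128k write; the function name dd_stats is made up for illustration. With oflag=direct each write has to reach the device before the next one is issued, while the buffered run largely measures how quickly the page cache accepts data, so the two figures are not measuring the same thing.

```python
def dd_stats(total_bytes, seconds, count):
    """Return (throughput in MB/s, average time per write in ms)."""
    mb_per_s = total_bytes / seconds / 1e6  # dd reports decimal megabytes
    ms_per_write = seconds / count * 1000
    return mb_per_s, ms_per_write


# Figures copied from the two dd outputs quoted above.
direct = dd_stats(1048576000, 58.8426, 8000)
buffered = dd_stats(1048576000, 1.22749, 8000)

print("direct:   %.1f MB/s, %.2f ms per 128k write" % direct)
print("buffered: %.1f MB/s, %.2f ms per 128k write" % buffered)
```

The direct run works out to several milliseconds per request, which is the kind of per-operation cost a single spinning disk can impose when nothing is cached or batched.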
> Can the dev team look at my numbers with the given config, and also at
> Karol's data? I expected a much higher rate.
>
> Gluster seriously lacks documentation, which leaves everyone
> confused :) I'm not sure how to deal with that, since there is no
> commercial support unless you use VMware 4.1. So I guess if it doesn't
> work, you look for some other technology. But given all the claims I see
> about performance, I feel that somewhat better information on
> performance tuning would help tremendously.
>
> On Fri, Mar 25, 2011 at 2:16 PM, Mohit Anchlia <mohitanchlia at gmail.com> wrote:
>> It would be good for the dev team to look at this in parallel. It would help others too.
>>
>> The first thing I see is that your network bandwidth is poor. Is it
>> 1GigE? When you run a tool like iperf you should at least expect to see
>> close to 800 Mbit/s. For example, in my environment iperf gives something like:
>>
>> ------------------------------------------------------------
>> TCP window size: 16.0 KByte (default)
>> ------------------------------------------------------------
>> [  6] local 10.1.101.193 port 49503 connected with 10.1.101.149 port 5001
>> [  4]  0.0-10.0 sec   975 MBytes   815 Mbits/sec
>> [  5] local 10.1.101.193 port 5001 connected with 10.1.101.149 port 41642
>>
>> Can you also try another dd test directly on the Gluster server where
>> the volume resides and post the results?
>>
>> Regarding the other perf-related questions: I haven't tried those myself
>> yet, so I think you will need to change one setting at a time and
>> experiment with it. But if there is an inherent perf problem with the
>> server and the underlying storage, those changes may not help much.
>>
>> On Thu, Mar 24, 2011 at 3:55 AM, karol skocik <karol.skocik at gmail.com> wrote:
>>> Hi Vikas, Mohit,
>>>  I should disclose our typical use case:
>>> We need to read and write files several hundred MB in size - the
>>> read : write ratio is about 1:1.
>>>
>>>> What did you use to calculate latency?
>>>
>>> I used lmbench (http://www.bitmover.com/lmbench), which includes a tool called "lat_tcp".
>>>
>>> The numbers below are from the lmbench tool "bw_tcp":
>>>
>>>> Network bandwidths:
>>>> dfs01: 54 MB/s
>>>> dfs02: 62.5 MB/s
>>>> dfs03: 64 MB/s
>>>> dfs04: 91.5 MB/s
>>>
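The bw_tcp figures above are in megabytes per second, while iperf (whose output is quoted elsewhere in this thread) reports megabits per second, so a factor of 8 separates the two. The sketch below puts both on the same footing; it assumes decimal units on both sides, which the tools may not follow exactly, and the helper names are made up for illustration.

```python
def mbytes_to_mbits(mb_per_s):
    """Convert MB/s to Mbit/s (8 bits per byte)."""
    return mb_per_s * 8


def mbits_to_mbytes(mbit_per_s):
    """Convert Mbit/s to MB/s."""
    return mbit_per_s / 8


# bw_tcp results reported above for the four servers.
bw_tcp = {"dfs01": 54.0, "dfs02": 62.5, "dfs03": 64.0, "dfs04": 91.5}

for host in sorted(bw_tcp):
    mb = bw_tcp[host]
    print("%s: %5.1f MB/s = %4.0f Mbit/s" % (host, mb, mbytes_to_mbits(mb)))

# The iperf run quoted in this thread reported 815 Mbit/s.
print("iperf: 815 Mbit/s = %.1f MB/s" % mbits_to_mbytes(815))
```

Seen this way, the slowest server sits at roughly half of what the iperf run achieved, and only dfs04 comes close to it.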
>>> The setup is Gluster native, no NFS.
>>>
>>> About the "Optimizing Gluster" link - I have seen it before, but there
>>> are several things I don't understand:
>>>
>>> 1.) Tuning FUSE to use a larger blocksize - when testing PVFS, we
>>> achieved the best performance with bs = 4MB.
>>> It's hard to understand why it's hardcoded to 128 KB.
>>> I have also read elsewhere (in reference to FUSE) that a larger
>>> blocksize doesn't yield more performance.
>>> I would guess that when transferring larger amounts of data over a
>>> network with significant latency,
>>> far fewer IO requests should result in higher throughput. (And it's
>>> also cheaper on EBS.)
>>>
>>> Are those listed adjustments to the FUSE kernel module still applicable?
>>>
>>> 2.) Enabling direct-io mode
>>> Does this work on the current 3.1.2?
>>>
>>> glusterfs --direct-io-mode=write-only -f <spec-file> <mount-point>
>>>
>>> And does it also work with --direct-io-mode=read-write?
>>>
>>> Of the parameters in "Setting Volume Options", could this one help
>>> if increased 10-20 times?
>>> - performance.write-behind-window-size
>>>
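The blocksize reasoning in point 1 above can be illustrated with a very simple model: if requests are issued one at a time, each block pays a fixed latency plus its own transfer time. This is only a sketch. The latency and bandwidth values are hypothetical, not measurements from this setup, and the model ignores any pipelining (write-behind, for example, would overlap requests and narrow the gap).

```python
def effective_throughput(block_bytes, latency_s, bandwidth_bytes_per_s):
    """Throughput in bytes/s when each block costs latency + transfer time."""
    time_per_block = latency_s + block_bytes / bandwidth_bytes_per_s
    return block_bytes / time_per_block


LATENCY_S = 0.001  # hypothetical 1 ms per request
BANDWIDTH = 60e6   # hypothetical 60 MB/s link

for kib in (128, 1024, 4096):
    rate = effective_throughput(kib * 1024, LATENCY_S, BANDWIDTH)
    print("bs = %4d KiB -> %.1f MB/s" % (kib, rate / 1e6))
```

Under these assumptions the larger block sizes approach the link bandwidth, while 128 KiB blocks lose a noticeable fraction of it to per-request latency. Whether the same holds through FUSE and Gluster is exactly the open question.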
>>> Now, the raw block device throughput (dd if=/dev/zero
>>> of=/path/to/ebs/mount bs=128k count=4096 oflag=direct),
>>> 3 measurements on each of the server machines dfs0[1-4]:
>>>
>>> dfs01: 9.0 MB/s, 16.4 MB/s, 18.4 MB/s
>>> dfs02: 26.0 MB/s, 28.5 MB/s, 13.0 MB/s
>>> dfs03: 14.4 MB/s, 11.8 MB/s, 32.6 MB/s
>>> dfs04: 35.5 MB/s, 33.1 MB/s, 31.9 MB/s
>>>
>>> This, indeed, varies considerably!
>>>
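To make the variation in the measurements above easier to compare, the sketch below summarizes the three dd runs per server as minimum, maximum, mean, and the max/min ratio. It uses only the numbers listed above; the summarize helper is made up for illustration.

```python
from statistics import mean

# Three dd runs per server, in MB/s, as listed above.
samples = {
    "dfs01": [9.0, 16.4, 18.4],
    "dfs02": [26.0, 28.5, 13.0],
    "dfs03": [14.4, 11.8, 32.6],
    "dfs04": [35.5, 33.1, 31.9],
}


def summarize(values):
    """Return (min, max, mean) of a list of throughput samples."""
    return min(values), max(values), mean(values)


for host in sorted(samples):
    lo, hi, avg = summarize(samples[host])
    print("%s: min %.1f, max %.1f, mean %.1f MB/s, max/min %.1fx"
          % (host, lo, hi, avg, hi / lo))
```

Three samples per volume are too few for firm conclusions, but the max/min ratio already separates the fairly steady dfs04 from the other three.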
>>> Thanks for the help.
>>> Karol
>>>
>>>
>>> On Wed, Mar 23, 2011 at 7:06 PM, Vikas Gorur <vikas at gluster.com> wrote:
>>>> Karol,
>>>>
>>>> A few general pointers about EBS performance:
>>>>
>>>> We've seen throughput to an EBS volume vary considerably. Since EBS is iSCSI underneath, throughput to a volume can fluctuate, and it is also possible that your instance is on degraded hardware that gets very low throughput to the volume.
>>>>
>>>> So I would advise you to first gather some data about all your EBS volumes. You can measure throughput to them by doing something like:
>>>>
>>>> dd if=/dev/zero of=/path/to/ebs/mount bs=128k count=4096 oflag=direct
>>>>
>>>> The "oflag=direct" will give us the raw block device throughput, without the kernel cache in the way.
>>>>
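For collecting the same kind of figure from a script, a rough Python counterpart to the dd test above might look like the sketch below. Note the substitution: it does not use O_DIRECT (which needs aligned buffers) but writes through the page cache and calls fsync before stopping the clock, which is closer to GNU dd's conv=fsync than to oflag=direct. The function name and default sizes are illustrative only.

```python
import os
import tempfile
import time


def write_throughput(path, block_size=128 * 1024, count=64):
    """Write count blocks of zeros to path, fsync, and return MB/s."""
    block = b"\0" * block_size
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(count):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # include the flush to the device in the timing
    elapsed = time.perf_counter() - start
    return block_size * count / elapsed / 1e6


# Small demonstration in a temporary directory; point the path at the
# EBS mount (and raise count) to get a figure comparable to the dd test.
with tempfile.TemporaryDirectory() as tmp:
    rate = write_throughput(os.path.join(tmp, "junk"))
    print("%.1f MB/s" % rate)
```

Because the cache is still involved until the final fsync, short runs will read higher than a true direct-I/O test, so larger counts give more meaningful numbers.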
>>>> The performance you see on the Gluster mountpoint will be a function of the EBS performance. You might also want to spin up a couple more instances and see their EBS throughput to get an idea of the range of EBS performance.
>>>>
>>>> Doing a RAID0 of 4 or 8 EBS volumes using mdadm will also help you increase performance.
>>>>
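As a rough way to reason about what striping might buy, the sketch below estimates a ceiling for large sequential writes to a RAID0 set. It rests on two assumptions that are not from this thread's measurements: every member receives an equal share of the data, so the slowest volume paces the array, and the total cannot exceed whatever network bandwidth the instance has towards EBS. The function name and the example figures are hypothetical.

```python
def raid0_estimate(member_mb_per_s, network_cap_mb_per_s=None):
    """Rough sequential-throughput ceiling (MB/s) for a RAID0 stripe set."""
    # Equal shares per member: the slowest volume limits the whole array.
    aggregate = min(member_mb_per_s) * len(member_mb_per_s)
    if network_cap_mb_per_s is not None:
        # Assumed cap: EBS traffic shares the instance's network link.
        aggregate = min(aggregate, network_cap_mb_per_s)
    return aggregate


# Hypothetical 4-volume set with per-volume rates in the range seen in this thread.
print(raid0_estimate([14.4, 11.8, 32.6, 26.0]))

# Hypothetical 8-volume set of steady 30 MB/s volumes behind a 100 MB/s link.
print(raid0_estimate([30.0] * 8, network_cap_mb_per_s=100.0))
```

If this model is roughly right, one slow volume drags down the whole stripe, which is a reason to measure each volume before assembling the array.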
>>>> ------------------------------
>>>> Vikas Gorur
>>>> Engineer - Gluster, Inc.
>>>> ------------------------------