What were you really expecting the numbers to be? What numbers do you
get when you write directly to the ext3 file system, bypassing GlusterFS?
I also suggest measuring network latency.

On Wed, Mar 23, 2011 at 4:17 AM, karol skocik <karol.skocik at gmail.com> wrote:
> I see my email to the list was truncated - sending it again.
>
> Hi,
>  here are the measurements - the client machine is KS, and the server
> machines are DFS0[1-4].
> First, the setup now is:
>
> Volume Name: EBSOne
> Type: Distribute
> Status: Started
> Number of Bricks: 1
> Transport-type: tcp
> Bricks:
> Brick1: dfs01:/mnt/ebs
>
> With just one client machine writing a 1GB file to EBSOne, averaged over 3 runs:
>
> Bandwidth (mean): 22441.84 KB/s
> Bandwidth (deviation): 6059.24 KB/s
> Completion latency (mean): 1274.47 usec
> Completion latency (deviation): 1814.58 usec
>
> Now, the latencies:
>
> From KS (the client machine) to the DFS server machines, averaged over 3 runs.
>
> Latencies:
> dfs01: 402 microseconds
> dfs02: 322 microseconds
> dfs03: 445 microseconds
> dfs04: 378 microseconds
>
> Bandwidths:
> dfs01: 54 MB/s
> dfs02: 62.5 MB/s
> dfs03: 64 MB/s
> dfs04: 91.5 MB/s
>
> Every server machine has a single EBS drive with an ext3 filesystem,
> kernel 2.6.18-xenU-ec2-v1.0, CFQ I/O scheduler.
>
> Any ideas? Given the numbers above - does it make sense to try
> software RAID0 with mdadm, or perhaps another filesystem?
>
> Thank you for the help.
> Regards, Karol
>
> On Wed, Mar 23, 2011 at 11:31 AM, karol skocik <karol.skocik at gmail.com> wrote:
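A quick way to collect both baseline numbers is sketched below. This is only an illustration, not output from the thread: the target path defaults to /tmp and should be pointed at a brick directory such as /mnt/ebs, and the dfs0[1-4] hostnames are taken from the messages above.

```shell
# Raw write throughput to the brick's local filesystem, bypassing Gluster.
# conv=fdatasync flushes data to disk before dd reports, so the page cache
# doesn't inflate the number. Point TARGET at /mnt/ebs on a server node.
TARGET=${TARGET:-/tmp}
dd if=/dev/zero of="$TARGET/ddtest.img" bs=1M count=64 conv=fdatasync 2>&1 | tail -n 1
rm -f "$TARGET/ddtest.img"

# Round-trip latency from the client to each server (hostnames assumed
# from the thread; output is suppressed if a host is unreachable).
for host in dfs01 dfs02 dfs03 dfs04; do
    ping -c 5 -q "$host" 2>/dev/null | tail -n 1
done
```

The dd summary line gives the sequential write rate a single brick can sustain; comparing it against the Gluster-mounted number isolates how much overhead the translator stack and network add.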
>> [snip - same measurements as quoted above]
>>
>> On Tue, Mar 22, 2011 at 6:08 PM, Mohit Anchlia <mohitanchlia at gmail.com> wrote:
>>> Can you first run some tests with no replica and see what results you
>>> get? Also, can you look at network latency from the client to each of
>>> your 4 servers and post the results?
>>>
>>> On Mon, Mar 21, 2011 at 1:27 AM, karol skocik <karol.skocik at gmail.com> wrote:
>>>> Hi,
>>>>  I am in the process of evaluating Gluster for a major BI company,
>>>> but I was surprised by the very low write performance on Amazon EBS.
>>>> Our setup is Gluster 3.1.2, a 2x2 distributed replica on 64-bit
>>>> m1.large instances. Every server node has 1 EBS volume attached to it.
>>>> The configuration of the distributed replica is the default one, apart
>>>> from my small attempts to improve performance (io-threads, disabled
>>>> io-stats and latency-measurement):
>>>>
>>>> volume EBSVolume-posix
>>>>   type storage/posix
>>>>   option directory /mnt/ebs
>>>> end-volume
>>>>
>>>> volume EBSVolume-access-control
>>>>   type features/access-control
>>>>   subvolumes EBSVolume-posix
>>>> end-volume
>>>>
>>>> volume EBSVolume-locks
>>>>   type features/locks
>>>>   subvolumes EBSVolume-access-control
>>>> end-volume
>>>>
>>>> volume EBSVolume-io-threads
>>>>   type performance/io-threads
>>>>   option thread-count 4
>>>>   subvolumes EBSVolume-locks
>>>> end-volume
>>>>
>>>> volume /mnt/ebs
>>>>   type debug/io-stats
>>>>   option log-level NONE
>>>>   option latency-measurement off
>>>>   subvolumes EBSVolume-io-threads
>>>> end-volume
>>>>
>>>> volume EBSVolume-server
>>>>   type protocol/server
>>>>   option transport-type tcp
>>>>   option auth.addr./mnt/ebs.allow *
>>>>   subvolumes /mnt/ebs
>>>> end-volume
>>>>
>>>> In our test, all clients start writing to different 1GB files at the same time.
>>>> The measured write bandwidth, with 2x2 servers:
>>>>
>>>> 1 client: 6.5 MB/s
>>>> 2 clients: 4.1 MB/s
>>>> 3 clients: 2.4 MB/s
>>>> 4 clients: 4.3 MB/s
>>>>
>>>> This is not acceptable for our needs. With PVFS2 (I know it uses
>>>> striping, which is very different from replication) we can get up to
>>>> 35 MB/s.
>>>> 2-3 times slower than that would be understandable, but 5-15 times
>>>> slower is not, and I would like to know whether there is something we
>>>> could try.
>>>>
>>>> Could anybody publish their write speeds on a similar setup, and any
>>>> tips on how to achieve better performance?
>>>>
>>>> Thank you,
>>>>  Karol
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-users at gluster.org
>>>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
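For reference, the software RAID0 Karol asks about would be assembled on a server node roughly as follows. This is a sketch only: the /dev/xvd* device names are assumptions (they depend on how the EBS volumes were attached), it presumes several volumes attached to one instance, and it requires root.

```
# Stripe four attached EBS volumes into one md device (assumed names).
mdadm --create /dev/md0 --level=0 --raid-devices=4 \
      /dev/xvdf /dev/xvdg /dev/xvdh /dev/xvdi

# Create a filesystem on the striped device and mount it where the
# Gluster brick lives, so the brick sits on the RAID0 set.
mkfs.ext3 /dev/md0
mount /dev/md0 /mnt/ebs
```

RAID0 multiplies sequential bandwidth by spreading writes across volumes, which addresses the per-volume EBS throughput ceiling, but it also multiplies the chance of losing the brick if any single volume fails, so it pairs naturally with the replica translator already in use here.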