Horrible performance with small files (DHT/AFR)

The current boxes I'm using for testing are as follows:

  * 2x dual-core Opteron ~2GHz (x86_64)
  * 4GB RAM
  * 4x 7200 RPM 73GB SATA - RAID1+0 w/3ware hardware controllers

The server storage directories live in /home/clusterfs where /home is  
an ext3 partition mounted with noatime.
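One quick way to confirm on each server that the brick directory really sits on a `noatime` mount is to look it up in `/proc/mounts`. The Python sketch below is one hypothetical way to do that. The helper name `mount_for_path` is made up for illustration, and the sketch assumes a Linux `/proc/mounts` whose mount points contain no escaped characters such as `\040`.

```python
import os


def mount_for_path(path, mounts_text):
    """Return (mount_point, fs_type, options) for the mount containing path.

    mounts_text is the text of /proc/mounts: one mount per line with the
    fields device, mount point, fs type, comma-separated options, dump, pass.
    Escaped characters in mount points (e.g. \\040 for a space) are not decoded.
    """
    path = os.path.abspath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        mount_point, fs_type, opts = fields[1], fields[2], fields[3]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # The longest matching mount point wins; on ties the later line
            # wins, since a later mount shadows an earlier one at that path.
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fs_type, set(opts.split(",")))
    return best


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as f:
        found = mount_for_path("/home/clusterfs", f.read())
    if found is not None:
        mount_point, fs_type, opts = found
        print(mount_point, fs_type, "noatime" in opts)
```

On a server laid out as described, this should report `/home`, `ext3` and `True`.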

These servers are not virtualized.  They are running Ubuntu 8.04 LTS  
Server x86_64.

The files I'm copying are all <2k javascript files (plain text) stored  
in 100 hash directories in each of 3 parent directories:

/home/clusterfs/
   + parentdir1/
   |   + 00/
   |   | ...
   |   + 99/
   + parentdir2/
   |   + 00/
   |   | ...
   |   + 99/
   + parentdir3/
       + 00/
       | ...
       + 99/

There are ~10k of these <2k javascript files distributed throughout
the above directory structure, totaling approximately 570MB.  My tests
consist of copying that entire directory structure from a client
machine into the GlusterFS mount point on that client.
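For anyone trying to reproduce this, a rough Python sketch of the test could build a synthetic tree shaped like the one above and time a recursive copy. The file counts, sizes and names (`build_tree`, `time_copy`, `file0.js`) are illustrative assumptions, not the author's actual data.

```python
import os
import shutil
import tempfile
import time


def build_tree(root, parents=3, hash_dirs=100, files_per_dir=33, size=2000):
    """Create root/parentdirN/00..99/*.js filled with small files; return the file count."""
    # Payload of exactly `size` bytes (minimum 10): a JS comment plus filler.
    payload = b"/* js */\n" + b"x" * max(0, size - 10) + b"\n"
    count = 0
    for p in range(1, parents + 1):
        for h in range(hash_dirs):
            d = os.path.join(root, f"parentdir{p}", f"{h:02d}")
            os.makedirs(d, exist_ok=True)
            for i in range(files_per_dir):
                with open(os.path.join(d, f"file{i}.js"), "wb") as f:
                    f.write(payload)
                count += 1
    return count


def time_copy(src, dst):
    """Recursively copy src to dst (dst must not exist yet); return elapsed seconds."""
    start = time.perf_counter()
    # copy2 preserves timestamps and permission bits, roughly like `cp -rp`.
    shutil.copytree(src, dst, copy_function=shutil.copy2)
    return time.perf_counter() - start


if __name__ == "__main__":
    src = tempfile.mkdtemp()
    n = build_tree(src)
    # Hypothetical target: to time GlusterFS, point dst at a fresh directory
    # under the mount (e.g. /mnt/copytest) and drop the final rmtree below.
    dst_parent = tempfile.mkdtemp()
    dst = os.path.join(dst_parent, "copy")
    secs = time_copy(src, dst)
    print(f"{n} files in {secs:.2f}s ({1000 * secs / n:.2f} ms per file)")
    shutil.rmtree(src)
    shutil.rmtree(dst_parent)
```

Using `shutil.copytree` instead of shelling out to `cp` keeps the sketch portable, but it only approximates `cp -rp` (ownership, for example, is not preserved), so absolute numbers may differ from the ones in this thread.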

Observing I/O on both the client box and all of the server boxes with
iostat shows that the disks are doing *very* little work.  Observing
CPU/memory load with top or htop shows that none of the boxes are
CPU- or memory-bound.  Observing the bandwidth in/out of the network
interfaces shows <1MB/s throughput (we have a fully gigabit LAN!),
which usually drops below 150KB/s during the copy.

For comparison, scp'ing the same directory structure from the same
client to one of the same servers runs at ~40-50MB/s sustained.
Here are the results of copying the same directory structure using
rsync to the same partition:

# time rsync -ap * benk@cfs1:~/cache/
benk@cfs1's password:

real	0m23.566s
user	0m8.433s
sys	0m4.580s
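To put this rsync number next to the `cp` timings quoted further down in this thread, a small Python calculation can turn the `time` output into throughput and per-file cost. It assumes both runs copied the same ~570MB tree and takes 10,000 files as the count (the thread mentions both ~10k files and 102,440 `find` entries); `parse_time` and `throughput` are hypothetical helpers.

```python
import re


def parse_time(value):
    """Convert a bash `time` field such as '21m11.505s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value.strip())
    if match is None:
        raise ValueError(f"unrecognised time value: {value!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def throughput(elapsed, total_mb, n_files):
    """Return (MB/s, milliseconds per file) for one copy run."""
    secs = parse_time(elapsed)
    return total_mb / secs, 1000.0 * secs / n_files


# Timings taken from this thread; TOTAL_MB and N_FILES are assumptions.
TOTAL_MB = 570
N_FILES = 10_000

runs = {
    "rsync to cfs1": "0m23.566s",
    "cp to GlusterFS (2 servers)": "21m11.505s",
}

if __name__ == "__main__":
    for label, elapsed in runs.items():
        mb_s, ms_file = throughput(elapsed, TOTAL_MB, N_FILES)
        print(f"{label:30s} {mb_s:7.2f} MB/s {ms_file:8.2f} ms/file")
    slowdown = parse_time(runs["cp to GlusterFS (2 servers)"]) / parse_time(runs["rsync to cfs1"])
    print(f"slowdown: {slowdown:.1f}x")
```

Under those assumptions the rsync run works out to roughly 24MB/s against roughly 0.45MB/s for the 21-minute `cp` run, about a 54x slowdown, which is consistent with the <1MB/s seen on the network interfaces. With disks, CPU and network all mostly idle, a per-file cost in that range suggests the copy is bound by per-file round trips rather than by bandwidth.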

Ben

On Jun 3, 2009, at 3:16 PM, Jasper van Wanrooy - Chatventure wrote:

> Hi Benjamin,
>
> That's not good news. What kind of hardware do you use? Is it  
> virtualised? Or do you use real boxes?
> What kind of files are you copying in your test? What performance do  
> you have when copying it to a local dir?
>
> Best regards Jasper
>
> ----- Original Message -----
> From: "Benjamin Krein" <superbenk at superk.org>
> To: "Jasper van Wanrooy - Chatventure" <jvanwanrooy at chatventure.nl>
> Cc: "Vijay Bellur" <vijay at gluster.com>, gluster-users at gluster.org
> Sent: Wednesday, 3 June, 2009 19:23:51 GMT +01:00 Amsterdam /  
> Berlin / Bern / Rome / Stockholm / Vienna
> Subject: Re: Horrible performance with small files  
> (DHT/AFR)
>
> I reduced my config to only 2 servers (had to donate 2 of the 4 to
> another project).  I now have a single server using DHT (for future
> scaling) and AFR to a mirrored server.  Copy times are much better,
> but still pretty horrible:
>
> # time cp -rp * /mnt/
>
> real	21m11.505s
> user	0m1.000s
> sys	0m6.416s
>
> Ben
>
> On Jun 3, 2009, at 3:13 AM, Jasper van Wanrooy - Chatventure wrote:
>
>> Hi Benjamin,
>>
>> Did you also try a lower thread-count? I'm actually using 3
>> threads.
>>
>> Best Regards Jasper
>>
>>
>> On 2 jun 2009, at 18:25, Benjamin Krein wrote:
>>
>>> I do not see any difference with autoscaling removed.  Current
>>> server config:
>>>
>>> # webform flat-file cache
>>>
>>> volume webform_cache
>>> type storage/posix
>>> option directory /home/clusterfs/webform/cache
>>> end-volume
>>>
>>> volume webform_cache_locks
>>> type features/locks
>>> subvolumes webform_cache
>>> end-volume
>>>
>>> volume webform_cache_brick
>>> type performance/io-threads
>>> option thread-count 32
>>> subvolumes webform_cache_locks
>>> end-volume
>>>
>>> <<snip>>
>>>
>>> # GlusterFS Server
>>> volume server
>>> type protocol/server
>>> option transport-type tcp
>>> subvolumes dns_public_brick dns_private_brick webform_usage_brick
>>> webform_cache_brick wordpress_uploads_brick subs_exports_brick
>>> option auth.addr.dns_public_brick.allow 10.1.1.*
>>> option auth.addr.dns_private_brick.allow 10.1.1.*
>>> option auth.addr.webform_usage_brick.allow 10.1.1.*
>>> option auth.addr.webform_cache_brick.allow 10.1.1.*
>>> option auth.addr.wordpress_uploads_brick.allow 10.1.1.*
>>> option auth.addr.subs_exports_brick.allow 10.1.1.*
>>> end-volume
>>>
>>> # time cp -rp * /mnt/
>>>
>>> real	70m13.672s
>>> user	0m1.168s
>>> sys	0m8.377s
>>>
>>> NOTE: the above test was also done during peak hours, when the
>>> LAN/dev server were in use, which would account for some of the
>>> extra time.  This is still WAY too much, though.
>>>
>>> Ben
>>>
>>>
>>> On Jun 1, 2009, at 1:40 PM, Vijay Bellur wrote:
>>>
>>>> Hi Benjamin,
>>>>
>>>> Could you please try turning autoscaling off?
>>>>
>>>> Thanks,
>>>> Vijay
>>>>
>>>> Benjamin Krein wrote:
>>>>> I'm seeing extremely poor performance writing small files to a
>>>>> glusterfs DHT/AFR mount point. Here are the stats I'm seeing:
>>>>>
>>>>> * Number of files:
>>>>> root@dev1|/home/aweber/cache|# find |wc -l
>>>>> 102440
>>>>>
>>>>> * Average file size (bytes):
>>>>> root@dev1|/home/aweber/cache|# ls -lR | awk '{sum += $5; n++;} END {print sum/n;}'
>>>>> 4776.47
>>>>>
>>>>> * Using scp:
>>>>> root@dev1|/home/aweber/cache|# time scp -rp * benk@cfs1:~/cache/
>>>>>
>>>>> real 1m38.726s
>>>>> user 0m12.173s
>>>>> sys 0m12.141s
>>>>>
>>>>> * Using cp to glusterfs mount point:
>>>>> root@dev1|/home/aweber/cache|# time cp -rp * /mnt
>>>>>
>>>>> real 30m59.101s
>>>>> user 0m1.296s
>>>>> sys 0m5.820s
>>>>>
>>>>> Here is my configuration (currently a single client writing to 4
>>>>> servers: 2 DHT servers doing AFR):
>>>>>
>>>>> SERVER:
>>>>>
>>>>> # webform flat-file cache
>>>>>
>>>>> volume webform_cache
>>>>> type storage/posix
>>>>> option directory /home/clusterfs/webform/cache
>>>>> end-volume
>>>>>
>>>>> volume webform_cache_locks
>>>>> type features/locks
>>>>> subvolumes webform_cache
>>>>> end-volume
>>>>>
>>>>> volume webform_cache_brick
>>>>> type performance/io-threads
>>>>> option thread-count 32
>>>>> option max-threads 128
>>>>> option autoscaling on
>>>>> subvolumes webform_cache_locks
>>>>> end-volume
>>>>>
>>>>> <<snip>>
>>>>>
>>>>> # GlusterFS Server
>>>>> volume server
>>>>> type protocol/server
>>>>> option transport-type tcp
>>>>> subvolumes dns_public_brick dns_private_brick webform_usage_brick
>>>>> webform_cache_brick wordpress_uploads_brick subs_exports_brick
>>>>> option auth.addr.dns_public_brick.allow 10.1.1.*
>>>>> option auth.addr.dns_private_brick.allow 10.1.1.*
>>>>> option auth.addr.webform_usage_brick.allow 10.1.1.*
>>>>> option auth.addr.webform_cache_brick.allow 10.1.1.*
>>>>> option auth.addr.wordpress_uploads_brick.allow 10.1.1.*
>>>>> option auth.addr.subs_exports_brick.allow 10.1.1.*
>>>>> end-volume
>>>>>
>>>>> CLIENT:
>>>>>
>>>>> # Webform Flat-File Cache Volume client configuration
>>>>>
>>>>> volume srv1
>>>>> type protocol/client
>>>>> option transport-type tcp
>>>>> option remote-host cfs1
>>>>> option remote-subvolume webform_cache_brick
>>>>> end-volume
>>>>>
>>>>> volume srv2
>>>>> type protocol/client
>>>>> option transport-type tcp
>>>>> option remote-host cfs2
>>>>> option remote-subvolume webform_cache_brick
>>>>> end-volume
>>>>>
>>>>> volume srv3
>>>>> type protocol/client
>>>>> option transport-type tcp
>>>>> option remote-host cfs3
>>>>> option remote-subvolume webform_cache_brick
>>>>> end-volume
>>>>>
>>>>> volume srv4
>>>>> type protocol/client
>>>>> option transport-type tcp
>>>>> option remote-host cfs4
>>>>> option remote-subvolume webform_cache_brick
>>>>> end-volume
>>>>>
>>>>> volume afr1
>>>>> type cluster/afr
>>>>> subvolumes srv1 srv3
>>>>> end-volume
>>>>>
>>>>> volume afr2
>>>>> type cluster/afr
>>>>> subvolumes srv2 srv4
>>>>> end-volume
>>>>>
>>>>> volume dist
>>>>> type cluster/distribute
>>>>> subvolumes afr1 afr2
>>>>> end-volume
>>>>>
>>>>> volume writebehind
>>>>> type performance/write-behind
>>>>> option cache-size 4mb
>>>>> option flush-behind on
>>>>> subvolumes dist
>>>>> end-volume
>>>>>
>>>>> volume cache
>>>>> type performance/io-cache
>>>>> option cache-size 512mb
>>>>> subvolumes writebehind
>>>>> end-volume
>>>>>
>>>>> Benjamin Krein
>>>>> www.superk.org
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Gluster-users mailing list
>>>>> Gluster-users at gluster.org
>>>>> http://zresearch.com/cgi-bin/mailman/listinfo/gluster-users
>>>>>
>>>>>
>>>>
>>>
>>
>



