On 04/20/2011 02:29 PM, Mohit Anchlia wrote:
> Please find
>
> [root@dsdb1 ~]# cat /proc/sys/vm/drop_caches
> 3
> [root@dsdb1 ~]# dd if=/dev/zero of=/data/big.file bs=128k count=80k oflag=direct
> 81920+0 records in
> 81920+0 records out
> 10737418240 bytes (11 GB) copied, 521.553 seconds, 20.6 MB/s

Suddenly this makes a great deal more sense.

> [root@dsdb1 ~]#
> [root@dsdb1 ~]# dd if=/dev/zero of=/data/big.file bs=128k count=80k iflag=direct
> dd: opening `/dev/zero': Invalid argument
> [root@dsdb1 ~]# dd of=/dev/null if=/data/big.file bs=128k iflag=direct
> 81920+0 records in
> 81920+0 records out
> 10737418240 bytes (11 GB) copied, 37.854 seconds, 284 MB/s
> [root@dsdb1 ~]#

About what I expected.

Ok. Uncached OS writes get you to 20 MB/s, which is about what you are seeing with the fuse mount and a dd. So I think we understand the write side. The read side is about where I expected (lower, actually, but not by enough to concern me).

You can try changing to bs=2M count=6k on both to see the effect of larger blocks. You should get some improvement.

I think we need to dig into the details of that RAID0 construction now. This might be something better done off-list (unless everyone wants to see the gory details of digging into the hardware side). My current thought is that this is a hardware issue, not a gluster issue per se, but that there are possibilities for improving performance on the gluster side of the equation.

Short version: the PERC is not fast (it never has been), and it is often a bad choice for high performance. You are often better off building an MD RAID using the software tools in Linux; it will be faster (a rough sketch is at the end of this note). Think of the PERC as an HBA with some modicum of built-in RAID capability. You don't really want to use that capability if possible, but you do want to use the HBA.

Longer version: this is likely a striping issue or a caching issue (we need to see battery state, cache size, etc.), not to mention the slow chip. Are the disk write caches off or on? (I'm guessing off, which is the right thing to do for some workloads, but it does impact performance.) Also, the RAID CPU in the PERC (it's a rebadged LSI) is very low performance in general, and specifically not terribly good even at RAID0.

These are direct writes, skipping the OS cache. They let you see how fast the underlying hardware is, and whether it can handle the amount of data you want to shove onto the disks.

Here is my desktop:

root@metal:/local2/home/landman# dd if=/dev/zero of=/local2/big.file bs=128k count=80k oflag=direct
81920+0 records in
81920+0 records out
10737418240 bytes (11 GB) copied, 64.7407 s, 166 MB/s

root@metal:/local2/home/landman# dd if=/dev/zero of=/local2/big.file bs=2M count=6k oflag=direct
6144+0 records in
6144+0 records out
12884901888 bytes (13 GB) copied, 86.0184 s, 150 MB/s

and a server in the lab:

[root@jr5-1 ~]# dd if=/dev/zero of=/data/big.file bs=128k count=80k oflag=direct
81920+0 records in
81920+0 records out
10737418240 bytes (11 GB) copied, 11.0948 seconds, 968 MB/s

[root@jr5-1 ~]# dd if=/dev/zero of=/data/big.file bs=2M count=6k oflag=direct
6144+0 records in
6144+0 records out
12884901888 bytes (13 GB) copied, 5.11935 seconds, 2.5 GB/s

Gluster will not be faster than the bare metal (silicon). It may hide some of the issues with caching, but it is bounded by how fast you can push bits to, or pull bits from, the media.

In an "optimal" config, the 4x SAS 10k RPM drives should be able to sustain ~600 MB/s of writes. Reality will be less than this; I'd guess 250-400 MB/s in most cases. This is still pretty low in performance.
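As a rough sketch of the software RAID route mentioned above (not a recipe for your exact box): I'm assuming the four SAS drives show up to the OS as /dev/sdb through /dev/sde, that they hold nothing you care about, and that /data is where the gluster brick lives. Device names, the 256k chunk, and the choice of xfs are all guesses to adjust for your setup.

# check whether the on-disk write caches are enabled (SAS drives; WCE: 1 = on)
sdparm --get=WCE /dev/sdb /dev/sdc /dev/sdd /dev/sde

# build a 4-drive software RAID0 (chunk size is a guess, tune for your workload)
mdadm --create /dev/md0 --level=0 --raid-devices=4 --chunk=256 \
    /dev/sdb /dev/sdc /dev/sdd /dev/sde

# put a filesystem on it and mount it where the brick lives
mkfs.xfs /dev/md0
mount /dev/md0 /data

# repeat the direct I/O tests against the MD device
dd if=/dev/zero of=/data/big.file bs=2M count=6k oflag=direct
dd of=/dev/null if=/data/big.file bs=2M iflag=direct

If the PERC will not pass the disks through individually, the usual workaround is to build a single-drive RAID0 virtual disk per physical drive and stripe those with MD.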
-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman@scalableinformatics.com
web  : http://scalableinformatics.com
       http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615