Thanks again! But what I don't understand is that I have 3 x 2 servers,
so I would expect 20 x 3 = 60 MB/s total at least. My load is getting
spread across the 3 x 2 servers in a distributed replica. If I were
using just one gluster server I would understand, but with 6 it makes
no sense.

On Wed, Apr 20, 2011 at 11:47 AM, Joe Landman
<landman at scalableinformatics.com> wrote:
> On 04/20/2011 02:29 PM, Mohit Anchlia wrote:
>>
>> Please find
>>
>> [root at dsdb1 ~]# cat /proc/sys/vm/drop_caches
>> 3
>> [root at dsdb1 ~]# dd if=/dev/zero of=/data/big.file bs=128k count=80k oflag=direct
>> 81920+0 records in
>> 81920+0 records out
>> 10737418240 bytes (11 GB) copied, 521.553 seconds, 20.6 MB/s
>
> Suddenly this makes a great deal more sense.
>
>> [root at dsdb1 ~]#
>> [root at dsdb1 ~]# dd if=/dev/zero of=/data/big.file bs=128k count=80k iflag=direct
>> dd: opening `/dev/zero': Invalid argument
>> [root at dsdb1 ~]# dd of=/dev/null if=/data/big.file bs=128k iflag=direct
>> 81920+0 records in
>> 81920+0 records out
>> 10737418240 bytes (11 GB) copied, 37.854 seconds, 284 MB/s
>> [root at dsdb1 ~]#
>
> About what I expected.
>
> Ok.  Uncached OS writes get you to 20 MB/s, which is about what you
> are seeing with the FUSE mount and a dd.  So I think we understand
> the write side.
>
> The read side is about where I expected (lower actually, but not by
> enough that I am concerned).
>
> You can try changing bs=2M count=6k on both to see the effect of
> larger blocks.  You should get some improvement.
>
> I think we need to dig into the details of that RAID0 construction
> now.  This might be something better done off-list (unless everyone
> wants to see the gory details of digging into the hardware side).
>
> My current thought is that this is a hardware issue, and not a
> gluster issue per se, but that there are possibilities for improving
> performance on the gluster side of the equation.
>
> Short version:  PERC is not fast (never has been), and it is often a
> bad choice for high performance.  You are often better off building
> an MD RAID using the software tools in Linux; it will be faster.
> Think of PERC as an HBA with some modicum of built-in RAID
> capability.  You don't really want to use that capability if
> possible, but you do want to use the HBA.
>
> Longer version:  Likely a striping issue, or a caching issue (need to
> see battery state, cache size, etc.), not to mention the slow chip.
> Are the disk write caches off or on?  (Guessing off, which is the
> right thing to do for some workloads, but it does impact
> performance.)  Also, the RAID CPU in PERC (it's a rebadged LSI) is
> very low performance in general, and specifically not terribly good
> even at RAID0.  These are direct writes, skipping the OS cache.  They
> will let you see how fast the underlying hardware is, and whether it
> can handle the amount of data you want to shove onto disks.
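For concreteness, the larger-block variant suggested above would look
roughly like this on the brick (a sketch only, reusing the
/data/big.file path from the earlier run; 6k blocks of 2 MB is about
13 GB, so make sure /data has the space):

    # write test, 2 MB blocks, direct I/O bypassing the page cache
    dd if=/dev/zero of=/data/big.file bs=2M count=6k oflag=direct

    # read the file back with direct I/O
    dd of=/dev/null if=/data/big.file bs=2M iflag=direct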
>
> Here is my desktop:
>
> root at metal:/local2/home/landman# dd if=/dev/zero of=/local2/big.file bs=128k count=80k oflag=direct
> 81920+0 records in
> 81920+0 records out
> 10737418240 bytes (11 GB) copied, 64.7407 s, 166 MB/s
>
> root at metal:/local2/home/landman# dd if=/dev/zero of=/local2/big.file bs=2M count=6k oflag=direct
> 6144+0 records in
> 6144+0 records out
> 12884901888 bytes (13 GB) copied, 86.0184 s, 150 MB/s
>
> and a server in the lab
>
> [root at jr5-1 ~]# dd if=/dev/zero of=/data/big.file bs=128k count=80k oflag=direct
> 81920+0 records in
> 81920+0 records out
> 10737418240 bytes (11 GB) copied, 11.0948 seconds, 968 MB/s
>
> [root at jr5-1 ~]# dd if=/dev/zero of=/data/big.file bs=2M count=6k oflag=direct
> 6144+0 records in
> 6144+0 records out
> 12884901888 bytes (13 GB) copied, 5.11935 seconds, 2.5 GB/s
>
> Gluster will not be faster than the bare metal (silicon).  It may
> hide some of the issues with caching.  But it is bounded by how fast
> you can push to or pull bits from the media.
>
> In an "optimal" config, the 4x SAS 10k RPM drives should be able to
> sustain ~600 MB/s write.  Reality will be less than this, guessing
> 250-400 MB/s in most cases.  This is still pretty low in performance.
>
> --
> Joseph Landman, Ph.D
> Founder and CEO
> Scalable Informatics Inc.
> email: landman at scalableinformatics.com
> web  : http://scalableinformatics.com
>        http://scalableinformatics.com/sicluster
> phone: +1 734 786 8423 x121
> fax  : +1 866 888 3112
> cell : +1 734 612 4615
>
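For anyone wanting to try the "MD RAID instead of PERC RAID0" advice
above, a minimal sketch might look like the following. The device
names, chunk size, and filesystem are placeholders, not details from
this thread; it assumes the PERC is configured to expose the four
drives individually rather than as a hardware RAID0 volume:

    # software RAID0 across four pass-through disks (names hypothetical)
    mdadm --create /dev/md0 --level=0 --raid-devices=4 --chunk=256 \
        /dev/sdb /dev/sdc /dev/sdd /dev/sde

    # make a filesystem and mount it where the gluster brick lives
    mkfs.ext3 /dev/md0
    mount /dev/md0 /data

    # then rerun the oflag=direct / iflag=direct dd tests against /data
    # and compare with the PERC RAID0 numbers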