Re: Extremely low performance - am I doing something wrong?

Well, if you are addressing me, that was the point of my post regarding the original poster's complaint.

If his chosen test gets lousy or inconsistent results on non-Gluster setups, then it's hard to complain about Gluster, setting aside the known Gluster issues (e.g. network bandwidth, FUSE context switching, etc.).

There is more involved there.
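If one wants to put numbers on those Gluster-side costs, the built-in volume profiler is one option. A minimal sketch, assuming a volume named "myvol" (placeholder name):

# gluster volume profile myvol start
# gluster volume profile myvol info    # per-brick FOP counts and latencies while the test runs
# gluster volume profile myvol stop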

And yes, my performance IS better inside the VMs, because even though you use oflag=sync or oflag=direct, KVM/QEMU still caches stuff underneath the qcow2 image.
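A quick way to see which cache mode QEMU is actually using for a guest disk is to look at the libvirt disk driver line; a minimal sketch, with "myvm" as a placeholder domain name:

# virsh dumpxml myvm | grep "cache="    # e.g. <driver name='qemu' type='qcow2' cache='writeback'/>

If no cache attribute appears, the hypervisor default applies; only cache='none' or cache='directsync' bypass the host page cache.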

So this is his test on an active Gluster replica-2 + arbiter KVM setup, run within a qcow2 image that is doing real work.

# for i in {1..5}; do { dd if=/dev/zero of=./test.tmp bs=1M count=10 oflag=sync; rm -f ./test.tmp; } done
10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.0206228 s, 508 MB/s
10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.0152477 s, 688 MB/s
10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.0149008 s, 704 MB/s
10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.014808 s, 708 MB/s
10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.0147982 s, 709 MB/s


On 7/5/2019 12:13 PM, Strahil wrote:
I don't know what you are trying to test, but I'm sure this test doesn't show anything meaningful.
Have you tested with your apps' workload?

I have done your test and I get approx. 20 MB/s, but I can assure you that the performance is way better in my VMs.

Best Regards,
Strahil Nikolov

On Jul 5, 2019 20:17, wkmail <wkmail@xxxxxxxxx> wrote:

On 7/4/2019 2:28 AM, Vladimir Melnik wrote:
So, the disk is OK and the network is OK, I'm 100% sure.

Seems to be a GlusterFS-related issue. Either something needs to be
tweaked or it's normal performance for a replica-3 cluster.
There is more to it than Gluster on that particular test.

I have some additional data points, since those numbers seemed low
given the long time I have played with Gluster (my first install was 3.3).

So I ran that exact test on some locally mounted hard drive sets (mdadm
RAID1, spinning metal) on CentOS 7 (stock) and Ubuntu 18 (stock) and got the
following:

No Gluster involved.

# for i in {1..5}; do { dd if=/dev/zero of=./test.tmp bs=1M count=10 oflag=sync; rm -f ./test.tmp; } done
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 1.0144 s, 10.3 MB/s
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.791071 s, 13.3 MB/s
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.832186 s, 12.6 MB/s
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.80427 s, 13.0 MB/s
10+0 records in

That was reproducible over several machines with different CPUs that we
have in production.

Performance is about 20% better when 7200 rpm drives are involved or
when no RAID is involved, but never above 18 MB/s.

Performance is also MUCH better when I use oflag=direct (roughly 2x).
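For reference, the direct-I/O variant mentioned above is just the same loop with oflag=direct instead of oflag=sync:

# for i in {1..5}; do { dd if=/dev/zero of=./test.tmp bs=1M count=10 oflag=direct; rm -f ./test.tmp; } done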

However, on an Ubuntu 18 VM host testbed machine that has a separate SSD swap
disk, I get the following, even though I am writing the test.tmp file to
the metal.

# for i in {1..5}; do { dd if=/dev/zero of=./test.tmp bs=1M count=10 oflag=sync; rm -f ./test.tmp; } done

10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.0949153 s, 110 MB/s
10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.0605883 s, 173 MB/s
10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.0582863 s, 180 MB/s
10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.0604369 s, 173 MB/s
10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.0598746 s, 175 MB/s

So something else is going on with that particular test. Clearly,
buffers, I/O elevators, caches, etc. count despite the oflag setting.
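A minimal sketch of how to peek at those layers on a given box (the block device name is a placeholder, and dropping caches obviously perturbs anything else running there):

# cat /sys/block/sda/queue/scheduler        # active I/O elevator is shown in brackets
# free -m                                   # how much page cache is currently in play
# sync; echo 3 > /proc/sys/vm/drop_caches   # flush page cache/dentries/inodes before a re-run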

For the record, on the Gluster FUSE mount (2x + 1 arbiter volume) on that VM
host, I do get reduced performance.

Part of that is due to the Gluster network being 2x 1G using teaming on
that testbed, so there is a network bottleneck.
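The link speed behind the team can be double-checked with something like the following (interface and team names are placeholders):

# ethtool eno1 | grep Speed
# teamdctl team0 state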

# for i in {1..5}; do { dd if=/dev/zero of=./test.tmp bs=1M count=10 oflag=sync; rm -f ./test.tmp; } done
10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.693351 s, 15.1 MB/s
10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.349881 s, 30.0 MB/s
10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.339699 s, 30.9 MB/s
10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.34202 s, 30.7 MB/s
10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.337904 s, 31.0 MB/s

So the Gluster FUSE mount negates the advantage of that SSD swap disk,
along with the obvious network bottleneck.

But clearly we all have to agree on the same valid test.
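If we want something less sensitive to the caching layers than dd, one candidate is fio with an explicit fsync after every write; a minimal sketch only, assuming fio is installed (the job parameters are placeholders, not a recommendation from this thread):

# fio --name=syncwrite --filename=./test.tmp --rw=write --bs=1M --size=100M --ioengine=psync --fsync=1 --unlink=1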

-wk

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users



