Re: Glusterfs performance tweaks

marianna cattani <marianna.cattani@xxxxxxxxx> · Sat, 11 Apr 2015 18:39:14 +0200

I see similar behaviour.

All systems are identical.

This is my setup:

############################################################
root@store-1:~# uname -a
Linux store-1 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt7-1 (2015-03-01) x86_64 GNU/Linux
root@store-1:~# glusterfsd --version
glusterfs 3.6.2 built on Jan 21 2015 14:23:41
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2013 Red Hat, Inc. <http://www.redhat.com/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.
root@store-1:~# gluster peer status
Number of Peers: 3

Hostname: 172.16.155.23
Uuid: 094c8ec4-f472-4d32-b3cb-705458cca949
State: Peer in Cluster (Connected)

Hostname: 172.16.155.22
Uuid: 00042351-cc66-4528-ac66-75ed7ea01b44
State: Peer in Cluster (Connected)

Hostname: 172.16.155.24
Uuid: 113307a5-79c8-47c3-a6fa-8b725a56f807
State: Peer in Cluster (Connected)
root@store-1:~# gluster volume info

Volume Name: cinder-disperse
Type: Disperse
Volume ID: a70cf0c4-9320-4e7a-8e6b-a9b6242d151e
Status: Started
Number of Bricks: 1 x (3 + 1) = 4
Transport-type: tcp
Bricks:
Brick1: 172.16.155.21:/data/cidi
Brick2: 172.16.155.22:/data/cidi
Brick3: 172.16.155.23:/data/cidi
Brick4: 172.16.155.24:/data/cidi
Options Reconfigured:
storage.owner-gid: 114
storage.owner-uid: 108
performance.cache-size: 24576MB
performance.io-thread-count: 64
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: none
cluster.quorum-type: none
network.remote-dio: disable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
server.allow-insecure: on
root@store-1:~# iperf -c 172.16.155.22
------------------------------------------------------------
Client connecting to 172.16.155.22, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 172.16.155.21 port 46843 connected with 172.16.155.22 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  7.92 GBytes  6.80 Gbits/sec
root@store-1:~# iperf -c 172.16.155.23
------------------------------------------------------------
Client connecting to 172.16.155.23, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 172.16.155.21 port 59368 connected with 172.16.155.23 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  7.94 GBytes  6.82 Gbits/sec
root@store-1:~# iperf -c 172.16.155.24
------------------------------------------------------------
Client connecting to 172.16.155.24, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 172.16.155.21 port 33656 connected with 172.16.155.24 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  7.94 GBytes  6.82 Gbits/sec
root@store-1:~# 
root@store-1:~# time sh -c "dd if=/dev/zero of=/data/cidi/test.tmp bs=4k count=2000000 && sync"
2000000+0 records in
2000000+0 records out
8192000000 bytes (8.2 GB) copied, 8.58092 s, 955 MB/s

real	0m20.749s
user	0m0.276s
sys	0m6.304s
root@store-1:/data# 

############################################################

root@store-2:~#  time sh -c "dd if=/dev/zero of=/data/cidi/test.tmp bs=4k count=2000000 && sync"
2000000+0 records in
2000000+0 records out
8192000000 bytes (8.2 GB) copied, 9.60108 s, 853 MB/s

real	0m20.751s
user	0m0.264s
sys	0m6.128s

############################################################

root@store-3:~#  time sh -c "dd if=/dev/zero of=/data/cidi/test.tmp bs=4k count=2000000 && sync"
2000000+0 records in
2000000+0 records out
8192000000 bytes (8.2 GB) copied, 9.27243 s, 883 MB/s

real	0m20.979s
user	0m0.284s
sys	0m6.168s

############################################################

root@store-4:~#  time sh -c "dd if=/dev/zero of=/data/cidi/test.tmp bs=4k count=2000000 && sync"
2000000+0 records in
2000000+0 records out
8192000000 bytes (8.2 GB) copied, 9.12428 s, 898 MB/s

real	0m20.067s
user	0m0.244s
sys	0m6.304s

############################################################

I then mounted the volume from another node:

############################################################

root@nodo-3:~# iperf -c 172.16.155.21
------------------------------------------------------------
Client connecting to 172.16.155.21, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 172.16.155.13 port 59299 connected with 172.16.155.21 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  7.08 GBytes  6.08 Gbits/sec
root@nodo-3:~# mount -t glusterfs 172.16.155.21:/cinder-disperse /mnt/cinder-disperse
root@nodo-3:~# time sh -c "dd if=/dev/zero of=/mnt/cinder-disperse/test.tmp bs=4k count=2000000 && sync"
2000000+0 records in
2000000+0 records out
8192000000 bytes (8.2 GB) copied, 72.6074 s, 113 MB/s

real	1m13.842s
user	0m0.584s
sys	0m19.868s
root@nodo-3:~# 

############################################################

Is this the write speed I should expect?
Can I improve it in some way?

Thanks a lot. 

M.
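
A note on reading these numbers: the MB/s figure that dd prints does not include the trailing sync, so the "real" time gives a fairer rate. The sketch below redoes that arithmetic with the figures from the transcripts above. It also estimates a network ceiling for the client, under the assumption (not stated in this thread) that the disperse client encodes the data itself and sends one fragment to each of the 3 + 1 bricks, so that it transmits 4/3 of the bytes written. The helper names are made up for illustration.

```python
# Figures copied from the transcripts above; helper names are illustrative.
BYTES_WRITTEN = 2_000_000 * 4096  # dd bs=4k count=2000000 -> 8,192,000,000 bytes


def mb_per_s(num_bytes: float, seconds: float) -> float:
    """Throughput in MB/s (1 MB = 10**6 bytes, the unit dd reports)."""
    return num_bytes / seconds / 1e6


def disperse_wire_ceiling(link_gbit_s: float, data: int, redundancy: int) -> float:
    """Upper bound in MB/s on client writes to a disperse volume.

    ASSUMPTION: the client sends one fragment to each of the
    (data + redundancy) bricks, i.e. (data + redundancy) / data bytes
    on the wire for every byte the application writes.
    """
    link_mb_s = link_gbit_s * 1e9 / 8 / 1e6
    return link_mb_s * data / (data + redundancy)


# Local brick on store-1: dd alone, then dd plus sync ("real" 0m20.749s).
local_dd_only = mb_per_s(BYTES_WRITTEN, 8.58092)
local_with_sync = mb_per_s(BYTES_WRITTEN, 20.749)

# Gluster mount on nodo-3: "real" time was 1m13.842s.
mount_with_sync = mb_per_s(BYTES_WRITTEN, 73.842)

# iperf from nodo-3 measured 6.08 Gbits/sec; the volume is 3 + 1.
ceiling = disperse_wire_ceiling(6.08, data=3, redundancy=1)

print(f"local brick, dd only       : {local_dd_only:7.1f} MB/s")
print(f"local brick, including sync: {local_with_sync:7.1f} MB/s")
print(f"gluster mount, incl. sync  : {mount_with_sync:7.1f} MB/s")
print(f"assumed network ceiling    : {ceiling:7.1f} MB/s")
```

Under those assumptions the local bricks sustain roughly 395 MB/s once the sync is counted, the mount about 111 MB/s, and the 6.08 Gbits/sec link would cap client writes near 570 MB/s, so the link alone does not seem to account for the result measured on the mount.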

2015-04-11 6:19 GMT+02:00 Prasun Gera <prasun.gera@xxxxxxxxx>:
There is something that's not clear in what you are describing. Gluster doesn't come into play until you access your data through the glusterfs mount. You can even stop your gluster volume and stop the glusterfs daemon to confirm that it is not interfering with your writes to the brick in any way. What you are describing sounds like an issue with the way you have partitioned your drive or set up the filesystem, which is probably xfs in the case of glusterfs if you are using defaults. Are you comparing the same file system in both cases?

On Fri, Apr 10, 2015 at 11:45 AM, Punit Dambiwal <hypunit@xxxxxxxxx> wrote:
Hi Ben,
What I mean is this: if I do not use the SSD as a brick, and do not even install glusterfs on the server, I get about 300 MB/s of throughput. Once I install glusterfs and add this SSD to a glusterfs volume, I get 16 MB/s.
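
For reference, a dd run with oflag=dsync waits for each block to reach stable storage before issuing the next one, so its throughput is simply block size divided by per-write latency. A minimal sketch, using the elapsed times from the dd runs quoted in this thread (the function names and run labels are illustrative only):

```python
def per_write_latency_ms(num_writes: int, elapsed_s: float) -> float:
    """Average time per synchronous write, in milliseconds."""
    return elapsed_s / num_writes * 1000.0


def sync_throughput_mb_s(block_bytes: int, latency_ms: float) -> float:
    """Throughput in MB/s (10**6 bytes) when writes are issued one at a time."""
    return block_bytes / (latency_ms / 1000.0) / 1e6


BLOCK = 64 * 1024  # bs=64k
COUNT = 4 * 1024   # count=4k -> 4096 writes, 268435456 bytes in total

# Elapsed seconds as reported by dd in the quoted messages.
runs = {
    "cpu09, SSD without gluster": 0.935646,
    "cpu01, /bricks/1": 16.8022,
    "sv-VPN1, GlusterFS SSD test": 66.2572,
}

for label, elapsed in runs.items():
    latency = per_write_latency_ms(COUNT, elapsed)
    rate = sync_throughput_mb_s(BLOCK, latency)
    print(f"{label:28s} {latency:7.3f} ms/write  {rate:6.1f} MB/s")
```

On these figures each 64 KiB write took about 0.23 ms on cpu09, about 4.1 ms on /bricks/1 and about 16 ms in the GlusterFS SSD test, which suggests comparing per-write latency rather than raw bandwidth.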

On Fri, Apr 10, 2015 at 9:32 PM, Ben Turner <bturner@xxxxxxxxxx> wrote:
----- Original Message -----
> From: "Punit Dambiwal" <hypunit@xxxxxxxxx>
> To: "Ben Turner" <bturner@xxxxxxxxxx>
> Cc: "Vijay Bellur" <vbellur@xxxxxxxxxx>, gluster-users@xxxxxxxxxxx
> Sent: Thursday, April 9, 2015 9:36:59 PM
> Subject: Re:  Glusterfs performance tweaks
>
> Hi Ben,
>
> But if I run the same command with dsync on the same SSD without glusterfs,
> I get good throughput. The whole setup (CPU, RAM, network) is the same;
> the only difference is that there is no glusterfs.
>
> [root@cpu09 mnt]# dd if=/dev/zero of=test bs=64k count=4k oflag=dsync
> 4096+0 records in
> 4096+0 records out
> 268435456 bytes (268 MB) copied, 0.935646 s, 287 MB/s
> [root@cpu09 mnt]#
>
> [image: Inline image 1]
>
> On top of glusterfs, however, performance is too slow. I run an SSD trim
> every night for garbage collection. I think something needs to be done on
> the gluster or OS side to improve performance; otherwise there is no point
> in using all-SSD storage with gluster, because it ends up slower than SATA.
>
> On Fri, Apr 10, 2015 at 2:12 AM, Ben Turner <bturner@xxxxxxxxxx> wrote:
>
> > ----- Original Message -----
> > > From: "Punit Dambiwal" <hypunit@xxxxxxxxx>
> > > To: "Vijay Bellur" <vbellur@xxxxxxxxxx>
> > > Cc: gluster-users@xxxxxxxxxxx
> > > Sent: Wednesday, April 8, 2015 9:55:38 PM
> > > Subject: Re:  Glusterfs performance tweaks
> > >
> > > Hi Vijay,
> > >
> > > If I run the same command directly on the brick...

What does this mean then?  Running directly on the brick to me means running directly on the SSD.  The command below is the same thing as above, what changed?

-b

> > >
> > > [root@cpu01 1]# dd if=/dev/zero of=test bs=64k count=4k oflag=dsync
> > > 4096+0 records in
> > > 4096+0 records out
> > > 268435456 bytes (268 MB) copied, 16.8022 s, 16.0 MB/s
> > > [root@cpu01 1]# pwd
> > > /bricks/1
> > > [root@cpu01 1]#
> > >
> >
> > This is your problem.  Gluster is only as fast as its slowest piece, and
> > here your storage is the bottleneck.  Since you get 16 MB/s to the brick
> > and 12 MB/s through gluster, that works out to about 25% overhead, which is
> > what I would expect with a single thread, single brick, single client
> > scenario.  This may have something to do with the way SSDs write?  On my
> > SSD at my desk I only get 11.4 MB/s when I run that dd command:
> >
> > # dd if=/dev/zero of=test bs=64k count=4k oflag=dsync
> > 4096+0 records in
> > 4096+0 records out
> > 268435456 bytes (268 MB) copied, 23.065 s, 11.4 MB/s
> >
> > My thought is that maybe using dsync is forcing the SSD to clean the data
> > or something else before writing to it:
> >
> > http://www.blog.solidstatediskshop.com/2012/how-does-an-ssd-write/
> >
> > Do your drives support fstrim?  It may be worth it to trim before you run
> > and see what results you get.  Other than tuning the SSD / OS to perform
> > better on the back end there isn't much we can do from the gluster
> > perspective on that specific dd w/ the dsync flag.
> >
> > -b
> >

> > >
> > > On Wed, Apr 8, 2015 at 6:44 PM, Vijay Bellur <vbellur@xxxxxxxxxx> wrote:
> > >
> > > On 04/08/2015 02:57 PM, Punit Dambiwal wrote:
> > >
> > > Hi,
> > >
> > > I am getting very slow throughput from glusterfs (dead slow; even SATA
> > > is better), although I am using all SSDs in my environment.
> > >
> > > I have the following setup:
> > > A. 4 host machines with CentOS 7 (GlusterFS 3.6.2 | Distributed
> > >    Replicated | replica=2)
> > > B. Each server has 24 SSDs as bricks (no HW RAID | JBOD)
> > > C. Each server has 2 additional SSDs for the OS
> > > D. Network: 2 x 10G with bonding (2 x E5 CPU and 64 GB RAM)
> > >
> > > Note: performance/throughput is slower than a normal 7200 RPM SATA disk,
> > > even though I am using all SSDs in my environment.
> > >
> > > Gluster volume options:
> > >
> > > +++++++++++++++
> > > Options Reconfigured:
> > > performance.nfs.write-behind-window-size: 1024MB
> > > performance.io-thread-count: 32
> > > performance.cache-size: 1024MB
> > > cluster.quorum-type: auto
> > > cluster.server-quorum-type: server
> > > diagnostics.count-fop-hits: on
> > > diagnostics.latency-measurement: on
> > > nfs.disable: on
> > > user.cifs: enable
> > > auth.allow: *
> > > performance.quick-read: off
> > > performance.read-ahead: off
> > > performance.io-cache: off
> > > performance.stat-prefetch: off
> > > cluster.eager-lock: enable
> > > network.remote-dio: enable
> > > storage.owner-uid: 36
> > > storage.owner-gid: 36
> > > server.allow-insecure: on
> > > network.ping-timeout: 0
> > > diagnostics.brick-log-level: INFO
> > > +++++++++++++++++++

> > >
> > > Test with SATA and GlusterFS SSD:
> > > -------
> > > Dell EQL (SATA disk, 7200 RPM)
> > > ---
> > > [root@mirror ~]#
> > > 4096+0 records in
> > > 4096+0 records out
> > > 268435456 bytes (268 MB) copied, 20.7763 s, 12.9 MB/s
> > > [root@mirror ~]# dd if=/dev/zero of=test bs=64k count=4k oflag=dsync
> > > 4096+0 records in
> > > 4096+0 records out
> > > 268435456 bytes (268 MB) copied, 23.5947 s, 11.4 MB/s
> > >
> > > GlusterFS SSD
> > > ---
> > > [root@sv-VPN1 ~]# dd if=/dev/zero of=test bs=64k count=4k oflag=dsync
> > > 4096+0 records in
> > > 4096+0 records out
> > > 268435456 bytes (268 MB) copied, 66.2572 s, 4.1 MB/s
> > > [root@sv-VPN1 ~]# dd if=/dev/zero of=test bs=64k count=4k oflag=dsync
> > > 4096+0 records in
> > > 4096+0 records out
> > > 268435456 bytes (268 MB) copied, 62.6922 s, 4.3 MB/s
> > > --------
> > >
> > > Please let me know what I should do to improve the performance of my
> > > glusterfs.

> > >
> > > What is the throughput that you get when you run these commands on the
> > > disks directly without gluster in the picture?
> > >
> > > By running dd with dsync you are ensuring that there is no buffering
> > > anywhere in the stack and that is the reason why low throughput is being
> > > observed.
> > >
> > > -Vijay
> > >
> > > _______________________________________________
> > > Gluster-users mailing list
> > > Gluster-users@xxxxxxxxxxx
> > > http://www.gluster.org/mailman/listinfo/gluster-users
> >
>

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
