Hi Strahil,
Ah, that is an important point. One of the nodes is not accessible from the client; we assumed it only needed to reach the GFS node that was mounted, so didn't think anything of it.
We will try making all nodes accessible, as well as "direct-io-mode=disable".
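In case it helps, here is roughly how we plan to verify reachability from the client. The hostnames gfs1 and gfs2 are from our fstab line quoted below; "gfs3" for the arbiter is only a placeholder, and the exact brick ports should be taken from "gluster volume status gvol0" rather than assumed:

    # From the client: confirm the glusterd management port (24007/tcp)
    # is reachable on every node, not only the one named in the mount.
    for host in gfs1 gfs2 gfs3; do
        if timeout 3 bash -c "</dev/tcp/$host/24007"; then
            echo "$host: port 24007 reachable"
        else
            echo "$host: NOT reachable on port 24007"
        fi
    done
    # The brick ports themselves (49152 and up by default) must also be
    # reachable; "gluster volume status gvol0" on a server lists the port
    # used by each brick.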
Thank you.
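PS: Once all nodes are reachable, we will check that the heal queue actually drains, along these lines (run on one of the gluster servers; gvol0 is our volume name):

    # Files listed here are still pending heal; the list should shrink to empty.
    gluster volume heal gvol0 info

    # Confirm the client shows up as connected on every brick, arbiter included.
    gluster volume status gvol0 clients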
On Sat, 21 Dec 2019 at 10:29, Strahil Nikolov <hunter86_bg@xxxxxxxxx> wrote:
Actually I hadn't made myself clear. FUSE mounts on the client side connect directly to all of the bricks that make up the volume. If for some reason (bad routing, a firewall block) the client can only reach 2 out of 3 bricks, this can constantly trigger healing (as one of the bricks is never updated), which will degrade performance and cause excessive network usage. As your attachment is from one of the gluster nodes, this could be the case.

Best Regards,
Strahil Nikolov

On Friday, 20 December 2019 at 01:49:56 GMT+2, David Cunningham <dcunningham@xxxxxxxxxxxxx> wrote:

Hi Strahil,

The chart attached to my original email is taken from the GFS server.

I'm not sure what you mean by accessing all bricks simultaneously. We've mounted it from the client like this:

gfs1:/gvol0 /mnt/glusterfs/ glusterfs defaults,direct-io-mode=disable,_netdev,backupvolfile-server=gfs2,fetch-attempts=10 0 0

Should we do something different to access all bricks simultaneously?

Thanks for your help!

On Fri, 20 Dec 2019 at 11:47, Strahil Nikolov <hunter86_bg@xxxxxxxxx> wrote:

I'm not sure whether you measured the traffic from the client side (tcpdump on a client machine) or from the server side. In both cases, please verify that the client accesses all bricks simultaneously, as failing to do so can cause unnecessary heals.

Have you thought about upgrading to v6? There are some enhancements in v6 which could be beneficial. Still, it is indeed strange that so much traffic is generated with FUSE.

Another approach is to test with NFS-Ganesha, which supports pNFS and can natively speak with Gluster. That can bring you closer to your previous setup and also provide some extra performance.

Best Regards,
Strahil Nikolov

On Thursday, 19 December 2019 at 02:28:55 GMT+2, David Cunningham <dcunningham@xxxxxxxxxxxxx> wrote:

Hi Raghavendra and Strahil,

We are using GFS version 5.6-1.el7 from the CentOS repository. Unfortunately we can't modify the application, and it expects to read and write from a normal filesystem.

There's around 25GB of data being written during a business day, so over 10 hours that's around 0.7 MBps, which has me mystified as to how it can generate 114MBps of network traffic. Granted we have read traffic as well, but still. The chart shows much more inbound traffic to the GFS server than outbound, suggesting the problem is with data writes.

Is it possible with GFS to not check with the other nodes when reading? Our data is mostly static, and we don't require a 100% guarantee that the data is up to date when reading.

Thanks for any assistance.

On Wed, 18 Dec 2019 at 16:39, Raghavendra Gowdappa <rgowdapp@xxxxxxxxxx> wrote:

What version of Glusterfs are you using? Though I'm not sure what the root cause of your problem is, I wanted to point out a bug with read-ahead which causes read amplification over the network [1][2] and should be fixed in recent versions.

On Wed, Dec 18, 2019 at 2:50 AM David Cunningham <dcunningham@xxxxxxxxxxxxx> wrote:

Hello,

We switched a production system to using GFS instead of NFS at the weekend, however it didn't go well on Monday when full load hit. The application started crashing regularly and we had to revert to NFS. It seems that the problem was high network traffic used by GFS.

We have two GFS nodes plus one arbiter node, each about 1.3 ms latency from the others. Attached is a chart of network traffic on one of the GFS nodes. We see that it saturated the 1Gbps link before we reverted to NFS at 15:10.

The question is: why does GFS use so much network traffic, and is there anything we can do about it? NFS traffic doesn't exceed 4MBps, so 120MBps for GFS seems awfully high.

It would also be good to have faster read performance from GFS, but that's another issue.

Thanks in advance for any assistance.

-- 
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782
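A note on the read-ahead bug Raghavendra mentions above: since we are still on v5, one possible stopgap, which we have not tested, might be to disable the read-ahead translator until we can upgrade to v6:

    # Run on a gluster server; the setting applies volume-wide.
    gluster volume set gvol0 performance.read-ahead off

That would trade some sequential-read performance for less network amplification, so it is worth benchmarking before relying on it.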
--
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782
Community Meeting Calendar:

APAC Schedule -
Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/441850968

NA/EMEA Schedule -
Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users