Re: GFS performance under heavy traffic

Strahil <hunter86_bg@xxxxxxxxx> · Tue, 24 Dec 2019 18:51:28 +0200

Hi David,
On Dec 24, 2019 02:47, David Cunningham <dcunningham@xxxxxxxxxxxxx> wrote:

>

> Hello,

>

> In testing we found that actually the GFS client having access to all 3 nodes made no difference to performance. Perhaps that's because the 3rd node that wasn't accessible from the client before was the arbiter node?

That makes sense, as no file data is written to the arbiter (it holds only metadata).

> Presumably we shouldn't have an arbiter node listed under backupvolfile-server when mounting the filesystem? Since it doesn't store all the data surely it can't be used to serve the data.
I have my arbiter defined as the last backup and have had no issues so far. At least the admin can easily identify the bricks from the mount options.
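
As a rough sketch of what I mean (gfs3 is only my assumed name for your arbiter node, and fstab_entry is a throwaway helper of mine, not part of Gluster), this prints an fstab line based on the one you posted, with the arbiter listed as the last backup:

```shell
#!/bin/sh
# Print an /etc/fstab line for a Gluster FUSE mount.
# Usage: fstab_entry PRIMARY VOLUME MOUNTPOINT BACKUP...
fstab_entry() {
    primary=$1
    volume=$2
    mountpoint=$3
    shift 3
    # Join the remaining arguments (the backup servers) with ':'.
    backups=$(IFS=:; printf '%s' "$*")
    printf '%s:/%s %s glusterfs defaults,direct-io-mode=disable,_netdev,backupvolfile-server=%s,fetch-attempts=10 0 0\n' \
        "$primary" "$volume" "$mountpoint" "$backups"
}

# gfs3 is an assumed hostname for the arbiter; it goes last.
fstab_entry gfs1 gvol0 /mnt/glusterfs gfs2 gfs3
```

I believe newer mount helpers also accept the spelling 'backup-volfile-servers' for the same list, but please check 'man mount.glusterfs' on your version.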
> We did have direct-io-mode=disable already as well, so that wasn't a factor in the performance problems.
Have you checked whether the client version is too old?

You can also check the cluster's operating version:

# gluster volume get all cluster.max-op-version

# gluster volume get all cluster.op-version
The cluster's op-version should be equal to max-op-version.
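
If you want to script that comparison, here is a minimal sketch. It assumes the usual two-column "Option / Value" table that 'gluster volume get' prints; op_value and check_op_version are helper names I made up for illustration:

```shell
#!/bin/sh
# Sketch: compare the cluster's current op-version with the maximum it supports.

# Read `gluster volume get` output on stdin and print the value of the named option.
op_value() {
    awk -v opt="$1" '$1 == opt { print $2 }'
}

# Report whether the op-version can be raised, and print the command that would do it.
check_op_version() {
    current=$1
    max=$2
    if [ "$current" -lt "$max" ]; then
        echo "op-version $current is below max $max; to raise it run:"
        echo "gluster volume set all cluster.op-version $max"
    else
        echo "op-version $current is already at the maximum"
    fi
}

# Only query the cluster when the gluster CLI is actually installed.
if command -v gluster >/dev/null 2>&1; then
    current=$(gluster volume get all cluster.op-version | op_value cluster.op-version)
    max=$(gluster volume get all cluster.max-op-version | op_value cluster.max-op-version)
    if [ -n "$current" ] && [ -n "$max" ]; then
        check_op_version "$current" "$max"
    fi
fi
```

The script only prints the 'gluster volume set' command rather than running it, since the op-version should be raised only once all servers and clients have been upgraded.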
Two options come to mind:

A) Upgrade to the latest Gluster v6 or even v7 (I know it won't be easy) and then set the op-version to the highest possible value.

# gluster volume get all cluster.max-op-version

# gluster volume get all cluster.op-version
B) Deploy an NFS Ganesha server and connect the client over NFS v4.2 (and control the parallel connections from Ganesha).

Can you provide your Gluster volume's options?

'gluster volume get <VOLNAME>  all'
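
If the full list is too long to post, the options that were changed from their defaults are probably the most interesting. A small sketch, assuming 'gluster volume info' ends with its usual "Options Reconfigured:" section and that the volume is the gvol0 from your fstab line (reconfigured_options is just a helper name I picked):

```shell
#!/bin/sh
# Read `gluster volume info` output on stdin and print everything
# from the "Options Reconfigured:" header to the end.
reconfigured_options() {
    sed -n '/^Options Reconfigured:/,$p'
}

# gvol0 is the volume name from the fstab line in this thread; override with VOLNAME=...
VOLNAME=${VOLNAME:-gvol0}

# Only query the cluster when the gluster CLI is actually installed.
if command -v gluster >/dev/null 2>&1; then
    gluster volume info "$VOLNAME" | reconfigured_options
fi
```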
> Thanks again for any advice.

>

>

>

> On Mon, 23 Dec 2019 at 13:09, David Cunningham <dcunningham@xxxxxxxxxxxxx> wrote:

>>

>> Hi Strahil,

>>

>> Thanks for that. We do have one backup server specified, but will add the second backup as well.

>>

>>

>> On Sat, 21 Dec 2019 at 11:26, Strahil <hunter86_bg@xxxxxxxxx> wrote:

>>>

>>> Hi David,

>>>

>>> Also consider using the mount option to specify backup servers via 'backupvolfile-server=server2:server3' (you can define more, but I don't think replica volumes greater than 3 are useful, except maybe in some special cases).

>>>

>>> That way, when the primary is lost, your client can reach a backup one without disruption.

>>>

>>> P.S.: The client may 'hang' if the primary server is rebooted ungracefully, as the communication must time out before FUSE tries the next server. There is a special script for killing gluster processes in '/usr/share/gluster/scripts' which can be used to set up a systemd service to do that for you on shutdown.

>>>

>>> Best Regards,

>>> Strahil Nikolov

>>>

>>> On Dec 20, 2019 23:49, David Cunningham <dcunningham@xxxxxxxxxxxxx> wrote:

>>>>

>>>> Hi Strahil,

>>>>

>>>> Ah, that is an important point. One of the nodes is not accessible from the client, and we assumed that it only needed to reach the GFS node that was mounted so didn't think anything of it.

>>>>

>>>> We will try making all nodes accessible, as well as "direct-io-mode=disable".

>>>>

>>>> Thank you.

>>>>

>>>>

>>>> On Sat, 21 Dec 2019 at 10:29, Strahil Nikolov <hunter86_bg@xxxxxxxxx> wrote:

>>>>>

>>>>> Actually, I didn't explain myself clearly.

>>>>> A FUSE mount on the client side connects directly to all bricks that make up the volume.

>>>>> If for some reason (bad routing, a firewall block) the client can reach only 2 out of 3 bricks, this can cause constant healing (as one of the bricks is never updated), which will degrade performance and cause excessive network usage.

>>>>> As your attachment is from one of the gluster nodes, this could be the case.

>>>>>

>>>>> Best Regards,

>>>>> Strahil Nikolov

>>>>>

>>>>> On Friday, 20 December 2019, 01:49:56 GMT+2, David Cunningham <dcunningham@xxxxxxxxxxxxx> wrote:

>>>>>

>>>>>

>>>>> Hi Strahil,

>>>>>

>>>>> The chart attached to my original email is taken from the GFS server.

>>>>>

>>>>> I'm not sure what you mean by accessing all bricks simultaneously. We've mounted it from the client like this:

>>>>> gfs1:/gvol0 /mnt/glusterfs/ glusterfs defaults,direct-io-mode=disable,_netdev,backupvolfile-server=gfs2,fetch-attempts=10 0 0

>>>>>

>>>>> Should we do something different to access all bricks simultaneously?

>>>>>

>>>>> Thanks for your help!

>>>>>

>>>>>

>>>>> On Fri, 20 Dec 2019 at 11:47, Strahil Nikolov <hunter86_bg@xxxxxxxxx> wrote:

>>>>>>

>>>>>> I'm not sure whether you measured the traffic from the client side (tcpdump on a client machine) or from the server side.

>>>>>>

>>>>>> In either case, please verify that the client accesses all bricks simultaneously, as failing to do so can cause unnecessary heals.

>>>>>>

>>>>>> Have you thought about upgrading to v6? There are some enhancements in v6 which could be beneficial.

>>>>>>

>>>>>> Yet, it is indeed strange that so much traffic is generated with FUSE.

>>>>>>

>>>>>> Another approach is to test with NFS Ganesha, which supports pNFS and can natively speak with Gluster, which can bring you closer to the previous setup and also provide some extra performance.

>>>>>>

>>>>>>

>>>>>> Best Regards,

>>>>>> Strahil Nikolov

>>>>>>

>>>>>>

>>>>>>

>>

>>

>> -- 

>> David Cunningham, Voisonics Limited

>> http://voisonics.com/

>> USA: +1 213 221 1092

>> New Zealand: +64 (0)28 2558 3782

Best Regards,

Strahil Nikolov
________

Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users