High network traffic with performance.readdir-ahead on

Hello folks, 

We are working on a migration from Gluster 3.8 to 5.3. Since the upgrade path between these versions is long, we decided to install new servers running 5.3 and then migrate the clients, upgrading them and pointing them at the new cluster. As a bonus, this keeps a rollback option in case of problems. 

We made our first migration attempt today and, unfortunately, had to roll back to the old cluster. Within the first few minutes after switching clients from the old cluster to the new one, we noticed unusually high network traffic on the Gluster servers (around 320 Mbps) for that time of day. 

Around 08:05 (our first daily peak is at 8 AM) we reached close to 1 Gbps for several minutes, and the traffic stayed very high (over 800 Mbps) until our second daily peak (at 9 AM), when we hit 1 Gbps again. We decided to roll the main production servers back to the old cluster and kept a few servers on the new one, and we watched the network traffic drop back to around 300 Mbps. 

Talking with @nbalacha on the IRC channel (thank you again, man!), he suggested disabling the performance.readdir-ahead option, and the traffic instantly dropped to around 10 Mbps. A graph showing all these events can be found here: https://pasteboard.co/I1JR7ck.png
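
For reference, toggling the option on a live volume is just the usual volume set call (shown here for our volume X):

# gluster volume set X performance.readdir-ahead off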

So, the first question here: should performance.readdir-ahead be on by default? Maybe our scenario isn't the best use case for it, since we do have hundreds of thousands of directories, and the option seems to be causing far more problems than benefits. 

Another thing we noticed is that when we point clients running the new Gluster version (5.3) at the old cluster (version 3.8), we also run into the high-traffic scenario, even though performance.readdir-ahead is switched to "off" there (the default for that version). You can see the high traffic on the old cluster here: https://pasteboard.co/I1KdTUd.png . We are aware that running clients and servers on different versions isn't recommended; we are doing that purely for debugging/testing purposes. 
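
In case it matters, the effective value on each cluster can be double-checked with volume get (assuming the option is available under the same name on 3.8):

# gluster volume get X performance.readdir-ahead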

About our setup: we have a ~1.5 TB volume running in Replicate mode (2 servers per cluster), with around 30 clients mounting the volume through fuse.glusterfs. 
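
The clients mount the volume the usual FUSE way, roughly like this (the mount point here is just an example):

# mount -t glusterfs fs01tmp.x.net:/X /mnt/x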

# gluster volume info of new cluster
Volume Name: X
Type: Replicate
Volume ID: 1d8f7d2d-bda6-4f1c-aa10-6ad29e0b7f5e
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: fs02tmp.x.net:/var/data/glusterfs/x/brick
Brick2: fs01tmp.x.net:/var/data/glusterfs/x/brick
Options Reconfigured:
performance.readdir-ahead: off
client.event-threads: 4
server.event-threads: 4
server.allow-insecure: on
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
performance.io-thread-count: 32
performance.cache-size: 1900MB
performance.write-behind-window-size: 16MB
performance.flush-behind: on
network.ping-timeout: 10

# gluster volume info of old cluster
Volume Name: X
Type: Replicate
Volume ID: 1bd3b5d8-b10f-4c4b-a28a-06ea4cfa1d89
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: fs1.x.net:/var/local/gfs
Brick2: fs2.x.net:/var/local/gfs
Options Reconfigured:
network.ping-timeout: 10
performance.cache-size: 512MB
server.allow-insecure: on
client.bind-insecure: on

I was able to collect a profile from the new cluster and pasted it here: https://pastebin.com/ffF8RVH4 . The sad part is that I was unable to reproduce the issue after re-enabling performance.readdir-ahead. I'm not sure whether the clients still connected to that cluster simply couldn't generate a workload close to the one we had this morning. We'll try to recreate that condition soon. 
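
For completeness, the profile was gathered with the standard profiling commands on volume X, roughly:

# gluster volume profile X start
# gluster volume profile X info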

I can provide more info and run more tests if you need them.

Cheers,
Alberto Bengoa
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users
