Hi Zenon,
On Fri, Mar 5, 2021 at 4:52 PM Zenon Panoussis <oracle@xxxxxxxxxxxxxxx> wrote:
Some time ago I created a replica 3 volume using gluster 8.3
with the following topology for the time being:
server1/brick1 ----\                          /---- server3/brick3
                    \____ ADSL 10/1 Mbits ___/
                    / <- down        up ->    \
server2/brick2 ----/                           \---- old storage
The connection between the two boxes at each end is 1Gbit.
The distance between the two sides is about 4000 km and
roughly 250ms.
For the past month and a half I have been running one rsync
on each of the three servers to fetch different parts of a
mail store from "old storage". The mail store consists of
about 1.1 million predominantly small files very unevenly
spread over 6600 directories. Some directories contain 30000+
files; the worst one has 90000+.
Copying simultaneously to all three servers wastes traffic
(what is rsynced to server1 and server2 has to travel down
from old storage and then back up again to server3), but
uses the available bandwidth more efficiently (by using
both directions instead of only down, as the case would be
if I only rsynced to server3 and let the replication flow
down to servers 1 and 2). I did this because, as I mentioned
earlier in the thread "Replication logic", I cannot saturate
any of CPU, disk I/O or even the meager network. This way
the waste of traffic increases the overall speed of copying.
Diagnostics showed that FSYNC had by far the greatest average
latency, followed by MKDIR and CREATE, but they all had
relatively few calls. LOOKUP is what has a huge number of
calls so, even with a moderate average latency, it accounts
for the greatest overall delay, followed by INODELK.
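(Per-FOP latency and call-count figures like these can be pulled with gluster's built-in profiler; a sketch, assuming the volume name gv0 used here:)

```shell
# Enable per-FOP latency/count collection on the volume
gluster volume profile gv0 start

# Dump cumulative stats: avg/min/max latency and number of calls
# per FOP (LOOKUP, FSYNC, MKDIR, CREATE, INODELK, ...)
gluster volume profile gv0 info
```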
I tested writing both to glusterfs and nfs-ganesha, but
didn't notice any difference between them in speed (however,
nfs-ganesha used seven times more memory than glusterfsd).
Tweaking threads, write-behind, parallel-readdir, cache-size
and inode-lru-limit didn't produce any noticeable difference
either.
Then a few days ago I noticed global-threading at
https://github.com/gluster/glusterfs/issues/532 . It
seemed promising but unmerged; it turned out it had
actually been merged. So last night I upgraded to 9.0
and turned it on. I also dumped nfs-ganesha. With that,
my configuration ended up like this:
Volume Name: gv0
Type: Replicate
Volume ID: 2786efab-9178-4a9a-a525-21d6f1c94de9
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: node1:/gfs/gv0
Brick2: node2:/gfs/gv0
Brick3: node3:/gfs/gv0
Options Reconfigured:
cluster.granular-entry-heal: enable
network.ping-timeout: 20
network.frame-timeout: 60
performance.write-behind: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
features.bitrot: off
features.scrub: Inactive
features.scrub-freq: weekly
performance.io-thread-count: 32
features.selinux: off
client.event-threads: 3
server.event-threads: 3
cluster.min-free-disk: 1%
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.cache-invalidation: on
cluster.self-heal-daemon: enable
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on
performance.cache-size: 256MB
network.inode-lru-limit: 131072
performance.parallel-readdir: on
performance.qr-cache-timeout: 600
performance.nl-cache-positive-entry: on
performance.nfs.io-threads: on
config.global-threading: on
performance.iot-pass-through: on
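(For reference, the last two options above are set per volume; a sketch of the commands, assuming the same volume gv0:)

```shell
# Enable the global thread pool (GlusterFS 9.0+) and let the
# io-threads translator pass work through to it
gluster volume set gv0 config.global-threading on
gluster volume set gv0 performance.iot-pass-through on
```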
In the short time it's been running since, I saw no
subjectively noticeable increase in the speed of
writing, but I do see some increase in the speed of
file listing (that is, the speed at which rsync
without --whole-file will run through preexisting
files while reporting "file X is uptodate"). This
is presumably stat working faster because of thread
parallelisation, but I'm only guessing. The network
still does not get saturated except during the
transfer of some occasional big (5MB+) files. So
far I have seen no negative impact of turning global
threading on compared to previously.
Any and all ideas on how to improve this setup (other
than physically) are most welcome.
The main issue with global threading is that it's not regularly tested, so it could have unknown bugs. Besides that, are you using it on both the clients and the bricks, or only on the client?
I think the main problem with rsync is that it's mostly a sequential program that does many small requests. In this case it's hard to saturate the network because the roundtrip latency of sequential operations is what dominates.
To improve that you could try to run several rsync processes in parallel. That should make better use of the bandwidth. Gluster normally works better with parallel operations. It's not so good with single sequential operations.
Another thing you could try is to increase the kernel cache timeouts using the "entry-timeout" and "attribute-timeout" mount options. By default they are set to 1 second. A higher value could help reduce the number of lookups. However, it could also delay the detection of changes or, in the worst case, create inconsistencies, so it should only be used when there's a single fuse mount using the volume. As with the global threading feature, higher values here have not been well tested, so they could have other unexpected problems.
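For example, on a fuse mount that could look like this (the 60-second value and paths are arbitrary illustrations, not recommendations):

```shell
# Cache dentries and attributes in the kernel for 60s instead of 1s,
# cutting repeated LOOKUPs; only reasonable with a single fuse mount
mount -t glusterfs -o entry-timeout=60,attribute-timeout=60 \
    node1:/gv0 /mnt/gv0
```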
Regards,
Xavi
________

Community Meeting Calendar:
Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk

Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users