Poor performance compared to NetApp NAS with small files

Hello,

I am seeing very poor performance with GlusterFS 3.12.14 on small files, especially when working with git repositories.

Here is my configuration:
3 Gluster nodes (VMware guests v13 on vSphere 6.5, hosted on Gen8 blades attached to 3PAR SSD RAID5 LUNs); volume type replica 3 with arbiter, SSL enabled, NFS disabled, heartbeat IP between the two main nodes.
Trusted storage pool on Debian 9 x64
Client on Debian 8 x64 with the native Gluster client
Network bandwidth verified with iperf between the client and each storage node (~900 Mb/s)
Disk bandwidth verified with dd on each storage node (~90 MB/s)
_____________________________________________________________
Volume Name: perftest
Type: Replicate
Volume ID: c60b3744-7955-4058-b276-69d7b97de8aa
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: glusterVM1:/bricks/perftest/brick1/data
Brick2: glusterVM2:/bricks/perftest/brick1/data
Brick3: glusterVM3:/bricks/perftest/brick1/data (arbiter)
Options Reconfigured:
cluster.data-self-heal-algorithm: full
features.trash: off
diagnostics.client-log-level: ERROR
ssl.cipher-list: HIGH:!SSLv2
server.ssl: on
client.ssl: on
transport.address-family: inet
nfs.disable: on
_____________________________________________________________
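
The iperf and dd figures above measure streaming bandwidth, whereas a git clone mostly creates many small files, so the cost per file may matter more than throughput. A minimal sketch of how that per-file cost could be measured on each mount; `small_file_bench`, the file count and the file size are illustrative assumptions, and `date +%s.%N` assumes GNU date:

```shell
# Create COUNT files of SIZE bytes each in DIR and print the elapsed wall-clock time.
# Each file costs one create, one write and one close, plus one fork of head.
small_file_bench() {
  dir=$1
  count=${2:-1000}   # number of files to create (assumed default)
  size=${3:-4096}    # bytes per file (assumed default)
  mkdir -p "$dir" || return 1
  start=$(date +%s.%N)
  i=0
  while [ "$i" -lt "$count" ]; do
    head -c "$size" /dev/zero > "$dir/f$i" || return 1
    i=$((i + 1))
  done
  end=$(date +%s.%N)
  awk -v s="$start" -v e="$end" -v n="$count" 'BEGIN {
    d = e - s
    if (d <= 0) d = 0.000001   # guard against a zero interval
    printf "%d files in %.2fs (%.1f files/s)\n", n, d, n / d
  }'
}
```

Running `small_file_bench /mnt/smallfiles` on the Gluster mount and `small_file_bench /share/nas/smallfiles` on the NAS would give two directly comparable figures. The fork of `head` adds the same overhead on both sides, so only the relative difference is meaningful.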

I wrote a test script that tries several parameters, but every test gives similar results (except performance.write-behind, where turning it off more than doubles the time): about 30 s on average for a git clone that takes only about 3 s on the NAS volume.
_____________________________________________________________
#!/bin/bash

# On Ctrl-C: remove the partial clone, unmount the volume and delete the temp log.
trap '[ -d /mnt/bench ] && rm -rf /mnt/bench; grep -q " /mnt " /proc/mounts && umount /mnt; rm -f "$LOG"; exit' INT

LOG=$(mktemp)
for params in \
  "server.event-threads 5" \
  "client.event-threads 5" \
  "cluster.lookup-optimize on" \
  "cluster.readdir-optimize on" \
  "features.cache-invalidation on" \
  "features.cache-invalidation-timeout 5" \
  "performance.cache-invalidation on" \
  "performance.cache-refresh-timeout 5" \
  "performance.client-io-threads on" \
  "performance.flush-behind on" \
  "performance.io-thread-count 6" \
  "performance.quick-read on" \
  "performance.read-ahead enable" \
  "performance.readdir-ahead enable" \
  "performance.stat-prefetch on" \
  "performance.write-behind on" \
  "performance.write-behind-window-size 2MB"; do
  set -- $params   # split "option value" into $1 and $2
  echo -n "gluster volume set perftest $1 $2 -> "
  ssh -n glusterVM3 "gluster volume set perftest $1 $2"
done
echo "NAS Reference"
# Clone into the same directory that is removed afterwards.
sh -c "time -o $LOG -f '%E %P' git clone git@gitlab.local:grp/project.git /share/nas/project >/dev/null 2>&1"
cat $LOG
rm -rf /share/nas/project

for params in \
  "server.event-threads 5 6 7" \
  "client.event-threads 5 6 7" \
  "cluster.lookup-optimize on off on" \
  "cluster.readdir-optimize on off on" \
  "features.cache-invalidation on off on" \
  "features.cache-invalidation-timeout 5 10 15 20 30 45 60 90 120" \
  "performance.cache-invalidation on off on" \
  "performance.cache-refresh-timeout 1 5 10 15 20 30 45 60" \
  "performance.client-io-threads on off on" \
  "performance.flush-behind on off on" \
  "performance.io-thread-count 6 7 8 9 10" \
  "performance.quick-read on off on" \
  "performance.read-ahead enable disable enable" \
  "performance.readdir-ahead enable disable enable" \
  "performance.stat-prefetch on off on" \
  "performance.write-behind on off on" \
  "performance.write-behind-window-size 2MB 4MB 8MB 16MB"; do
  set -- $params   # $1 is the option name, the rest are the values to test
  param=$1
  shift
  for value in "$@"; do
    echo -en "\nTesting $param=$value -> "
    #ssh -n glusterVM3 "yes | gluster volume stop perftest force; gluster volume set perftest $param $value; gluster volume start perftest"
    ssh -n glusterVM3 "gluster volume set perftest $param $value"
    if mount -t glusterfs -o defaults,direct-io-mode=enable glusterVMa:perftest /mnt; then
      for i in $(seq 1 5); do
        sh -c "time -o $LOG -f '%E %P' git clone git@gitlab.local:grp/project.git /mnt/bench >/dev/null 2>&1"
        cat $LOG
        rm -rf /mnt/bench
      done
      umount /mnt
    else
      echo "*** FAIL"
      exit
    fi
  done
done

rm -f "$LOG"
_____________________________________________________________
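
Every `gluster volume set` in the loop above persists, so each later test runs with whatever values the earlier iterations left behind. A minimal sketch, assuming it is acceptable to return an option to its default once its values have been tested, using `gluster volume reset`. The names `GLUSTER_HOST`, `VOLUME`, `DRY_RUN`, `run_remote` and `reset_option` are hypothetical; the host and volume defaults are taken from the script:

```shell
GLUSTER_HOST=${GLUSTER_HOST:-glusterVM3}
VOLUME=${VOLUME:-perftest}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the command, 0 = run it over ssh

# Run a command on the storage node, or just print it in dry-run mode.
run_remote() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "ssh -n $GLUSTER_HOST \"$*\""
  else
    ssh -n "$GLUSTER_HOST" "$*"
  fi
}

# Return a single volume option to its default value.
reset_option() {
  run_remote "gluster volume reset $VOLUME $1"
}
```

Calling `reset_option "$param"` after the inner `for value` loop would leave only one option away from its default at any time, which may make the individual measurements easier to compare.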

Output produced by the script
_____________________________________________________________
gluster volume set perftest server.event-threads 5 -> volume set: success
gluster volume set perftest client.event-threads 5 -> volume set: success
gluster volume set perftest cluster.lookup-optimize on -> volume set: success
gluster volume set perftest cluster.readdir-optimize on -> volume set: success
gluster volume set perftest features.cache-invalidation on -> volume set: success
gluster volume set perftest features.cache-invalidation-timeout 5 -> volume set: success
gluster volume set perftest performance.cache-invalidation on -> volume set: success
gluster volume set perftest performance.cache-refresh-timeout 5 -> volume set: success
gluster volume set perftest performance.client-io-threads on -> volume set: success
gluster volume set perftest performance.flush-behind on -> volume set: success
gluster volume set perftest performance.io-thread-count 6 -> volume set: success
gluster volume set perftest performance.quick-read on -> volume set: success
gluster volume set perftest performance.read-ahead enable -> volume set: success
gluster volume set perftest performance.readdir-ahead enable -> volume set: success
gluster volume set perftest performance.stat-prefetch on -> volume set: success
gluster volume set perftest performance.write-behind on -> volume set: success
gluster volume set perftest performance.write-behind-window-size 2MB -> volume set: success
NAS Reference
0:03.59 23%

Testing server.event-threads=5 -> volume set: success
0:29.45 2%
0:27.07 2%
0:24.89 2%
0:24.93 2%
0:24.64 3%

Testing server.event-threads=6 -> volume set: success
0:24.14 3%
0:24.69 2%
0:26.81 2%
0:27.38 2%
0:25.59 2%

Testing server.event-threads=7 -> volume set: success
0:25.34 2%
0:24.14 2%
0:25.92 2%
0:23.62 2%
0:24.76 2%

Testing client.event-threads=5 -> volume set: success
0:24.60 3%
0:29.40 2%
0:34.78 2%
0:33.99 2%
0:33.54 2%

Testing client.event-threads=6 -> volume set: success
0:23.82 3%
0:24.64 2%
0:26.10 3%
0:24.56 2%
0:28.21 2%

Testing client.event-threads=7 -> volume set: success
0:28.15 2%
0:35.19 2%
0:24.03 2%
0:24.79 2%
0:26.55 2%

Testing cluster.lookup-optimize=on -> volume set: success
0:30.67 2%
0:30.49 2%
0:31.52 2%
0:33.13 2%
0:32.41 2%

Testing cluster.lookup-optimize=off -> volume set: success
0:25.82 2%
0:25.59 2%
0:28.24 2%
0:31.90 2%
0:33.52 2%

Testing cluster.lookup-optimize=on -> volume set: success
0:29.33 2%
0:24.82 2%
0:25.93 2%
0:25.36 2%
0:24.89 2%

Testing cluster.readdir-optimize=on -> volume set: success
0:24.98 2%
0:25.03 2%
0:27.47 2%
0:28.13 2%
0:27.41 2%

Testing cluster.readdir-optimize=off -> volume set: success
0:32.54 2%
0:32.50 2%
0:25.56 2%
0:25.21 2%
0:27.39 2%

Testing cluster.readdir-optimize=on -> volume set: success
0:27.68 2%
0:29.33 2%
0:25.50 2%
0:25.17 2%
0:26.00 2%

Testing features.cache-invalidation=on -> volume set: success
0:25.63 2%
0:25.46 3%
0:25.55 3%
0:26.13 2%
0:25.13 2%

Testing features.cache-invalidation=off -> volume set: success
0:27.79 2%
0:25.31 2%
0:24.75 2%
0:27.75 2%
0:32.67 2%

Testing features.cache-invalidation=on -> volume set: success
0:26.34 2%
0:26.60 2%
0:26.32 2%
0:31.05 3%
0:33.58 2%

Testing features.cache-invalidation-timeout=5 -> volume set: success
0:25.89 3%
0:25.07 3%
0:25.49 2%
0:25.44 3%
0:25.47 2%

Testing features.cache-invalidation-timeout=10 -> volume set: success
0:32.34 2%
0:28.27 3%
0:27.41 2%
0:25.17 2%
0:25.56 2%

Testing features.cache-invalidation-timeout=15 -> volume set: success
0:27.79 2%
0:30.58 2%
0:31.63 2%
0:26.71 2%
0:29.69 2%

Testing features.cache-invalidation-timeout=20 -> volume set: success
0:26.62 2%
0:23.76 3%
0:24.17 3%
0:24.99 2%
0:25.31 2%

Testing features.cache-invalidation-timeout=30 -> volume set: success
0:25.75 3%
0:27.34 2%
0:28.38 2%
0:27.15 2%
0:30.91 2%

Testing features.cache-invalidation-timeout=45 -> volume set: success
0:24.77 2%
0:24.81 2%
0:28.22 2%
0:32.56 2%
0:40.81 1%

Testing features.cache-invalidation-timeout=60 -> volume set: success
0:31.97 2%
0:27.14 2%
0:24.53 3%
0:25.48 3%
0:25.27 3%

Testing features.cache-invalidation-timeout=90 -> volume set: success
0:25.24 3%
0:26.83 3%
0:32.74 2%
0:26.82 3%
0:27.69 2%

Testing features.cache-invalidation-timeout=120 -> volume set: success
0:24.50 3%
0:25.43 3%
0:26.21 3%
0:30.09 2%
0:32.24 2%

Testing performance.cache-invalidation=on -> volume set: success
0:28.77 3%
0:37.16 2%
0:42.56 1%
0:26.21 2%
0:27.91 3%

Testing performance.cache-invalidation=off -> volume set: success
0:31.05 2%
0:34.40 2%
0:33.90 2%
0:33.12 2%
0:27.84 3%

Testing performance.cache-invalidation=on -> volume set: success
0:27.17 3%
0:26.73 3%
0:24.61 3%
0:26.36 3%
0:39.90 2%

Testing performance.cache-refresh-timeout=1 -> volume set: success
0:26.83 3%
0:36.17 2%
0:31.37 2%
0:26.12 3%
0:26.46 2%

Testing performance.cache-refresh-timeout=5 -> volume set: success
0:24.95 3%
0:27.33 3%
0:30.77 2%
0:26.77 3%
0:34.62 2%

Testing performance.cache-refresh-timeout=10 -> volume set: success
0:29.36 2%
0:26.04 3%
0:26.21 3%
0:29.47 3%
0:28.67 3%

Testing performance.cache-refresh-timeout=15 -> volume set: success
0:29.26 3%
0:27.31 3%
0:27.15 3%
0:29.74 3%
0:32.70 2%

Testing performance.cache-refresh-timeout=20 -> volume set: success
0:27.99 3%
0:30.13 2%
0:29.39 3%
0:28.59 3%
0:31.30 3%

Testing performance.cache-refresh-timeout=30 -> volume set: success
0:27.47 3%
0:26.68 3%
0:27.09 3%
0:27.08 3%
0:31.72 3%

Testing performance.cache-refresh-timeout=45 -> volume set: success
0:28.83 3%
0:29.21 3%
0:38.75 2%
0:26.15 3%
0:26.76 3%

Testing performance.cache-refresh-timeout=60 -> volume set: success
0:29.64 2%
0:29.71 2%
0:31.41 2%
0:28.35 3%
0:26.26 3%

Testing performance.client-io-threads=on -> volume set: success
0:25.14 3%
0:26.64 3%
0:26.43 3%
0:25.63 3%
0:27.89 3%

Testing performance.client-io-threads=off -> volume set: success
0:31.37 2%
0:33.65 2%
0:28.85 3%
0:28.27 3%
0:26.90 3%

Testing performance.client-io-threads=on -> volume set: success
0:26.12 3%
0:25.92 3%
0:28.30 3%
0:39.20 2%
0:28.45 3%

Testing performance.flush-behind=on -> volume set: success
0:34.83 2%
0:27.33 3%
0:31.30 2%
0:26.40 3%
0:27.49 2%

Testing performance.flush-behind=off -> volume set: success
0:30.64 2%
0:31.60 2%
0:33.22 2%
0:25.67 2%
0:26.85 3%

Testing performance.flush-behind=on -> volume set: success
0:26.75 3%
0:26.67 3%
0:30.52 3%
0:38.60 2%
0:34.69 3%

Testing performance.io-thread-count=6 -> volume set: success
0:30.87 2%
0:34.27 2%
0:34.08 2%
0:28.70 2%
0:32.83 2%

Testing performance.io-thread-count=7 -> volume set: success
0:32.14 2%
0:43.08 1%
0:31.79 2%
0:25.93 3%
0:26.82 2%

Testing performance.io-thread-count=8 -> volume set: success
0:29.89 2%
0:28.69 2%
0:34.19 2%
0:40.00 1%
0:37.42 2%

Testing performance.io-thread-count=9 -> volume set: success
0:26.50 3%
0:26.99 2%
0:27.05 2%
0:32.22 2%
0:31.63 2%

Testing performance.io-thread-count=10 -> volume set: success
0:29.13 2%
0:30.60 2%
0:25.19 2%
0:24.28 3%
0:25.40 3%

Testing performance.quick-read=on -> volume set: success
0:26.40 3%
0:27.37 2%
0:28.03 2%
0:28.07 2%
0:33.47 2%

Testing performance.quick-read=off -> volume set: success
0:30.99 2%
0:27.16 2%
0:25.34 3%
0:27.58 3%
0:27.67 3%

Testing performance.quick-read=on -> volume set: success
0:27.37 2%
0:26.99 3%
0:29.78 2%
0:26.06 2%
0:25.67 2%

Testing performance.read-ahead=enable -> volume set: success
0:24.52 3%
0:26.05 2%
0:32.37 2%
0:30.27 2%
0:25.70 3%

Testing performance.read-ahead=disable -> volume set: success
0:26.98 3%
0:25.54 3%
0:25.55 3%
0:30.78 2%
0:28.07 2%

Testing performance.read-ahead=enable -> volume set: success
0:30.34 2%
0:33.93 2%
0:30.26 2%
0:28.18 2%
0:27.06 3%

Testing performance.readdir-ahead=enable -> volume set: success
0:26.31 3%
0:25.64 3%
0:31.97 2%
0:30.75 2%
0:26.10 3%

Testing performance.readdir-ahead=disable -> volume set: success
0:27.50 3%
0:27.19 3%
0:27.67 3%
0:26.99 3%
0:28.25 3%

Testing performance.readdir-ahead=enable -> volume set: success
0:34.94 2%
0:30.43 2%
0:27.14 3%
0:27.81 2%
0:26.36 3%

Testing performance.stat-prefetch=on -> volume set: success
0:28.55 3%
0:27.10 2%
0:26.64 3%
0:30.84 3%
0:35.45 2%

Testing performance.stat-prefetch=off -> volume set: success
0:29.12 3%
0:36.54 2%
0:26.32 3%
0:29.02 3%
0:27.16 3%

Testing performance.stat-prefetch=on -> volume set: success
0:31.17 2%
0:34.64 2%
0:26.50 3%
0:30.39 2%
0:27.12 3%

Testing performance.write-behind=on -> volume set: success
0:29.77 2%
0:28.00 2%
0:28.98 3%
0:29.83 3%
0:28.87 3%

Testing performance.write-behind=off -> volume set: success
1:11.95 1%
1:06.03 1%
1:07.70 1%
1:30.21 1%
1:08.47 1%

Testing performance.write-behind=on -> volume set: success
0:30.14 2%
0:28.99 2%
0:34.51 2%
0:32.60 2%
0:30.54 2%

Testing performance.write-behind-window-size=2MB -> volume set: success
0:24.74 3%
0:25.71 2%
0:27.49 2%
0:25.78 3%
0:26.35 3%

Testing performance.write-behind-window-size=4MB -> volume set: success
0:34.21 2%
0:27.31 3%
0:28.83 2%
0:28.91 2%
0:25.73 3%

Testing performance.write-behind-window-size=8MB -> volume set: success
0:24.41 3%
0:26.23 2%
0:25.20 3%
0:26.00 2%
0:27.04 2%

Testing performance.write-behind-window-size=16MB -> volume set: success
0:27.92 2%
0:24.69 2%
0:24.67 2%
0:24.13 2%
0:23.55 3%
_____________________________________________________________
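
With five samples per setting and a spread of several seconds between runs, the raw listing is hard to compare by eye. A minimal sketch, assuming the output above has been saved to a file, that condenses it to one line per setting; `summarize_timings` is a hypothetical helper name, and settings that were tested twice (such as `on off on`) are merged into one group:

```shell
# Read the benchmark output on stdin and print n, min, mean and max per setting.
summarize_timings() {
  awk '
    # Start (or resume) the group of samples that belongs to one setting.
    function begin_group(name) {
      label = name
      if (!(name in cnt)) { cnt[name] = 0; order[++n] = name }
    }
    /^NAS Reference/ { begin_group("NAS-reference"); next }
    /^Testing /      { begin_group($2); next }
    # Sample lines look like "0:29.45 2%": [hours:]minutes:seconds, then CPU%.
    label != "" && $1 ~ /^[0-9:.]+$/ && $2 ~ /%$/ {
      k = split($1, t, ":")
      secs = 0
      for (i = 1; i <= k; i++) secs = secs * 60 + t[i]
      sum[label] += secs
      cnt[label]++
      if (cnt[label] == 1 || secs < min[label]) min[label] = secs
      if (cnt[label] == 1 || secs > max[label]) max[label] = secs
    }
    END {
      for (i = 1; i <= n; i++) {
        l = order[i]
        if (cnt[l] > 0)
          printf "%-50s n=%d min=%.2fs mean=%.2fs max=%.2fs\n", l, cnt[l], min[l], sum[l] / cnt[l], max[l]
      }
    }
  '
}
```

Usage would be `summarize_timings < bench.log`, where `bench.log` is an assumed file name for the saved output.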

If anyone has an idea for significantly improving performance, I would be very interested.
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users