Re: Unreasonably poor performance of replicated volumes

I guess you have already gone through the user list and tried something like this: http://lists.gluster.org/pipermail/gluster-users/2018-April/033811.html
I have the exact same setup, and that is as far as I got after months of trial and error.
Many of us run a similar setup and hit the same issue; posts like yours show up on this list almost daily.

On Wed, Apr 11, 2018 at 3:03 PM, Anastasia Belyaeva <anastasia.blv@xxxxxxxxx> wrote:
Hello everybody!

I have 3 gluster servers (gluster 3.12.6, CentOS 7.2; these are actually virtual machines located on 3 separate physical XenServer 7.1 hosts).

They are all connected via an InfiniBand network. iperf3 shows around 23 Gbit/s of network bandwidth between any two of them.

Each server has 3 HDDs combined into a 3-way striped thin pool (LVM2) with a logical volume created on top of it, formatted with XFS. 'gluster volume top' reports the following throughput:

root@fsnode2 ~ $ gluster volume top r3vol write-perf bs 4096 count 524288 list-cnt 0
Brick: fsnode2.ibnet:/data/glusterfs/r3vol/brick1/brick
Throughput 631.82 MBps time 3.3989 secs
Brick: fsnode6.ibnet:/data/glusterfs/r3vol/brick1/brick
Throughput 566.96 MBps time 3.7877 secs
Brick: fsnode4.ibnet:/data/glusterfs/r3vol/brick1/brick
Throughput 546.65 MBps time 3.9285 secs

root@fsnode2 ~ $ gluster volume top r2vol write-perf bs 4096 count 524288 list-cnt 0
Brick: fsnode2.ibnet:/data/glusterfs/r2vol/brick1/brick
Throughput 539.60 MBps time 3.9798 secs
Brick: fsnode4.ibnet:/data/glusterfs/r2vol/brick1/brick
Throughput 580.07 MBps time 3.7021 secs

There are two pure replicated volumes ('replica 2' and 'replica 3'). The 'replica 2' volume is for testing purposes only.
Volume Name: r2vol
Type: Replicate
Volume ID: 4748d0c0-6bef-40d5-b1ec-d30e10cfddd9
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: fsnode2.ibnet:/data/glusterfs/r2vol/brick1/brick
Brick2: fsnode4.ibnet:/data/glusterfs/r2vol/brick1/brick
Options Reconfigured:
nfs.disable: on
 
Volume Name: r3vol
Type: Replicate
Volume ID: b0f64c28-57e1-4b9d-946b-26ed6b499f29
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: fsnode2.ibnet:/data/glusterfs/r3vol/brick1/brick
Brick2: fsnode4.ibnet:/data/glusterfs/r3vol/brick1/brick
Brick3: fsnode6.ibnet:/data/glusterfs/r3vol/brick1/brick
Options Reconfigured:
nfs.disable: on


The client also runs gluster 3.12.6, on a CentOS 7.3 virtual machine, and uses a FUSE mount:
root@centos7u3-nogdesktop2 ~ $ mount |grep gluster
gluster-host.ibnet:/r2vol on /mnt/gluster/r2 type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
gluster-host.ibnet:/r3vol on /mnt/gluster/r3 type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)


The problem is that there is a significant performance loss with smaller block sizes. For example: 

4K block size
[replica 3 volume]
root@centos7u3-nogdesktop2 ~ $ dd if=/dev/zero of=/mnt/gluster/r3/file$RANDOM bs=4096 count=262144
262144+0 records in
262144+0 records out
1073741824 bytes (1.1 GB) copied, 11.2207 s, 95.7 MB/s

[replica 2 volume]
root@centos7u3-nogdesktop2 ~ $ dd if=/dev/zero of=/mnt/gluster/r2/file$RANDOM bs=4096 count=262144
262144+0 records in
262144+0 records out
1073741824 bytes (1.1 GB) copied, 12.0149 s, 89.4 MB/s

512K block size
[replica 3 volume]
root@centos7u3-nogdesktop2 ~ $ dd if=/dev/zero of=/mnt/gluster/r3/file$RANDOM bs=512K count=2048
2048+0 records in
2048+0 records out
1073741824 bytes (1.1 GB) copied, 5.27207 s, 204 MB/s

[replica 2 volume]
root@centos7u3-nogdesktop2 ~ $ dd if=/dev/zero of=/mnt/gluster/r2/file$RANDOM bs=512K count=2048
2048+0 records in
2048+0 records out
1073741824 bytes (1.1 GB) copied, 4.22321 s, 254 MB/s

With a bigger block size it's still not where I expect it to be, but at least it starts to make some sense.
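
To put the dd runs above side by side, here is a small Python sketch that recomputes the throughput and the average time per write() call from the block sizes, block counts and durations dd reported. The function and variable names are made up for illustration; only the figures come from the dd output.

```python
# Figures copied from the dd output above; all names are illustrative only.
runs = [
    # (volume, block size in bytes, number of blocks, elapsed seconds)
    ("r3", 4096, 262144, 11.2207),
    ("r2", 4096, 262144, 12.0149),
    ("r3", 512 * 1024, 2048, 5.27207),
    ("r2", 512 * 1024, 2048, 4.22321),
]


def summarize(block_size, count, seconds):
    """Return (MB/s in dd's decimal units, microseconds per write call)."""
    mb_per_s = block_size * count / seconds / 1e6
    us_per_write = seconds / count * 1e6
    return mb_per_s, us_per_write


for volume, bs, count, secs in runs:
    mbps, us = summarize(bs, count, secs)
    print(f"{volume} bs={bs:>6}: {mbps:6.1f} MB/s, {us:8.1f} us per write")
```

Seen this way, the 4K runs spend roughly 43 to 46 us per write() call on the client, which is the number to compare against any per-call overhead in the FUSE and network path.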

I've been trying to solve this for a very long time with no luck.
I've already tried both kernel tuning (different 'tuned' profiles and the ones recommended in the "Linux Kernel Tuning" section) and tweaking gluster volume options, including write-behind/flush-behind/write-behind-window-size.
The latter, to my surprise, didn't make any difference. At first I thought it was a buffering issue, but it turns out gluster does buffer writes, just not very efficiently (at least that is what the gluster profile output suggests):

root@fsnode2 ~ $ gluster volume profile r3vol info clear
...
Cleared stats.

root@centos7u3-nogdesktop2 ~ $ dd if=/dev/zero of=/mnt/gluster/r3/file$RANDOM bs=4096 count=262144
262144+0 records in
262144+0 records out
1073741824 bytes (1.1 GB) copied, 10.9743 s, 97.8 MB/s
 
root@fsnode2 ~ $ gluster volume profile r3vol info
Brick: fsnode2.ibnet:/data/glusterfs/r3vol/brick1/brick
-------------------------------------------------------
Cumulative Stats:
   Block Size:               4096b+                8192b+               16384b+
 No. of Reads:                    0                     0                     0
No. of Writes:                 1576                  4173                 19605
   Block Size:              32768b+               65536b+              131072b+
 No. of Reads:                    0                     0                     0
No. of Writes:                 7777                  1847                   657
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              1     RELEASE
      0.00      18.00 us      18.00 us      18.00 us              1      STATFS
      0.00      20.50 us      11.00 us      30.00 us              2       FLUSH
      0.00      22.50 us      17.00 us      28.00 us              2    FINODELK
      0.01      76.50 us      65.00 us      88.00 us              2    FXATTROP
      0.01     177.00 us     177.00 us     177.00 us              1      CREATE
      0.02      56.14 us      23.00 us     128.00 us              7      LOOKUP
      0.02     259.00 us      20.00 us     498.00 us              2     ENTRYLK
     99.94      59.23 us      17.00 us   10914.00 us          35635       WRITE
    Duration: 38 seconds
   Data Read: 0 bytes
Data Written: 1073741824 bytes
Interval 0 Stats:
   Block Size:               4096b+                8192b+               16384b+
 No. of Reads:                    0                     0                     0
No. of Writes:                 1576                  4173                 19605
   Block Size:              32768b+               65536b+              131072b+
 No. of Reads:                    0                     0                     0
No. of Writes:                 7777                  1847                   657
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              1     RELEASE
      0.00      18.00 us      18.00 us      18.00 us              1      STATFS
      0.00      20.50 us      11.00 us      30.00 us              2       FLUSH
      0.00      22.50 us      17.00 us      28.00 us              2    FINODELK
      0.01      76.50 us      65.00 us      88.00 us              2    FXATTROP
      0.01     177.00 us     177.00 us     177.00 us              1      CREATE
      0.02      56.14 us      23.00 us     128.00 us              7      LOOKUP
      0.02     259.00 us      20.00 us     498.00 us              2     ENTRYLK
     99.94      59.23 us      17.00 us   10914.00 us          35635       WRITE
    Duration: 38 seconds
   Data Read: 0 bytes
Data Written: 1073741824 bytes
Brick: fsnode6.ibnet:/data/glusterfs/r3vol/brick1/brick
-------------------------------------------------------
Cumulative Stats:
   Block Size:               4096b+                8192b+               16384b+
 No. of Reads:                    0                     0                     0
No. of Writes:                 1576                  4173                 19605
   Block Size:              32768b+               65536b+              131072b+
 No. of Reads:                    0                     0                     0
No. of Writes:                 7777                  1847                   657
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              1     RELEASE
      0.00      33.00 us      33.00 us      33.00 us              1      STATFS
      0.00      22.50 us      13.00 us      32.00 us              2     ENTRYLK
      0.00      32.00 us      26.00 us      38.00 us              2       FLUSH
      0.01      47.50 us      16.00 us      79.00 us              2    FINODELK
      0.01     157.00 us     157.00 us     157.00 us              1      CREATE
      0.01      92.00 us      70.00 us     114.00 us              2    FXATTROP
      0.03      72.57 us      39.00 us     121.00 us              7      LOOKUP
     99.94      47.97 us      15.00 us    1598.00 us          35635       WRITE
    Duration: 38 seconds
   Data Read: 0 bytes
Data Written: 1073741824 bytes
Interval 0 Stats:
   Block Size:               4096b+                8192b+               16384b+
 No. of Reads:                    0                     0                     0
No. of Writes:                 1576                  4173                 19605
   Block Size:              32768b+               65536b+              131072b+
 No. of Reads:                    0                     0                     0
No. of Writes:                 7777                  1847                   657
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              1     RELEASE
      0.00      33.00 us      33.00 us      33.00 us              1      STATFS
      0.00      22.50 us      13.00 us      32.00 us              2     ENTRYLK
      0.00      32.00 us      26.00 us      38.00 us              2       FLUSH
      0.01      47.50 us      16.00 us      79.00 us              2    FINODELK
      0.01     157.00 us     157.00 us     157.00 us              1      CREATE
      0.01      92.00 us      70.00 us     114.00 us              2    FXATTROP
      0.03      72.57 us      39.00 us     121.00 us              7      LOOKUP
     99.94      47.97 us      15.00 us    1598.00 us          35635       WRITE
    Duration: 38 seconds
   Data Read: 0 bytes
Data Written: 1073741824 bytes
Brick: fsnode4.ibnet:/data/glusterfs/r3vol/brick1/brick
-------------------------------------------------------
Cumulative Stats:
   Block Size:               4096b+                8192b+               16384b+
 No. of Reads:                    0                     0                     0
No. of Writes:                 1576                  4173                 19605
   Block Size:              32768b+               65536b+              131072b+
 No. of Reads:                    0                     0                     0
No. of Writes:                 7777                  1847                   657
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              1     RELEASE
      0.00      58.00 us      58.00 us      58.00 us              1      STATFS
      0.00      38.00 us      38.00 us      38.00 us              2     ENTRYLK
      0.01      59.00 us      32.00 us      86.00 us              2       FLUSH
      0.01      81.00 us      33.00 us     129.00 us              2    FINODELK
      0.01      91.50 us      73.00 us     110.00 us              2    FXATTROP
      0.01     239.00 us     239.00 us     239.00 us              1      CREATE
      0.04     103.14 us      63.00 us     210.00 us              7      LOOKUP
     99.92      52.99 us      16.00 us   11289.00 us          35635       WRITE
    Duration: 38 seconds
   Data Read: 0 bytes
Data Written: 1073741824 bytes
Interval 0 Stats:
   Block Size:               4096b+                8192b+               16384b+
 No. of Reads:                    0                     0                     0
No. of Writes:                 1576                  4173                 19605
   Block Size:              32768b+               65536b+              131072b+
 No. of Reads:                    0                     0                     0
No. of Writes:                 7777                  1847                   657
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              1     RELEASE
      0.00      58.00 us      58.00 us      58.00 us              1      STATFS
      0.00      38.00 us      38.00 us      38.00 us              2     ENTRYLK
      0.01      59.00 us      32.00 us      86.00 us              2       FLUSH
      0.01      81.00 us      33.00 us     129.00 us              2    FINODELK
      0.01      91.50 us      73.00 us     110.00 us              2    FXATTROP
      0.01     239.00 us     239.00 us     239.00 us              1      CREATE
      0.04     103.14 us      63.00 us     210.00 us              7      LOOKUP
     99.92      52.99 us      16.00 us   11289.00 us          35635       WRITE
    Duration: 38 seconds
   Data Read: 0 bytes
Data Written: 1073741824 bytes
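
The profile figures above can be reduced to a few summary numbers. The Python sketch below is only a back-of-the-envelope reading of that output: all names are hypothetical, and the per-brick total assumes that WRITE latencies simply add up without overlapping, which need not be true.

```python
# Values copied from the `gluster volume profile r3vol info` output above.
# Names are illustrative; the histogram is identical on all three bricks.
write_histogram = {
    4096: 1576, 8192: 4173, 16384: 19605,
    32768: 7777, 65536: 1847, 131072: 657,
}
bytes_written = 1073741824
dd_writes = 262144          # 4 KiB writes issued by dd on the client
dd_seconds = 10.9743        # elapsed time dd reported for this run
avg_write_latency_us = {"fsnode2": 59.23, "fsnode6": 47.97, "fsnode4": 52.99}

write_calls = sum(write_histogram.values())   # WRITE fops seen by each brick
avg_write_bytes = bytes_written / write_calls  # mean size of a brick write
coalescing = dd_writes / write_calls           # dd writes merged per WRITE fop


def brick_write_seconds(avg_latency_us, calls):
    """Total WRITE time on a brick, assuming calls do not overlap."""
    return avg_latency_us * calls / 1e6


print(f"WRITE fops per brick: {write_calls}")
print(f"average WRITE size:   {avg_write_bytes:.0f} bytes")
print(f"dd writes per WRITE:  {coalescing:.2f}")
for brick, latency in avg_write_latency_us.items():
    total = brick_write_seconds(latency, write_calls)
    print(f"{brick}: {total:.2f} s in WRITE ({total / dd_seconds:.0%} of dd time)")
```

Under those assumptions, the 4 KiB client writes reach the bricks as WRITE fops of about 30 KB on average, and the time the bricks report for WRITE covers only a fraction of the dd run.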


At this point I've officially run out of ideas about where to look next, so any help, suggestions or pointers are highly appreciated!

--
Best regards,
Anastasia Belyaeva






_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users
