Re: Performance experiments with io-stats translator

Hi Krutika,

On 06/06/17 13:35, Krutika Dhananjay wrote:
Hi,

As part of identifying performance bottlenecks within the gluster stack for
the VM image store use case, I loaded io-stats at multiple points on the
client and brick stacks and ran a randrd test using fio from within the
hosted VMs in parallel.
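
For reference, a randrd job of this kind might look like the fio job file
below. This is a sketch only; the exact block size, queue depth, and file
layout Krutika used are not given in the thread, so all values here are
illustrative assumptions:

```ini
; hypothetical fio job approximating the test described above
[global]
ioengine=libaio
direct=1          ; direct I/O, matching the volume's direct-io setting
rw=randread       ; the "randrd" workload
bs=4k             ; assumed block size
iodepth=8         ; assumed queue depth
runtime=60        ; matches the 60-second run mentioned below
time_based=1

[randrd]
filename=/path/inside/vm/testfile   ; illustrative path
size=1g
```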

Before I get to the results, a little bit about the configuration ...

3-node cluster; 1x3 plain replicate volume with the virt group settings and
direct-io.
3 FUSE clients, one per node in the cluster (which implies reads are
served from the replica that is local to each client).

io-stats was loaded at the following places:
On the client stack: above client-io-threads and above protocol/client-0
(the first child of AFR).
On the brick stack: below protocol/server, above and below io-threads,
and just above storage/posix.
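
Loading io-stats at an extra point means splicing it into the volfile graph.
As a sketch, the client-side instance above protocol/client-0 could look like
the fragment below (the volume names are illustrative, not the actual volfile
Krutika used):

```
volume testvol-client-0-io-stats
    type debug/io-stats
    option latency-measurement on
    option count-fop-hits on
    subvolumes testvol-client-0
end-volume
```

Whatever translator previously listed testvol-client-0 in its subvolumes line
would then point at testvol-client-0-io-stats instead.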

Based on a 60-second run of the randrd test and subsequent analysis of the
stats dumped by the individual io-stats instances, this is what I found:

Translator position                          Avg latency of READ fop as seen by this translator

1. parent of client-io-threads               1666us
   ∆(1,2) = 50us
2. parent of protocol/client-0               1616us
   ∆(2,3) = 1453us

----------------- end of client stack ---------------------
----------------- beginning of brick stack ----------------

3. child of protocol/server                  163us
   ∆(3,4) = 7us
4. parent of io-threads                      156us
   ∆(4,5) = 20us
5. child of io-threads                       136us
   ∆(5,6) = 11us
6. parent of storage/posix                   125us
...
---------------- end of brick stack -----------------------
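
The deltas in the table are just successive differences of the per-translator
averages; as a quick sanity check on the numbers above:

```python
# Average READ latency (us) observed at each io-stats instance,
# in stack order from the topmost client translator down to posix.
latencies = [
    ("parent of client-io-threads", 1666),
    ("parent of protocol/client-0", 1616),
    ("child of protocol/server", 163),
    ("parent of io-threads", 156),
    ("child of io-threads", 136),
    ("parent of storage/posix", 125),
]

values = [v for _, v in latencies]
deltas = [a - b for a, b in zip(values, values[1:])]
print(deltas)  # [50, 1453, 7, 20, 11]

# The gap between the bottom of the client stack and the top of the
# brick stack dwarfs every other step:
print(deltas[1] / values[0])  # ~0.87, i.e. ~87% of total client latency
```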

So it seems like the biggest bottleneck here is the combination of the
network, epoll, and the rpc layer?
I must admit I am no expert on networks, but since each client is reading
from its local brick, I'm assuming the latency contribution from the actual
network won't be much; in that case the bulk of the ~1453us gap is coming
from epoll, the rpc layer, etc., at both the client and brick ends? Please
correct me if I'm wrong.

I will, of course, do some more runs and confirm if the pattern is
consistent.

Very interesting. These results are similar to what I also observed when doing some ec tests.

My personal feeling is that there is high serialization and/or contention in the network layer caused by mutexes, but I don't have data to support that.

Xavi


-Krutika


_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-devel





