Re: Initial performance cluster SimpleMessenger vs AsyncMessenger results

On 10/12/2015 11:12 PM, Gregory Farnum wrote:
On Mon, Oct 12, 2015 at 9:50 AM, Mark Nelson <mnelson@xxxxxxxxxx> wrote:
Hi Guys,

Given all of the recent data on how different memory allocator
configurations improve SimpleMessenger performance (and the effect of memory
allocators and transparent hugepages on RSS memory usage), I thought I'd run
some tests looking at how AsyncMessenger does in comparison.  We spoke about
these a bit at the last performance meeting but here's the full write up.
The rough conclusion as of right now appears to be:

1) AsyncMessenger performance is not dependent on the memory allocator like
with SimpleMessenger.

2) AsyncMessenger is faster than SimpleMessenger with TCMalloc + 32MB (i.e. the
default) thread cache.

3) AsyncMessenger is consistently faster than SimpleMessenger for 128K
random reads.

4) AsyncMessenger is sometimes slower than SimpleMessenger when memory
allocator optimizations are used.

5) AsyncMessenger currently uses far more RSS memory than SimpleMessenger.
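
For anyone who wants to reproduce the comparison, the two knobs involved look
roughly like this; a minimal sketch assuming stock file locations, and the
128MB value is just an example of a larger-than-default cache, not the exact
setting from every run:

  # ceph.conf -- switch the daemons from SimpleMessenger (the default)
  # to AsyncMessenger:
  [global]
          ms_type = async

  # /etc/sysconfig/ceph -- environment for the daemons; raise the TCMalloc
  # thread cache from the 32MB default (value is in bytes):
  TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728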

Here's a link to the paper:

https://drive.google.com/file/d/0B2gTBZrkrnpZS1Q4VktjZkhrNHc/view

Can you clarify these tests a bit more? I can't make the number of
nodes, OSDs, and SSDs work out properly. Were the FIO jobs 256
concurrent ops per job, or in aggregate? Is there any more info that
might suggest why the 128KB rand-read (but not sequential read or write, and
not the 4K rand-read) was so asymmetrical?


Hi Greg,

Resending this to the list for posterity, as I realized I only sent it to you earlier:

- 4 Nodes
- 4 P3700s per node
- 4 OSDs per P3700 (similar to Intel's setup in Jiangang and Jian's paper)

Each node also acted as an fio client using the librbd engine (a sketch of the job file follows the list):

- 4 Nodes
- 2 volumes per node
- 1 fio process per volume
- 32 concurrent IOs per fio process
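
The job files looked something along these lines; the pool and image names
here are placeholders rather than the exact ones from the runs, and the rw/bs
lines varied per test (this one shows the 128KB random read case):

  [global]
  # fio's librbd engine talks to the cluster directly, no kernel mount needed
  ioengine=rbd
  clientname=admin
  pool=rbd
  rbdname=fio_test_img
  invalidate=0
  rw=randread
  bs=128k
  iodepth=32

  [rbd_iodepth32]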

The 128KB random read results are interesting. In the memory allocator tests I saw performance decrease with more thread cache or when TCMalloc was used, and in the past I've seen odd performance characteristics around this IO size. I think it must be a difficult case for the memory allocator to handle consistently well, and AsyncMessenger may simply sidestep the problem.
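
As an aside, a quick sanity check of which allocator an OSD is actually using
(the binary path is distro-dependent, and an LD_PRELOADed allocator will only
show up in the second form):

  # allocator linked into the binary at build time
  ldd /usr/bin/ceph-osd | grep -E 'tcmalloc|jemalloc'
  # allocator actually mapped into a running OSD, including preloads
  grep -E 'tcmalloc|jemalloc' /proc/$(pidof -s ceph-osd)/maps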

Mark
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


