Re: Ceph allocator and performance

Hi,
if you look in the archive, you'll see I posted something similar about two months ago.

You can try experimenting with the following (see the sketches after the list):
1) stock binaries - tcmalloc
2) LD_PRELOADed jemalloc
3) ceph recompiled with neither (glibc malloc)
4) ceph recompiled with jemalloc (?)
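
For (2), a minimal sketch of what preloading jemalloc for the OSDs could look like on a Debian-style node; the library path, package name and the assumption that the init script exports variables from /etc/default/ceph are mine, so verify the mapping afterwards:

    # install jemalloc (package name varies per distro/release)
    apt-get install libjemalloc1

    # have the ceph init script export the preload for its daemons
    # (assumes /etc/default/ceph is sourced and the variable reaches the daemon)
    echo 'export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1' >> /etc/default/ceph

    # restart the OSDs and check which allocator is actually mapped
    /etc/init.d/ceph restart osd
    grep -E 'jemalloc|tcmalloc' /proc/$(pidof ceph-osd | awk '{print $1}')/maps

For (3) and (4), the autotools build has switches for this; treat the exact flag names as an assumption and check ./configure --help in your tree:

    ./configure --without-tcmalloc                  # glibc malloc
    ./configure --without-tcmalloc --with-jemalloc  # link against jemalloc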

We simply recompiled the ceph binaries without tcmalloc, and CPU usage went down considerably while latencies improved. I can't vouch that there are no adverse effects in the long run, though. We went back to tcmalloc a while ago while hunting down a problem (to eliminate variables), but that's only temporary and we are going to switch back. Disabling tcmalloc saved us a lot of cores.

There is also a tcmalloc environment variable:
TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES

Some people reported that raising this variable alleviates the issue, but it didn't work for us: touching it caused serious performance problems right away, which might be a bug in our version of tcmalloc or in Ceph. It's not an ideal solution anyway.
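
For completeness, a sketch of how one might raise it; the file location and the 128 MB value are assumptions, and the variable has to actually end up in the OSD daemons' environment to have any effect:

    # /etc/default/ceph (sourced by the Debian init script; verify on your setup)
    export TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728   # 128 MB, up from tcmalloc's 32 MB default

    # restart the OSDs so the new thread-cache limit takes effect
    /etc/init.d/ceph restart osd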

Some references:
http://tracker.ceph.com/issues/12516
http://events.linuxfoundation.org/sites/events/files/slides/optimizing_ceph_flash.pdf
https://www.mail-archive.com/search?l=ceph-devel@xxxxxxxxxxxxxxx&q=subject:%22Re%3A+Performance+variation+across+RBD+clients+on+different+pools+in+all+SSD+setup+-+tcmalloc+issue%22&o=newest&f=1

Jan


> On 11 Aug 2015, at 17:13, Межов Игорь Александрович <megov@xxxxxxxxxx> wrote:
> 
> Hi!
> 
> We got some strange performance results when running a random read fio test on our test Hammer cluster.
> 
> When we ran fio-rbd (4k, randread, 8 jobs, QD=32, 500 GB rbd image) for the first time (page cache cold/empty),
> we got ~12k IOPS sustained. That is quite a reasonable value, as 12k IOPS / 34 OSDs = ~352 IOPS per disk,
> which is a fairly normal figure for a 10k SAS disk. As most of the data really had to be read from the platters,
> we also saw high iowait (~45%) and user CPU activity of ~35%.
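> 
> (Roughly, that corresponds to an fio invocation like the one below; pool and image names are just placeholders:)
> 
>     fio --name=rbd-randread --ioengine=rbd --clientname=admin --pool=rbd --rbdname=test500g \
>         --rw=randread --bs=4k --numjobs=8 --iodepth=32 --direct=1 --group_reporting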
> 
> But when we ran the same test a second time, some of the data was already in the page cache and could be accessed
> faster, and indeed we got ~25k IOPS. We saw low iowait (~1-3%) but surprisingly high user CPU activity (>70%).
> 
> perf top shows us that most of the calls are in the tcmalloc library:
>  19,61%  libtcmalloc.so.4.2.2              [.] tcmalloc::CentralFreeList::FetchFromOneSpans(int, void**, void**)
>  15,53%  libtcmalloc.so.4.2.2              [.] tcmalloc::SLL_Next(void*)
>   9,03%  libtcmalloc.so.4.2.2              [.] TCMalloc_PageMap3<35>::get(unsigned long) const
>   6,71%  libtcmalloc.so.4.2.2              [.] tcmalloc::CentralFreeList::ReleaseToSpans(void*)
>   1,59%  libtcmalloc.so.4.2.2              [.] tcmalloc::CentralFreeList::ReleaseListToSpans(void*)
>   1,58%  libtcmalloc.so.4.2.2              [.] tcmalloc::SLL_PopRange(void**, int, void**, void**)
>   1,42%  libtcmalloc.so.4.2.2              [.] tcmalloc::PageHeap::GetDescriptor(unsigned long) const
>   1,03%  libtcmalloc.so.4.2.2              [.] 0x0000000000060589
>   0,91%  libtcmalloc.so.4.2.2              [.] tcmalloc::ThreadCache::Scavenge()
>   0,82%  libtcmalloc.so.4.2.2              [.] tcmalloc::DLL_Remove(tcmalloc::Span*)
>   0,80%  libtcmalloc.so.4.2.2              [.] tcmalloc::ThreadCache::IncreaseCacheLimitLocked()
>   0,75%  libtcmalloc.so.4.2.2              [.] tcmalloc::Static::pageheap()
>   0,69%  libtcmalloc.so.4.2.2              [.] PackedCache<35, unsigned long>::GetOrDefault(unsigned long, unsigned long) const
>   0,51%  libpthread-2.19.so                [.] __pthread_mutex_unlock_usercnt        
> 
> 
> Running the same test over an RBD image in the SSD pool gives the same 25-30k IOPS, while every DC S3700 SSD
> we use in the SSD pool can easily do >50k IOPS. I think the 25-30k IOPS ceiling we hit is due to tcmalloc
> inefficiency.
> 
> What can we do to improve our results? Is there some tcmalloc tuning, or would compiling Ceph
> with jemalloc give better results? Do you have any thoughts?
> 
> Our small test Hammer install:
> - Debian Jessie;
> - Ceph Hammer 0.94.2 self-built from sources (tcmalloc)
> - 1x E5-2670 + 128 GB RAM
> - 2 nodes shared with mons; system and mon DB are on a separate SAS mirror;
> - 17 OSDs on each node, SAS 10k;
> - 2 Intel DC S3700 200 GB SSDs for journalling on each node
> - 2 Intel DC S3700 400 GB SSDs for a separate SSD pool
> - 10 Gbit interconnect, shared public and cluster network, MTU 9100
> - 10 Gbit client host, fio 2.2.7 compiled with the RBD engine
> 
> Megov Igor
> CIO, Yuterra
> 

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



