Re: Restarting OSD leads to lower CPU usage

Setting TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES works, or at least seems to take effect (roughly as sketched below), but it doesn't change anything for the better. This is on a CentOS 6-ish distro.
I can’t really upgrade anything easily because of support, and we still run 0.67.12 in production, so that’s a no-go.
I know upgrading to Giant is the best way to achieve more performance, but we’re not ready for that yet either (but working on it :))
I’d expect the tcmalloc issue to manifest almost immediately, though. There are thousands of threads and hundreds of connections - surely it would show up sooner? People were seeing regressions with just two clients in benchmarks, so I thought we were operating with a b0rked thread cache constantly…
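
For reference, this is roughly how we set it - the location, value and OSD id are illustrative for our sysvinit setup, not a recommendation:

    # environment picked up before starting the OSD (illustrative location/value)
    export TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728   # 128MB
    # restart the affected OSD so the daemon picks up the new environment
    /etc/init.d/ceph restart osd.12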

For the record, preloading jemalloc ends with a SIGSEGV within a few minutes, if anybody wanted to know… :)
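
We preloaded it roughly like this (the library path and OSD id are illustrative for our distro):

    # run a single OSD with jemalloc preloaded instead of tcmalloc
    LD_PRELOAD=/usr/lib64/libjemalloc.so.1 ceph-osd -i 12 -c /etc/ceph/ceph.conf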

Jan


> On 11 Jun 2015, at 21:14, Somnath Roy <Somnath.Roy@xxxxxxxxxxx> wrote:
> 
> Yeah! Then it is the tcmalloc issue.
> If you are using the version that comes with the OS, TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES won't do anything.
> Try building the latest tcmalloc, set the env variable, and see if it improves or not (a rough sketch is below).
> Also, you can try the latest Ceph build with jemalloc enabled if you have a test cluster.
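> 
> Something along these lines - the repo URL, install prefix and OSD id below are only illustrative, adjust for your environment:
> 
>     # build and install the latest gperftools (which provides tcmalloc)
>     git clone https://github.com/gperftools/gperftools.git
>     cd gperftools && ./autogen.sh && ./configure && make && sudo make install
>     # run the OSD against the new library with a bigger thread cache
>     LD_PRELOAD=/usr/local/lib/libtcmalloc.so \
>       TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728 ceph-osd -i 12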
> 
> Thanks & Regards
> Somnath
> 
> -----Original Message-----
> From: Jan Schermer [mailto:jan@xxxxxxxxxxx] 
> Sent: Thursday, June 11, 2015 12:10 PM
> To: Somnath Roy
> Cc: Dan van der Ster; ceph-users@xxxxxxxxxxxxxx
> Subject: Re:  Restarting OSD leads to lower CPU usage
> 
> Hi,
> I looked at it briefly before leaving; tcmalloc was at the top. I can provide a full listing tomorrow if it helps.
> 
> 12.80%  libtcmalloc.so.4.1.0  [.] tcmalloc::CentralFreeList::FetchFromSpans()
>  8.40%  libtcmalloc.so.4.1.0  [.] tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int)
>  7.40%  [kernel]              [k] futex_wake
>  6.36%  libtcmalloc.so.4.1.0  [.] tcmalloc::CentralFreeList::ReleaseToSpans(void*)
>  6.09%  [kernel]              [k] futex_requeue
> 
> Not much else to see. We tried setting the TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES variable, but it only made things much worse (the default is 16MB; we tried 8MB and up to 512MB, and it was unusably slow immediately after start). We haven’t tried upgrading tcmalloc, though...
> 
> We only use Ceph for RBD with OpenStack, block size is the default (4MB).
> I tested different block sizes previously, and I got the best results with 8MB blocks (while benchmarking 4K random direct/sync writes) - which is strange, I think…
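> 
> For reference, an equivalent fio job would look roughly like this (fio itself, the parameters and the target device are illustrative - I don't have the exact command at hand):
> 
>     # 4K random direct/sync writes against an RBD-backed block device
>     fio --name=randwrite --ioengine=libaio --direct=1 --sync=1 \
>         --rw=randwrite --bs=4k --iodepth=32 --runtime=60 --filename=/dev/rbd0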
> 
> I increased fdcache to 120000 (which should be enough for all objects on the OSD), and I will compare how it behaves tomorrow.
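> 
> On our side that means bumping the fd cache option in ceph.conf, roughly like this (the exact option name is how I recall it for our FileStore setup - treat it as illustrative):
> 
>     [osd]
>         # keep file descriptors for (ideally) all objects on the OSD cached
>         filestore fd cache size = 120000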
> 
> Thanks a lot
> 
> Jan
> 
>> On 11 Jun 2015, at 20:59, Somnath Roy <Somnath.Roy@xxxxxxxxxxx> wrote:
>> 
>> Yeah, perf top will help you a lot..
>> 
>> Some guess:
>> 
>> 1. If your block size is in the small 4-16K range, you are most probably hitting the tcmalloc issue. 'perf top' will show a lot of tcmalloc traces in that case.
>> 
>> 2. fdcache should save you some CPU, but I don't think it will be that significant.
>> 
>> Thanks & Regards
>> Somnath
>> 
>> -----Original Message-----
>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf 
>> Of Jan Schermer
>> Sent: Thursday, June 11, 2015 5:57 AM
>> To: Dan van der Ster
>> Cc: ceph-users@xxxxxxxxxxxxxx
>> Subject: Re:  Restarting OSD leads to lower CPU usage
>> 
>> I have no experience with perf and the package is not installed.
>> I will take a look at it, thanks.
>> 
>> Jan
>> 
>> 
>>> On 11 Jun 2015, at 13:48, Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
>>> 
>>> Hi Jan,
>>> 
>>> Can you get perf top running? It should show you where the OSDs are spinning...
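>>> 
>>> Something like this should do it (the package name and the pgrep filter are what I'd expect on CentOS 6 - adjust as needed):
>>> 
>>>     yum install perf
>>>     # profile all running ceph-osd processes at once
>>>     perf top -p "$(pgrep -d, -x ceph-osd)"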
>>> 
>>> Cheers, Dan
>>> 
>>> On Thu, Jun 11, 2015 at 11:21 AM, Jan Schermer <jan@xxxxxxxxxxx> wrote:
>>>> Hi,
>>>> hoping someone can point me in the right direction.
>>>> 
>>>> Some of my OSDs have higher CPU usage (and op latencies) than others. If I restart the OSD, everything runs nicely for some time, then it creeps up again.
>>>> 
>>>> 1) Most of my OSDs have ~40% CPU (core) usage (user+sys), while some are closer to 80%. Restarting brings the offending OSDs back down to ~40%.
>>>> 2) Average latencies and CPU usage on the host are the same, so it’s not caused by the host the OSD is running on.
>>>> 3) I can’t say exactly when or how the issue happens, and I can’t even say whether it’s always the same OSDs. It seems to either start when something heavy happens in the cluster (like dropping very old snapshots, or rebalancing) and then the usage never comes back down, or to creep up slowly over time so that I can’t spot it in the graphs. Looking at the graphs, it seems to be the former.
>>>> 
>>>> I have just one suspicion, and that is the “fd cache size” - we have it set to 16384, but the open fds suggest there are more open files for the osd process (over 17K fds), and it varies by some hundreds between the OSDs. Maybe some are just slightly over the limit and the cache misses cause this? Restarting the OSD clears them (back to ~2K) and they increase over time. I increased the setting to 32768 yesterday and it has been consistently nice since, but it might take another few days to manifest… Could this explain it? Any other tips? (I count the fds per OSD as shown below.)
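>>>> 
>>>> For reference, I'm counting the open fds per OSD roughly like this (<osd-pid> is a placeholder for the OSD's process id):
>>>> 
>>>>     ls /proc/<osd-pid>/fd | wc -l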
>>>> 
>>>> Thanks
>>>> 
>>>> Jan
>> 
> 

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




