Re: Restarting OSD leads to lower CPU usage


> On 11 Jun 2015, at 11:53, Henrik Korkuc <lists@xxxxxxxxx> wrote:
> 
> On 6/11/15 12:21, Jan Schermer wrote:
>> Hi,
>> hoping someone can point me in the right direction.
>> 
>> Some of my OSDs show higher CPU usage (and op latencies) than others. If I restart the OSD, everything runs nicely for some time, then it creeps up again.
>> 
>> 1) most of my OSDs have ~40% CPU (core) usage (user+sys), some are closer to 80%. Restarting means the offending OSDs only use 40% again.
>> 2) average latencies and CPU usage on the host are the same - so it’s not caused by the host that the OSD is running on
>> 3) I can’t say exactly when or how the issue happens. I can’t even say if it’s the same OSDs. It seems it either happens when something heavy happens in a cluster (like dropping very old snapshots, rebalancing) and then doesn’t come back, or maybe it happens slowly over time and I can’t find it in the graphs. Looking at the graphs it seems to be the former.
>> 
>> I have just one suspicion, and that is the “fd cache size” - we have it set to 16384, but the open fds suggest there are more open files for the osd process (over 17K fds) - the count varies by some hundreds between the osds. Maybe some are just slightly over the limit and the cache misses cause this? Restarting the OSD clears them (down to ~2K) and they increase over time. I increased the setting to 32768 yesterday and it is consistently nice now, but it might take another few days for the issue to manifest…
>> Could this explain it? Any other tips?
> What about disk IO? Are OSDs scrubbing or deep-scrubbing?

Nope, the OSDs are not scrubbing or deep-scrubbing, and I see the same amount of ops/sec on the OSD as before the restart. The things that have not yet returned to their pre-restart levels are:
a) threads: 2200 before restart, 2050 now, slowly going up
b) open files: ~17500 before restart, ~31000 now (the fd cache size was raised in between)
c) memory usage: RSS 1.7 GiB before restart vs 1.1 GiB now, VSS 4.7 GiB vs 3.5 GiB now
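(For anyone who wants to track these three numbers over time: here is a minimal, Linux-only sketch that samples them from /proc. `proc_stats` is just a hypothetical helper name; the OSD pid would come from something like `pidof ceph-osd`.)

```python
import os

def proc_stats(pid):
    """Snapshot (open_fds, threads, rss_kb) for one pid via Linux /proc."""
    # each entry in /proc/<pid>/fd is one open file descriptor
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    threads = rss_kb = 0
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("Threads:"):
                threads = int(line.split()[1])
            elif line.startswith("VmRSS:"):
                rss_kb = int(line.split()[1])  # value is reported in kB
    return open_fds, threads, rss_kb

# example: sample this process (substitute the ceph-osd pid in practice)
fds, threads, rss_kb = proc_stats(os.getpid())
```

Logging the fd count next to the configured cache size would show directly whether an OSD is hovering just above the limit.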

The amount of work is still the same.
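(For reference, the fd cache bump above would typically be made in ceph.conf like this. The option name `filestore fd cache size` is my assumption of what the “fd cache size” shorthand maps to, and the change needs an OSD restart, or injectargs, to take effect.)

```ini
[osd]
    ; assumed option name for the "fd cache size" discussed above
    filestore fd cache size = 32768
```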

Jan

> 
> 
>> Thanks
>> 
>> Jan
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
