Re: Re: Re: mon is stuck in leveldb and costs nearly 100% cpu

> 2 active+clean+scrubbing+deep

 * Set noscrub and nodeep-scrub
  # ceph osd set noscrub
  # ceph osd set nodeep-scrub

 * Wait for scrubbing+deep to complete

 * Do `ceph -s`
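
 * Once deep scrubbing has finished, you will probably want to clear those flags again so regular scrubbing resumes:
  # ceph osd unset noscrub
  # ceph osd unset nodeep-scrub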

If you still see high CPU usage after that, please identify which
process(es) are eating CPU:

 * ps aux | sort -rk 3,4 | head -n 20
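
If ceph-mon is at the top, you can also look at its individual threads (substitute the ceph-mon PID from the `ps` output above):

 * top -Hp <ceph-mon PID>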

And let us know.


On Mon, Feb 13, 2017 at 9:39 PM, Chenyehua <chen.yehua@xxxxxxx> wrote:
> Thanks for the response, Shinobu
> The warning disappeared thanks to your suggested solution; however, the nearly 100% CPU usage still exists and concerns me a lot.
> Do you know why the CPU usage is so high?
> Are there any solutions or suggestions for this problem?
>
> Cheers
>
> -----Original Message-----
> From: Shinobu Kinjo [mailto:skinjo@xxxxxxxxxx]
> Sent: February 13, 2017 10:54
> To: chenyehua 11692 (RD)
> Cc: kchai@xxxxxxxxxx; ceph-users@xxxxxxxxxxxxxx
> Subject: Re: Re: mon is stuck in leveldb and costs nearly 100% cpu
>
> OK, that's a reasonable answer. Would you run the following on all hosts that the MONs are running on:
>
>  # ceph --admin-daemon /var/run/ceph/ceph-mon.`hostname -s`.asok config show | grep leveldb_log
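>
> If it helps, here is an untested one-liner to run that on every MON host (assuming passwordless SSH and your MON host names from the monmap):
>
>  # for h in cvknode15 cvknode16 cvknode17; do ssh $h 'ceph --admin-daemon /var/run/ceph/ceph-mon.$(hostname -s).asok config show | grep leveldb_log'; done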
>
> Anyway, you can compact the leveldb store at runtime:
>
>  # ceph tell mon.`hostname -s` compact
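>
> Since `ceph tell` can reach any monitor from one node, something like this (MON names taken from your `ceph -s` output) should compact all three:
>
>  # for m in cvknode15 cvknode16 cvknode17; do ceph tell mon.$m compact; done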
>
> And you should set the following in ceph.conf to prevent the same issue from recurring:
>
>  [mon]
>  mon compact on start = true
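>
> To verify that compaction actually helped, you can check the on-disk store size before and after (assuming the default mon data path):
>
>  # du -sh /var/lib/ceph/mon/ceph-$(hostname -s)/store.db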
>
>
> On Mon, Feb 13, 2017 at 11:37 AM, Chenyehua <chen.yehua@xxxxxxx> wrote:
>> Sorry, I made a mistake; the Ceph version is actually 0.94.5
>>
>> -----Original Message-----
>> From: chenyehua 11692 (RD)
>> Sent: February 13, 2017 9:40
>> To: 'Shinobu Kinjo'
>> Cc: kchai@xxxxxxxxxx; ceph-users@xxxxxxxxxxxxxx
>> Subject: Re: mon is stuck in leveldb and costs nearly 100% cpu
>>
>> My ceph version is 10.2.5
>>
>> -----Original Message-----
>> From: Shinobu Kinjo [mailto:skinjo@xxxxxxxxxx]
>> Sent: February 12, 2017 13:12
>> To: chenyehua 11692 (RD)
>> Cc: kchai@xxxxxxxxxx; ceph-users@xxxxxxxxxxxxxx
>> Subject: Re: mon is stuck in leveldb and costs nearly 100% cpu
>>
>> Which Ceph version are you using?
>>
>> On Sat, Feb 11, 2017 at 5:02 PM, Chenyehua <chen.yehua@xxxxxxx> wrote:
>>> Dear Mr Kefu Chai
>>>
>>> Sorry to disturb you.
>>>
>>> I met a problem recently. In my Ceph cluster, the health status has shown the warning "store is getting too big!" for several days, and ceph-mon costs nearly 100% CPU.
>>>
>>> Have you ever met this situation?
>>>
>>> Some detailed information is attached below:
>>>
>>> root@cvknode17:~# ceph -s
>>>     cluster 04afba60-3a77-496c-b616-2ecb5e47e141
>>>      health HEALTH_WARN
>>>             mon.cvknode17 store is getting too big! 34104 MB >= 15360 MB
>>>      monmap e1: 3 mons at {cvknode15=172.16.51.15:6789/0,cvknode16=172.16.51.16:6789/0,cvknode17=172.16.51.17:6789/0}
>>>             election epoch 862, quorum 0,1,2 cvknode15,cvknode16,cvknode17
>>>      osdmap e196279: 347 osds: 347 up, 347 in
>>>       pgmap v5891025: 33272 pgs, 16 pools, 26944 GB data, 6822 kobjects
>>>             65966 GB used, 579 TB / 644 TB avail
>>>                33270 active+clean
>>>                    2 active+clean+scrubbing+deep
>>>   client io 840 kB/s rd, 739 kB/s wr, 35 op/s rd, 184 op/s wr
>>>
>>> root@cvknode17:~# top
>>> top - 15:19:28 up 23 days, 23:58,  6 users,  load average: 1.08, 1.40, 1.77
>>> Tasks: 346 total,   2 running, 342 sleeping,   0 stopped,   2 zombie
>>> Cpu(s):  8.1%us, 10.8%sy,  0.0%ni, 69.0%id,  9.5%wa,  0.0%hi,  2.5%si,  0.0%st
>>> Mem:  65384424k total, 58102880k used,  7281544k free,   240720k buffers
>>> Swap: 29999100k total,   344944k used, 29654156k free, 24274272k cached
>>>
>>>     PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>>   24407 root      20   0 17.3g  12g  10m S   98 20.2   8420:11 ceph-mon
>>>
>>> root@cvknode17:~# top -Hp 24407
>>> top - 15:19:49 up 23 days, 23:59,  6 users,  load average: 1.12, 1.39, 1.76
>>> Tasks:  17 total,   1 running,  16 sleeping,   0 stopped,   0 zombie
>>> Cpu(s):  8.1%us, 10.8%sy,  0.0%ni, 69.0%id,  9.5%wa,  0.0%hi,  2.5%si,  0.0%st
>>> Mem:  65384424k total, 58104868k used,  7279556k free,   240744k buffers
>>> Swap: 29999100k total,   344944k used, 29654156k free, 24271188k cached
>>>
>>>     PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>>   25931 root      20   0 17.3g  12g   9m R   98 20.2   7957:37 ceph-mon
>>>   24514 root      20   0 17.3g  12g   9m S    2 20.2   3:06.75 ceph-mon
>>>   25932 root      20   0 17.3g  12g   9m S    2 20.2   1:07.82 ceph-mon
>>>   24407 root      20   0 17.3g  12g   9m S    0 20.2   0:00.67 ceph-mon
>>>   24508 root      20   0 17.3g  12g   9m S    0 20.2  15:50.24 ceph-mon
>>>   24513 root      20   0 17.3g  12g   9m S    0 20.2   0:07.88 ceph-mon
>>>   24534 root      20   0 17.3g  12g   9m S    0 20.2 196:33.85 ceph-mon
>>>   24535 root      20   0 17.3g  12g   9m S    0 20.2   0:00.01 ceph-mon
>>>   25929 root      20   0 17.3g  12g   9m S    0 20.2   3:06.09 ceph-mon
>>>   25930 root      20   0 17.3g  12g   9m S    0 20.2   8:12.58 ceph-mon
>>>   25933 root      20   0 17.3g  12g   9m S    0 20.2   4:42.22 ceph-mon
>>>   25934 root      20   0 17.3g  12g   9m S    0 20.2  40:53.27 ceph-mon
>>>   25935 root      20   0 17.3g  12g   9m S    0 20.2   0:04.84 ceph-mon
>>>   25936 root      20   0 17.3g  12g   9m S    0 20.2   0:00.01 ceph-mon
>>>   25980 root      20   0 17.3g  12g   9m S    0 20.2   0:06.65 ceph-mon
>>>   25986 root      20   0 17.3g  12g   9m S    0 20.2  48:26.77 ceph-mon
>>>   55738 root      20   0 17.3g  12g   9m S    0 20.2   0:09.06 ceph-mon
>>>
>>> Thread 20 (Thread 0x7f3e77e80700 (LWP 25931)):
>>> #0  0x00007f3e7e83a653 in pread64 () from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #1  0x00000000009286cf in ?? ()
>>> #2  0x000000000092c187 in leveldb::ReadBlock(leveldb::RandomAccessFile*, leveldb::ReadOptions const&, leveldb::BlockHandle const&, leveldb::Block**) ()
>>> #3  0x0000000000922f41 in leveldb::Table::BlockReader(void*, leveldb::ReadOptions const&, leveldb::Slice const&) ()
>>> #4  0x0000000000924840 in ?? ()
>>> #5  0x0000000000924b39 in ?? ()
>>> #6  0x0000000000924a7a in ?? ()
>>> #7  0x00000000009227d0 in ?? ()
>>> #8  0x00000000009140b6 in ?? ()
>>> #9  0x00000000009143dd in ?? ()
>>> #10 0x000000000088d399 in LevelDBStore::LevelDBWholeSpaceIteratorImpl::lower_bound(std::string const&, std::string const&) ()
>>> #11 0x000000000088bf00 in LevelDBStore::get(std::string const&, std::set<std::string, std::less<std::string>, std::allocator<std::string> > const&, std::map<std::string, ceph::buffer::list, std::less<std::string>, std::allocator<std::pair<std::string const, ceph::buffer::list> > >*) ()
>>> #12 0x000000000056a7a2 in MonitorDBStore::get(std::string const&, std::string const&) ()
>>> #13 0x00000000005dcf61 in PaxosService::refresh(bool*) ()
>>> #14 0x000000000058a76b in Monitor::refresh_from_paxos(bool*) ()
>>> #15 0x00000000005c55ac in Paxos::do_refresh() ()
>>> #16 0x00000000005cc093 in Paxos::handle_commit(MMonPaxos*) ()
>>> #17 0x00000000005d4d8b in Paxos::dispatch(PaxosServiceMessage*) ()
>>> #18 0x00000000005ac204 in Monitor::dispatch(MonSession*, Message*, bool) ()
>>> #19 0x00000000005a9b09 in Monitor::_ms_dispatch(Message*) ()
>>> #20 0x00000000005c48a2 in Monitor::ms_dispatch(Message*) ()
>>> #21 0x00000000008b2e67 in Messenger::ms_deliver_dispatch(Message*) ()
>>> #22 0x00000000008b000a in DispatchQueue::entry() ()
>>> #23 0x00000000007a069d in DispatchQueue::DispatchThread::entry() ()
>>> #24 0x00007f3e7e832e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #25 0x00007f3e7cff638d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #26 0x0000000000000000 in ?? ()
>>>
>>>
>>> Thanks
>>>
>>> Best regards
>>>
>>>
>>>
>>> -------------------------------------------------------------------------------------------------------------------------
>>> This e-mail and its attachments contain confidential information from
>>> H3C, which is intended only for the person or entity whose address is
>>> listed above. Any use of the information contained herein in any way
>>> (including, but not limited to, total or partial disclosure,
>>> reproduction, or dissemination) by persons other than the intended
>>> recipient(s) is prohibited. If you receive this e-mail in error,
>>> please notify the sender by phone or email immediately and delete it!
>>>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



