mon is stuck in leveldb and costs nearly 100% cpu


 



Dear Mr Kefu Chai

Sorry to disturb you.

I have run into a problem recently. For several days my Ceph cluster has reported the health warning "store is getting too big!", and ceph-mon has been using nearly 100% CPU.

Have you ever seen this situation?

Detailed information is attached below:

 

root@cvknode17:~# ceph -s

    cluster 04afba60-3a77-496c-b616-2ecb5e47e141

     health HEALTH_WARN

            mon.cvknode17 store is getting too big! 34104 MB >= 15360 MB

     monmap e1: 3 mons at {cvknode15=172.16.51.15:6789/0,cvknode16=172.16.51.16:6789/0,cvknode17=172.16.51.17:6789/0}

            election epoch 862, quorum 0,1,2 cvknode15,cvknode16,cvknode17

     osdmap e196279: 347 osds: 347 up, 347 in

      pgmap v5891025: 33272 pgs, 16 pools, 26944 GB data, 6822 kobjects

            65966 GB used, 579 TB / 644 TB avail

               33270 active+clean

                   2 active+clean+scrubbing+deep

  client io 840 kB/s rd, 739 kB/s wr, 35 op/s rd, 184 op/s wr
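
For readers who want to track this warning over time, here is a minimal sketch that extracts the numbers from the health line above and reports how far the store is over the limit. The function name `parse_store_warning` and the regular expression are hypothetical helpers written for this message format only. The 15360 MB threshold equals 15 * 1024 MB, which appears to correspond to the monitor's `mon_data_size_warn` setting; that mapping is an assumption, not something the output states.

```python
import re

# Matches the warning text exactly as it appears in the `ceph -s` output above.
WARN_RE = re.compile(r"mon\.(\S+) store is getting too big! (\d+) MB >= (\d+) MB")

def parse_store_warning(line):
    """Return (mon_name, store_mb, threshold_mb), or None if the line does not match."""
    m = WARN_RE.search(line)
    if m is None:
        return None
    return m.group(1), int(m.group(2)), int(m.group(3))

# Health line copied from the output above.
line = "mon.cvknode17 store is getting too big! 34104 MB >= 15360 MB"
mon, store_mb, threshold_mb = parse_store_warning(line)
ratio = store_mb / threshold_mb
print(f"{mon}: store is {store_mb} MB, {ratio:.2f}x the {threshold_mb} MB threshold")
```

With the values from this cluster the store is a little over twice the warning threshold.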

 

root@cvknode17:~# top

top - 15:19:28 up 23 days, 23:58,  6 users,  load average: 1.08, 1.40, 1.77

Tasks: 346 total,   2 running, 342 sleeping,   0 stopped,   2 zombie

Cpu(s):  8.1%us, 10.8%sy,  0.0%ni, 69.0%id,  9.5%wa,  0.0%hi,  2.5%si,  0.0%st

Mem:  65384424k total, 58102880k used,  7281544k free,   240720k buffers

Swap: 29999100k total,   344944k used, 29654156k free, 24274272k cached

 

    PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                     

  24407 root      20   0 17.3g  12g  10m S   98 20.2   8420:11 ceph-mon

 

root@cvknode17:~# top -Hp 24407

top - 15:19:49 up 23 days, 23:59,  6 users,  load average: 1.12, 1.39, 1.76

Tasks:  17 total,   1 running,  16 sleeping,   0 stopped,   0 zombie

Cpu(s):  8.1%us, 10.8%sy,  0.0%ni, 69.0%id,  9.5%wa,  0.0%hi,  2.5%si,  0.0%st

Mem:  65384424k total, 58104868k used,  7279556k free,   240744k buffers

Swap: 29999100k total,   344944k used, 29654156k free, 24271188k cached

 

    PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                      

  25931 root      20   0 17.3g  12g   9m R   98 20.2   7957:37 ceph-mon                                                                     

  24514 root      20   0 17.3g  12g   9m S    2 20.2   3:06.75 ceph-mon                                                                     

  25932 root      20   0 17.3g  12g   9m S    2 20.2   1:07.82 ceph-mon                                                                     

  24407 root      20   0 17.3g  12g   9m S    0 20.2   0:00.67 ceph-mon                                                                     

  24508 root      20   0 17.3g  12g   9m S    0 20.2  15:50.24 ceph-mon                                                                      

  24513 root      20   0 17.3g  12g   9m S    0 20.2   0:07.88 ceph-mon                                                                     

  24534 root      20   0 17.3g  12g   9m S    0 20.2 196:33.85 ceph-mon                                                                      

  24535 root      20   0 17.3g  12g   9m S    0 20.2   0:00.01 ceph-mon                                                                     

  25929 root      20   0 17.3g  12g   9m S    0 20.2   3:06.09 ceph-mon                                                                      

  25930 root      20   0 17.3g  12g   9m S    0 20.2   8:12.58 ceph-mon                                                                     

  25933 root      20   0 17.3g  12g   9m S    0 20.2   4:42.22 ceph-mon                                                                     

  25934 root      20   0 17.3g  12g   9m S    0 20.2  40:53.27 ceph-mon                                                                     

  25935 root      20   0 17.3g  12g   9m S    0 20.2   0:04.84 ceph-mon                                                                     

  25936 root      20   0 17.3g  12g   9m S    0 20.2   0:00.01 ceph-mon                                                                     

  25980 root      20   0 17.3g  12g   9m S    0 20.2   0:06.65 ceph-mon                                                                     

  25986 root      20   0 17.3g  12g   9m S    0 20.2  48:26.77 ceph-mon                                                                     

  55738 root      20   0 17.3g  12g   9m S    0 20.2   0:09.06 ceph-mon

 

 

Thread 20 (Thread 0x7f3e77e80700 (LWP 25931)):

#0  0x00007f3e7e83a653 in pread64 () from /lib/x86_64-linux-gnu/libpthread.so.0

#1  0x00000000009286cf in ?? ()

#2  0x000000000092c187 in leveldb::ReadBlock(leveldb::RandomAccessFile*, leveldb::ReadOptions const&, leveldb::BlockHandle const&, leveldb::Block**) ()

#3  0x0000000000922f41 in leveldb::Table::BlockReader(void*, leveldb::ReadOptions const&, leveldb::Slice const&) ()

#4  0x0000000000924840 in ?? ()

#5  0x0000000000924b39 in ?? ()

#6  0x0000000000924a7a in ?? ()

#7  0x00000000009227d0 in ?? ()

#8  0x00000000009140b6 in ?? ()

#9  0x00000000009143dd in ?? ()

#10 0x000000000088d399 in LevelDBStore::LevelDBWholeSpaceIteratorImpl::lower_bound(std::string const&, std::string const&) ()

#11 0x000000000088bf00 in LevelDBStore::get(std::string const&, std::set<std::string, std::less<std::string>, std::allocator<std::string> > const&, std::map<std::string, ceph::buffer::list, std::less<std::string>, std::allocator<std::pair<std::string const, ceph::buffer::list> > >*) ()

#12 0x000000000056a7a2 in MonitorDBStore::get(std::string const&, std::string const&) ()

#13 0x00000000005dcf61 in PaxosService::refresh(bool*) ()

#14 0x000000000058a76b in Monitor::refresh_from_paxos(bool*) ()

#15 0x00000000005c55ac in Paxos::do_refresh() ()

#16 0x00000000005cc093 in Paxos::handle_commit(MMonPaxos*) ()

#17 0x00000000005d4d8b in Paxos::dispatch(PaxosServiceMessage*) ()

#18 0x00000000005ac204 in Monitor::dispatch(MonSession*, Message*, bool) ()

#19 0x00000000005a9b09 in Monitor::_ms_dispatch(Message*) ()

#20 0x00000000005c48a2 in Monitor::ms_dispatch(Message*) ()

#21 0x00000000008b2e67 in Messenger::ms_deliver_dispatch(Message*) ()

#22 0x00000000008b000a in DispatchQueue::entry() ()

#23 0x00000000007a069d in DispatchQueue::DispatchThread::entry() ()

#24 0x00007f3e7e832e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0

#25 0x00007f3e7cff638d in clone () from /lib/x86_64-linux-gnu/libc.so.6

#26 0x0000000000000000 in ?? ()
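
The backtrace above belongs to the thread that is using the CPU: the thread ID in the first column of the `top -Hp` output is the same number gdb prints as the LWP. The following sketch shows that matching step under a few assumptions: the `top` column layout is the one shown above, and `busiest_thread` and `lwp_of` are hypothetical helper names, not part of any Ceph or gdb tooling.

```python
import re

# Rows copied from the `top -Hp 24407` output above (trimmed to three threads).
TOP_ROWS = [
    "    PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND",
    "  25931 root      20   0 17.3g  12g   9m R   98 20.2   7957:37 ceph-mon",
    "  24514 root      20   0 17.3g  12g   9m S    2 20.2   3:06.75 ceph-mon",
    "  24407 root      20   0 17.3g  12g   9m S    0 20.2   0:00.67 ceph-mon",
]
# Thread header copied from the gdb backtrace above.
GDB_HEADER = "Thread 20 (Thread 0x7f3e77e80700 (LWP 25931)):"

def busiest_thread(rows):
    """Return (tid, cpu_percent) for the row with the highest %CPU, or None."""
    best = None
    for row in rows:
        fields = row.split()
        # Skip the header and any line that is not a 12-column thread row.
        if len(fields) < 12 or not fields[0].isdigit():
            continue
        tid, cpu = int(fields[0]), float(fields[8])  # column 9 is %CPU
        if best is None or cpu > best[1]:
            best = (tid, cpu)
    return best

def lwp_of(header):
    """Extract the LWP number from a gdb 'Thread N (... (LWP n)):' header."""
    m = re.search(r"\(LWP (\d+)\)", header)
    return int(m.group(1)) if m else None

tid, cpu = busiest_thread(TOP_ROWS)
print(tid, cpu, tid == lwp_of(GDB_HEADER))
```

Here the busiest thread and the backtrace's LWP are both 25931, so the stack shown (Paxos::handle_commit down to leveldb::ReadBlock and pread64) is what the hot thread was doing when it was sampled.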

 

 

Thanks

Best regards

 
