Performance drop on Ubuntu 14.04 LTS for 4K/8K workload

Hi,
I have a two-node cluster with 32 OSDs on each node (one per drive). It was working fine until we spotted a severe performance degradation for the 4K/8K workload.
One node is consuming ~5x more CPU than the other while serving the same amount of inbound requests. This is not disk-related, since it also happens for smaller workloads served out of memory. Running perf top on both servers reveals the following.

Server A (consuming more CPU):
--------------------------------------

 16.06%  [kernel]              [k] read_hpet
  5.85%  [vdso]                [.] 0x0000000000000dd7
  3.62%  [kernel]              [k] _raw_spin_lock
  2.76%  ceph-osd              [.] crush_hash32_3
  1.97%  libtcmalloc.so.4.1.2  [.] operator new(unsigned long)
  1.87%  libc-2.19.so          [.] 0x0000000000161f0b
  1.34%  [kernel]              [k] _raw_spin_lock_irqsave
  1.14%  libtcmalloc.so.4.1.2  [.] operator delete(void*)
  1.06%  ceph-osd              [.] 0x00000000007f3b26
  0.99%  perf                  [.] 0x0000000000056584
  0.96%  libstdc++.so.6.0.19   [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&)
  0.77%  [kernel]              [k] futex_wake
  0.69%  libstdc++.so.6.0.19   [.] 0x000000000005b644


Server B (the good one):
----------------------------

  3.47%  ceph-osd              [.] crush_hash32_3
  2.73%  [kernel]              [k] _raw_spin_lock
  2.30%  libtcmalloc.so.4.1.2  [.] operator new(unsigned long)
  2.24%  libc-2.19.so          [.] 0x0000000000098e13
  1.33%  libtcmalloc.so.4.1.2  [.] operator delete(void*)
  1.32%  [kernel]              [k] futex_wake
  1.21%  [kernel]              [k] __schedule
  1.20%  libstdc++.so.6.0.19   [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&)
  1.14%  ceph-osd              [.] 0x00000000007f3e5f
  1.13%  [kernel]              [k] _raw_spin_lock_irqsave
  0.97%  libstdc++.so.6.0.19   [.] 0x000000000005b651
  0.87%  [kernel]              [k] futex_requeue
  0.87%  [kernel]              [k] __copy_user_nocache
  0.80%  perf                  [.] 0x000000000005659e
  0.72%  [kernel]              [k] __d_lookup_rcu
  0.69%  libpthread-2.19.so    [.] pthread_mutex_trylock
  0.68%  [kernel]              [k] futex_wake_op
  0.67%  libstdc++.so.6.0.19   [.] std::string::_Rep::_M_dispose(std::allocator<char> const&)
  0.66%  libc-2.19.so          [.] vfprintf
  0.62%  libtcmalloc.so.4.1.2  [.] tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int)
  0.61%  ceph-osd              [.] Mutex::Lock(bool)
  0.57%  [kernel]              [k] tcp_sendmsg

So it seems the gettimeofday()/vDSO path falling through to read_hpet() is the primary reason for the extra CPU usage. Both servers are identical, and I couldn't figure out why read_hpet() is consuming so much more CPU on Server A. Restarting the Ceph services didn't help, and I couldn't find any abnormal messages in syslog either.
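One quick check worth doing in this situation (a minimal sketch I would try, not something from the run above): read the kernel's clocksource out of sysfs. If the kernel has marked the TSC unstable and silently fallen back to hpet, every gettimeofday()/clock_gettime() bypasses the vDSO fast path and ends up doing an MMIO read of the HPET, which is exactly what read_hpet() at the top of the profile suggests.

/* clksrc.c -- dump the kernel's current and available clocksources.
 * Equivalent to cat'ing the sysfs files by hand.
 * Build: gcc -O2 -o clksrc clksrc.c
 */
#include <stdio.h>

static void dump(const char *path)
{
    char buf[128];
    FILE *f = fopen(path, "r");

    if (f && fgets(buf, sizeof(buf), f))
        printf("%s: %s", path, buf);  /* sysfs line already ends in '\n' */
    if (f)
        fclose(f);
}

int main(void)
{
    dump("/sys/devices/system/clocksource/clocksource0/current_clocksource");
    dump("/sys/devices/system/clocksource/clocksource0/available_clocksource");
    return 0;
}

On a healthy box current_clocksource should read tsc; hpet showing up there would explain the profile on Server A.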
My last resort was a reboot and, as always, that helped ☺. Now both nodes are behaving similarly.
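For anyone who wants to quantify the per-call cost before and after, here is a minimal sketch that times a tight clock_gettime() loop (it assumes clock_gettime lives in libc, as with the glibc 2.19 on these boxes; add -lrt on older glibc):

/* clkbench.c -- time a tight clock_gettime() loop (hypothetical helper,
 * not from the original thread). With the tsc clocksource the vDSO fast
 * path usually costs a few tens of ns per call; an hpet fallback goes
 * through read_hpet() and is roughly an order of magnitude slower.
 * Build: gcc -O2 -o clkbench clkbench.c
 */
#include <stdio.h>
#include <time.h>

#define ITERS 10000000UL

int main(void)
{
    struct timespec start, end, ts;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (unsigned long i = 0; i < ITERS; i++)
        clock_gettime(CLOCK_MONOTONIC, &ts);  /* vDSO fast path if clocksource allows */
    clock_gettime(CLOCK_MONOTONIC, &end);

    double ns = (end.tv_sec - start.tv_sec) * 1e9
              + (end.tv_nsec - start.tv_nsec);
    printf("%.1f ns per clock_gettime() call\n", ns / ITERS);
    return 0;
}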

Has anybody had a similar experience? Am I hitting an Ubuntu (14.04 LTS, kernel 3.13.0-32-generic) bug here?

Any help or suggestions would be appreciated.

Thanks & Regards
Somnath



