Re: Significant slowdown of osds since v0.67 Dumpling

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Sam and Oliver,

We've had tons of issues with Dumpling rbd volumes showing sporadic periods of high latency for Windows guests doing lots of small writes. We saw the issue occasionally with Cuttlefish, but it got significantly worse with Dumpling. Initial results with wip-dumpling-perf2 appear very promising.

Thanks for your work! I'll report back tomorrow if I have any new results.

Thanks,

Mike Dawson
Co-Founder & Director of Cloud Architecture
Cloudapt LLC
6330 East 75th Street, Suite 170
Indianapolis, IN 46250

On 8/29/2013 2:52 PM, Oliver Daudey wrote:
Hey Mark and list,

FYI for you and the list: Samuel and I seem to have found and fixed the
remaining performance-problems.  For those who can't wait, fixes are in
"wip-dumpling-perf2" and will probably be in the next point-release.


    Regards,

      Oliver

On 27-08-13 17:13, Mark Nelson wrote:
Ok, definitely let us know how it goes!  For what it's worth, I'm
testing Sam's wip-dumpling-perf branch with the wbthrottle code disabled
now and comparing it both to that same branch with it enabled along with
0.67.1.  Don't have any perf data, but quite a bit of other data to look
through, both in terms of RADOS bench and RBD.

Mark

On 08/27/2013 10:07 AM, Oliver Daudey wrote:
Hey Mark,

That will take a day or so for me to know with enough certainty.  With
the low CPU-usage and preliminary results today, I'm confident enough to
upgrade all OSDs in production and test the cluster all-Dumpling
tomorrow.  For now, I only upgraded a single OSD and measured CPU-usage
and whatever performance-effects that had on the cluster, so if I would
lose that OSD, I could recover. :-)  Will get back to you.


     Regards,

         Oliver

On 27-08-13 15:04, Mark Nelson wrote:
Hi Olver/Matthew,

Ignoring CPU usage, has speed remained slower as well?

Mark

On 08/27/2013 03:08 AM, Oliver Daudey wrote:
Hey Samuel,

The "PGLog::check()" is now no longer visible in profiling, so it
helped
for that.  Unfortunately, it doesn't seem to have helped to bring down
the OSD's CPU-loading much.  Leveldb still uses much more than in
Cuttlefish.  On my test-cluster, I didn't notice any difference in the
RBD bench-results, either, so I have to assume that it didn't help
performance much.

Here's the `perf top' I took just now on my production-cluster with
your
new version, under regular load.  Also note the "memcmp" and "memcpy",
which also don't show up when running a Cuttlefish-OSD:
    15.65%  [kernel]                     [k]
intel_idle
     7.20%  libleveldb.so.1.9            [.]
0x3ceae
     6.28%  libc-2.11.3.so               [.]
memcmp
     5.22%  [kernel]                     [k]
find_busiest_group
     3.92%  kvm                          [.]
0x2cf006
     2.40%  libleveldb.so.1.9            [.]
leveldb::InternalKeyComparator::Compar
     1.95%  [kernel]                     [k]
_raw_spin_lock
     1.69%  [kernel]                     [k]
default_send_IPI_mask_sequence_phys
     1.46%  libc-2.11.3.so               [.]
memcpy
     1.17%  libleveldb.so.1.9            [.]
leveldb::Block::Iter::Next()
     1.16%  [kernel]                     [k]
hrtimer_interrupt
     1.07%  [kernel]                     [k]
native_write_cr0
     1.01%  [kernel]                     [k]
__hrtimer_start_range_ns
     1.00%  [kernel]                     [k]
clockevents_program_event
     0.93%  [kernel]                     [k]
find_next_bit
     0.93%  libstdc++.so.6.0.13          [.]
std::string::_M_mutate(unsigned long,
     0.89%  [kernel]                     [k]
cpumask_next_and
     0.87%  [kernel]                     [k]
__schedule
     0.85%  [kernel]                     [k]
_raw_spin_unlock_irqrestore
     0.85%  [kernel]                     [k]
do_select
     0.84%  [kernel]                     [k]
apic_timer_interrupt
     0.80%  [kernel]                     [k]
fget_light
     0.79%  [kernel]                     [k]
native_write_msr_safe
     0.76%  [kernel]                     [k]
_raw_spin_lock_irqsave
     0.66%  libc-2.11.3.so               [.]
0xdc6d8
     0.61%  libpthread-2.11.3.so         [.]
pthread_mutex_lock
     0.61%  [kernel]                     [k]
tg_load_down
     0.59%  [kernel]                     [k]
reschedule_interrupt
     0.59%  libsnappy.so.1.1.2           [.]
snappy::RawUncompress(snappy::Source*,
     0.56%  libstdc++.so.6.0.13          [.] std::string::append(char
const*, unsig
     0.54%  [kvm_intel]                  [k]
vmx_vcpu_run
     0.53%  [kernel]                     [k]
copy_user_generic_string
     0.53%  [kernel]                     [k]
load_balance
     0.50%  [kernel]                     [k]
rcu_needs_cpu
     0.45%  [kernel]                     [k] fput


      Regards,

        Oliver

On ma, 2013-08-26 at 23:33 -0700, Samuel Just wrote:
I just pushed a patch to wip-dumpling-log-assert (based on current
dumpling head).  I had disabled most of the code in PGLog::check() but
left an (I thought) innocuous assert.  It seems that with (at least)
g++ 4.6.3, stl list::size() is linear in the size of the list, so that
assert actually traverses the pg log on each operation.  The patch in
wip-dumpling-log-assert should disable that assert as well by default.
    Let me know if it helps.

It should be built within an hour of this email.
-Sam

On Mon, Aug 26, 2013 at 10:46 PM, Matthew Anderson
<manderson8787@xxxxxxxxx> wrote:
Hi Guys,

I'm having the same problem as Oliver with 0.67.2. CPU usage is
around
double that of the 0.61.8 OSD's in the same cluster which appears to
be causing the performance decrease.

I did a perf comparison (not sure if I did it right but it seems ok).
Both hosts are the same spec running Ubuntu 12.04.1 (3.2 kernel),
journal and osd data is on an SSD, OSD's are in the same pool with
the
same weight and the perf tests were run at the same time on a
realworld load consisting of RBD traffic only.

Dumpling -

Events: 332K cycles
    17.93%  ceph-osd  libc-2.15.so           [.] 0x15d523
    17.03%  ceph-osd  ceph-osd               [.] 0x5c2897
     4.66%  ceph-osd  ceph-osd               [.]
leveldb::InternalKeyComparator::Compare(leveldb::Slice const&, level
     3.46%  ceph-osd  ceph-osd               [.]
leveldb::Block::Iter::Next()
     2.70%  ceph-osd  libstdc++.so.6.0.16    [.]
std::string::_M_mutate(unsigned long, unsigned long, unsigned long)
     2.60%  ceph-osd  ceph-osd               [.] PGLog::check()
     2.57%  ceph-osd  [kernel.kallsyms]      [k] __ticket_spin_lock
     2.49%  ceph-osd  ceph-osd               [.] ceph_crc32c_le_intel
     1.93%  ceph-osd  libsnappy.so.1.1.2     [.]
snappy::RawUncompress(snappy::Source*, char*)
     1.53%  ceph-osd  libstdc++.so.6.0.16    [.]
std::string::append(char
const*, unsigned long)
     1.47%  ceph-osd  libtcmalloc.so.0.1.0   [.] operator new(unsigned
long)
     1.33%  ceph-osd  [kernel.kallsyms]      [k]
copy_user_generic_string
     0.98%  ceph-osd  libtcmalloc.so.0.1.0   [.] operator
delete(void*)
     0.90%  ceph-osd  libstdc++.so.6.0.16    [.]
std::string::assign(char
const*, unsigned long)
     0.75%  ceph-osd  libstdc++.so.6.0.16    [.]
std::string::_M_replace_safe(unsigned long, unsigned long, char cons
     0.58%  ceph-osd  [kernel.kallsyms]      [k] wait_sb_inodes
     0.55%  ceph-osd  ceph-osd               [.]
leveldb::Block::Iter::Valid() const
     0.51%  ceph-osd  libtcmalloc.so.0.1.0   [.]
tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::
     0.50%  ceph-osd  libtcmalloc.so.0.1.0   [.]
tcmalloc::CentralFreeList::FetchFromSpans()
     0.47%  ceph-osd  libstdc++.so.6.0.16    [.] 0x9ebc8
     0.46%  ceph-osd  libc-2.15.so           [.] vfprintf
     0.45%  ceph-osd  [kernel.kallsyms]      [k] find_busiest_group
     0.45%  ceph-osd  libstdc++.so.6.0.16    [.]
std::string::resize(unsigned long, char)
     0.43%  ceph-osd  libpthread-2.15.so     [.] pthread_mutex_unlock
     0.41%  ceph-osd  [kernel.kallsyms]      [k] iput_final
     0.40%  ceph-osd  ceph-osd               [.]
leveldb::Block::Iter::Seek(leveldb::Slice const&)
     0.39%  ceph-osd  libc-2.15.so           [.] _IO_vfscanf
     0.39%  ceph-osd  ceph-osd               [.]
leveldb::Block::Iter::key() const
     0.39%  ceph-osd  libtcmalloc.so.0.1.0   [.]
tcmalloc::CentralFreeList::ReleaseToSpans(void*)
     0.37%  ceph-osd  libstdc++.so.6.0.16    [.]
std::basic_ostream<char,
std::char_traits<char> >& std::__ostream_in


Cuttlefish -

Events: 160K cycles
     7.53%  ceph-osd  [kernel.kallsyms]      [k] __ticket_spin_lock
     6.26%  ceph-osd  libc-2.15.so           [.] 0x89115
     3.06%  ceph-osd  ceph-osd               [.] ceph_crc32c_le
     2.66%  ceph-osd  libtcmalloc.so.0.1.0   [.] operator new(unsigned
long)
     2.46%  ceph-osd  [kernel.kallsyms]      [k] find_busiest_group
     1.80%  ceph-osd  libtcmalloc.so.0.1.0   [.] operator
delete(void*)
     1.42%  ceph-osd  [kernel.kallsyms]      [k] try_to_wake_up
     1.27%  ceph-osd  ceph-osd               [.] 0x531fb6
     1.21%  ceph-osd  libstdc++.so.6.0.16    [.] 0x9ebc8
     1.14%  ceph-osd  [kernel.kallsyms]      [k] wait_sb_inodes
     1.02%  ceph-osd  libc-2.15.so           [.] _IO_vfscanf
     1.01%  ceph-osd  [kernel.kallsyms]      [k] update_shares
     0.98%  ceph-osd  [kernel.kallsyms]      [k]
filemap_fdatawait_range
     0.90%  ceph-osd  libstdc++.so.6.0.16    [.]
std::basic_ostream<char,
std::char_traits<char> >& std
     0.89%  ceph-osd  [kernel.kallsyms]      [k] iput_final
     0.79%  ceph-osd  libstdc++.so.6.0.16    [.]
std::basic_string<char,
std::char_traits<char>, std::a
     0.79%  ceph-osd  [kernel.kallsyms]      [k]
copy_user_generic_string
     0.78%  ceph-osd  libc-2.15.so           [.] vfprintf
     0.70%  ceph-osd  libtcmalloc.so.0.1.0   [.]
tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc:
     0.69%  ceph-osd  [kernel.kallsyms]      [k] __d_lookup_rcu
     0.69%  ceph-osd  libtcmalloc.so.0.1.0   [.]
tcmalloc::CentralFreeList::FetchFromSpans()
     0.66%  ceph-osd  [kernel.kallsyms]      [k] igrab
     0.63%  ceph-osd  [kernel.kallsyms]      [k] update_cfs_load
     0.63%  ceph-osd  [kernel.kallsyms]      [k] link_path_walk

If you'd like some more tests run just let me know, more than happy
to help

Thanks
-Matt



_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux