Re: Significant slowdown of osds since v0.67 Dumpling

Oliver,


Could you please have perf generate a call graph?

That would be very interesting as we look deeper into this.
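
For example, something along these lines should work (the pid is just a
placeholder for whichever ceph-osd you profile, and the duration is
arbitrary):

  perf record -g -p <ceph-osd pid> -- sleep 30
  perf report -g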

Thanks!
Ian

On 8/27/13 8:55 AM, "Oliver Daudey" <oliver@xxxxxxxxx> wrote:

>Hey Samuel,
>
>These:
>        osd op threads = 8
>        osd disk threads = 2
>        filestore op threads = 8
>
>They increased performance on my test-cluster, but seemed to have the
>opposite effect on the much heavier loaded production-environment.
>
>
>   Regards,
>
>      Oliver
>
>On 27-08-13 16:37, Samuel Just wrote:
>> What options were you using?
>> -Sam
>> 
>> On Tue, Aug 27, 2013 at 7:35 AM, Oliver Daudey <oliver@xxxxxxxxx> wrote:
>>> Hey Ian, Samuel,
>>>
>>> FYI: I still had some attempted optimization-options in place on the
>>> production-cluster, which might have skewed my results a bit.  OSD
>>> version 0.67.2-16-geeb1f86 seems to be a lot less hard on the CPU in
>>> the configuration that I did all other tests in.  I haven't yet
>>> verified sufficiently if this is accompanied by a speed-increase as
>>> well.  On the test-cluster, I didn't see any difference in speed, but
>>> that may not mean much, as the load-pattern on production is totally
>>> different.  Sorry for that mixup.
>>>
>>> Updated `perf top'-output, extra options removed, under current load,
>>> which should be higher than in my previous mail:
>>>  18.08%  [kernel]                     [k] intel_idle
>>>   5.87%  [kernel]                     [k] find_busiest_group
>>>   4.92%  kvm                          [.] 0xcefe2
>>>   3.24%  [kernel]                     [k] native_write_msr_safe
>>>   2.92%  [kernel]                     [k] default_send_IPI_mask_sequence_phys
>>>   2.66%  [kernel]                     [k] _raw_spin_lock
>>>   1.50%  [kernel]                     [k] native_write_cr0
>>>   1.36%  libleveldb.so.1.9            [.] 0x3cebc
>>>   1.27%  [kernel]                     [k] __hrtimer_start_range_ns
>>>   1.17%  [kernel]                     [k] hrtimer_interrupt
>>>   1.10%  libc-2.11.3.so               [.] memcmp
>>>   1.07%  [kernel]                     [k] apic_timer_interrupt
>>>   1.00%  [kernel]                     [k] find_next_bit
>>>   0.99%  [kernel]                     [k] cpumask_next_and
>>>   0.99%  [kernel]                     [k] __schedule
>>>   0.97%  [kernel]                     [k] clockevents_program_event
>>>   0.97%  [kernel]                     [k] _raw_spin_unlock_irqrestore
>>>   0.90%  [kernel]                     [k] fget_light
>>>   0.85%  [kernel]                     [k] do_select
>>>   0.84%  [kernel]                     [k] reschedule_interrupt
>>>   0.83%  [kvm_intel]                  [k] vmx_vcpu_run
>>>   0.79%  [kernel]                     [k] _raw_spin_lock_irqsave
>>>   0.78%  [kernel]                     [k] try_to_wake_up
>>>   0.70%  libc-2.11.3.so               [.] memcpy
>>>   0.66%  [kernel]                     [k] copy_user_generic_string
>>>   0.63%  [kernel]                     [k] sync_inodes_sb
>>>   0.61%  [kernel]                     [k] load_balance
>>>   0.61%  [kernel]                     [k] tg_load_down
>>>   0.56%  [kernel]                     [k] irq_entries_start
>>>   0.56%  libc-2.11.3.so               [.] 0x73612
>>>   0.54%  libpthread-2.11.3.so         [.] pthread_mutex_lock
>>>   0.51%  [kernel]                     [k] rcu_needs_cpu
>>>
>>>    Regards,
>>>
>>>       Oliver
>>>
>>> On 27-08-13 16:04, Ian Colle wrote:
>>>>
>>>> On Aug 27, 2013, at 2:08, Oliver Daudey <oliver@xxxxxxxxx> wrote:
>>>>
>>>>> Hey Samuel,
>>>>>
>>>>> The "PGLog::check()" is now no longer visible in profiling, so it
>>>>>helped
>>>>> for that.  Unfortunately, it doesn't seem to have helped to bring
>>>>>down
>>>>> the OSD's CPU-loading much.  Leveldb still uses much more than in
>>>>> Cuttlefish.  On my test-cluster, I didn't notice any difference in
>>>>>the
>>>>> RBD bench-results, either, so I have to assume that it didn't help
>>>>> performance much.
>>>>>
>>>>> Here's the `perf top' I took just now on my production-cluster with
>>>>>your
>>>>> new version, under regular load.  Also note the "memcmp" and
>>>>>"memcpy",
>>>>> which also don't show up when running a Cuttlefish-OSD:
>>>>
>>>> memcpy is in fact also present in your Cuttlefish OSD, just a bit
>>>> further down the list (increased from .7% to 1.4%).
>>>>
>>>> memcmp definitely looks suspicious and is something we're looking
>>>> into.
>>>>
>>>>> 15.65%  [kernel]                     [k] intel_idle
>>>>>  7.20%  libleveldb.so.1.9            [.] 0x3ceae
>>>>>  6.28%  libc-2.11.3.so               [.] memcmp
>>>>>  5.22%  [kernel]                     [k] find_busiest_group
>>>>>  3.92%  kvm                          [.] 0x2cf006
>>>>>  2.40%  libleveldb.so.1.9            [.] leveldb::InternalKeyComparator::Compar
>>>>>  1.95%  [kernel]                     [k] _raw_spin_lock
>>>>>  1.69%  [kernel]                     [k] default_send_IPI_mask_sequence_phys
>>>>>  1.46%  libc-2.11.3.so               [.] memcpy
>>>>>  1.17%  libleveldb.so.1.9            [.] leveldb::Block::Iter::Next()
>>>>>  1.16%  [kernel]                     [k] hrtimer_interrupt
>>>>>  1.07%  [kernel]                     [k] native_write_cr0
>>>>>  1.01%  [kernel]                     [k] __hrtimer_start_range_ns
>>>>>  1.00%  [kernel]                     [k] clockevents_program_event
>>>>>  0.93%  [kernel]                     [k] find_next_bit
>>>>>  0.93%  libstdc++.so.6.0.13          [.] std::string::_M_mutate(unsigned long,
>>>>>  0.89%  [kernel]                     [k] cpumask_next_and
>>>>>  0.87%  [kernel]                     [k] __schedule
>>>>>  0.85%  [kernel]                     [k] _raw_spin_unlock_irqrestore
>>>>>  0.85%  [kernel]                     [k] do_select
>>>>>  0.84%  [kernel]                     [k] apic_timer_interrupt
>>>>>  0.80%  [kernel]                     [k] fget_light
>>>>>  0.79%  [kernel]                     [k] native_write_msr_safe
>>>>>  0.76%  [kernel]                     [k] _raw_spin_lock_irqsave
>>>>>  0.66%  libc-2.11.3.so               [.] 0xdc6d8
>>>>>  0.61%  libpthread-2.11.3.so         [.] pthread_mutex_lock
>>>>>  0.61%  [kernel]                     [k] tg_load_down
>>>>>  0.59%  [kernel]                     [k] reschedule_interrupt
>>>>>  0.59%  libsnappy.so.1.1.2           [.] snappy::RawUncompress(snappy::Source*,
>>>>>  0.56%  libstdc++.so.6.0.13          [.] std::string::append(char const*, unsig
>>>>>  0.54%  [kvm_intel]                  [k] vmx_vcpu_run
>>>>>  0.53%  [kernel]                     [k] copy_user_generic_string
>>>>>  0.53%  [kernel]                     [k] load_balance
>>>>>  0.50%  [kernel]                     [k] rcu_needs_cpu
>>>>>  0.45%  [kernel]                     [k] fput
>>>>>
>>>>>
>>>>>   Regards,
>>>>>
>>>>>     Oliver
>>>>>
>>>>> On ma, 2013-08-26 at 23:33 -0700, Samuel Just wrote:
>>>>>> I just pushed a patch to wip-dumpling-log-assert (based on current
>>>>>> dumpling head).  I had disabled most of the code in PGLog::check()
>>>>>> but left an (I thought) innocuous assert.  It seems that with (at
>>>>>> least) g++ 4.6.3, stl list::size() is linear in the size of the
>>>>>> list, so that assert actually traverses the pg log on each
>>>>>> operation.  The patch in wip-dumpling-log-assert should disable
>>>>>> that assert as well by default.  Let me know if it helps.
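>>>>>>
>>>>>> To illustrate the effect (just a stand-alone sketch of the
>>>>>> libstdc++ behaviour, not the actual PGLog code):
>>>>>>
>>>>>>   // With the pre-C++11 libstdc++ shipped with g++ 4.6,
>>>>>>   // std::list::size() walks every node, so asserting on it
>>>>>>   // costs O(n) per call.
>>>>>>   #include <cassert>
>>>>>>   #include <list>
>>>>>>
>>>>>>   void append_entry(std::list<int>& log, int entry, size_t& count)
>>>>>>   {
>>>>>>     log.push_back(entry);
>>>>>>     ++count;
>>>>>>     assert(log.size() == count);  // traverses the whole list on g++ 4.6
>>>>>>     assert(!log.empty());         // a constant-time check, by contrast
>>>>>>   }
>>>>>>
>>>>>> On a pg log with thousands of entries, that per-op traversal adds up.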
>>>>>>
>>>>>> It should be built within an hour of this email.
>>>>>> -Sam
>>>>>>
>>>>>> On Mon, Aug 26, 2013 at 10:46 PM, Matthew Anderson
>>>>>> <manderson8787@xxxxxxxxx> wrote:
>>>>>>> Hi Guys,
>>>>>>>
>>>>>>> I'm having the same problem as Oliver with 0.67.2.  CPU usage is
>>>>>>> around double that of the 0.61.8 OSDs in the same cluster, which
>>>>>>> appears to be causing the performance decrease.
>>>>>>>
>>>>>>> I did a perf comparison (not sure if I did it right, but it seems
>>>>>>> ok).  Both hosts are the same spec running Ubuntu 12.04.1 (3.2
>>>>>>> kernel), journal and osd data are on an SSD, the OSDs are in the
>>>>>>> same pool with the same weight, and the perf tests were run at the
>>>>>>> same time on a real-world load consisting of RBD traffic only.
>>>>>>>
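>>>>>>> (If it would help, I could also capture proper recordings and
>>>>>>> compare them directly, e.g. something like `perf record -g -p
>>>>>>> <osd pid> -- sleep 60' on each host and then `perf diff' on the
>>>>>>> two perf.data files, but the perf output below should give the
>>>>>>> general picture.)
>>>>>>>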
>>>>>>> Dumpling -
>>>>>>>
>>>>>>> Events: 332K cycles
>>>>>>> 17.93%  ceph-osd  libc-2.15.so           [.] 0x15d523
>>>>>>> 17.03%  ceph-osd  ceph-osd               [.] 0x5c2897
>>>>>>>  4.66%  ceph-osd  ceph-osd               [.] leveldb::InternalKeyComparator::Compare(leveldb::Slice const&, level
>>>>>>>  3.46%  ceph-osd  ceph-osd               [.] leveldb::Block::Iter::Next()
>>>>>>>  2.70%  ceph-osd  libstdc++.so.6.0.16    [.] std::string::_M_mutate(unsigned long, unsigned long, unsigned long)
>>>>>>>  2.60%  ceph-osd  ceph-osd               [.] PGLog::check()
>>>>>>>  2.57%  ceph-osd  [kernel.kallsyms]      [k] __ticket_spin_lock
>>>>>>>  2.49%  ceph-osd  ceph-osd               [.] ceph_crc32c_le_intel
>>>>>>>  1.93%  ceph-osd  libsnappy.so.1.1.2     [.] snappy::RawUncompress(snappy::Source*, char*)
>>>>>>>  1.53%  ceph-osd  libstdc++.so.6.0.16    [.] std::string::append(char const*, unsigned long)
>>>>>>>  1.47%  ceph-osd  libtcmalloc.so.0.1.0   [.] operator new(unsigned long)
>>>>>>>  1.33%  ceph-osd  [kernel.kallsyms]      [k] copy_user_generic_string
>>>>>>>  0.98%  ceph-osd  libtcmalloc.so.0.1.0   [.] operator delete(void*)
>>>>>>>  0.90%  ceph-osd  libstdc++.so.6.0.16    [.] std::string::assign(char const*, unsigned long)
>>>>>>>  0.75%  ceph-osd  libstdc++.so.6.0.16    [.] std::string::_M_replace_safe(unsigned long, unsigned long, char cons
>>>>>>>  0.58%  ceph-osd  [kernel.kallsyms]      [k] wait_sb_inodes
>>>>>>>  0.55%  ceph-osd  ceph-osd               [.] leveldb::Block::Iter::Valid() const
>>>>>>>  0.51%  ceph-osd  libtcmalloc.so.0.1.0   [.] tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::
>>>>>>>  0.50%  ceph-osd  libtcmalloc.so.0.1.0   [.] tcmalloc::CentralFreeList::FetchFromSpans()
>>>>>>>  0.47%  ceph-osd  libstdc++.so.6.0.16    [.] 0x9ebc8
>>>>>>>  0.46%  ceph-osd  libc-2.15.so           [.] vfprintf
>>>>>>>  0.45%  ceph-osd  [kernel.kallsyms]      [k] find_busiest_group
>>>>>>>  0.45%  ceph-osd  libstdc++.so.6.0.16    [.] std::string::resize(unsigned long, char)
>>>>>>>  0.43%  ceph-osd  libpthread-2.15.so     [.] pthread_mutex_unlock
>>>>>>>  0.41%  ceph-osd  [kernel.kallsyms]      [k] iput_final
>>>>>>>  0.40%  ceph-osd  ceph-osd               [.] leveldb::Block::Iter::Seek(leveldb::Slice const&)
>>>>>>>  0.39%  ceph-osd  libc-2.15.so           [.] _IO_vfscanf
>>>>>>>  0.39%  ceph-osd  ceph-osd               [.] leveldb::Block::Iter::key() const
>>>>>>>  0.39%  ceph-osd  libtcmalloc.so.0.1.0   [.] tcmalloc::CentralFreeList::ReleaseToSpans(void*)
>>>>>>>  0.37%  ceph-osd  libstdc++.so.6.0.16    [.] std::basic_ostream<char, std::char_traits<char> >& std::__ostream_in
>>>>>>>
>>>>>>>
>>>>>>> Cuttlefish -
>>>>>>>
>>>>>>> Events: 160K cycles
>>>>>>>  7.53%  ceph-osd  [kernel.kallsyms]      [k] __ticket_spin_lock
>>>>>>>  6.26%  ceph-osd  libc-2.15.so           [.] 0x89115
>>>>>>>  3.06%  ceph-osd  ceph-osd               [.] ceph_crc32c_le
>>>>>>>  2.66%  ceph-osd  libtcmalloc.so.0.1.0   [.] operator new(unsigned long)
>>>>>>>  2.46%  ceph-osd  [kernel.kallsyms]      [k] find_busiest_group
>>>>>>>  1.80%  ceph-osd  libtcmalloc.so.0.1.0   [.] operator delete(void*)
>>>>>>>  1.42%  ceph-osd  [kernel.kallsyms]      [k] try_to_wake_up
>>>>>>>  1.27%  ceph-osd  ceph-osd               [.] 0x531fb6
>>>>>>>  1.21%  ceph-osd  libstdc++.so.6.0.16    [.] 0x9ebc8
>>>>>>>  1.14%  ceph-osd  [kernel.kallsyms]      [k] wait_sb_inodes
>>>>>>>  1.02%  ceph-osd  libc-2.15.so           [.] _IO_vfscanf
>>>>>>>  1.01%  ceph-osd  [kernel.kallsyms]      [k] update_shares
>>>>>>>  0.98%  ceph-osd  [kernel.kallsyms]      [k] filemap_fdatawait_range
>>>>>>>  0.90%  ceph-osd  libstdc++.so.6.0.16    [.] std::basic_ostream<char, std::char_traits<char> >& std
>>>>>>>  0.89%  ceph-osd  [kernel.kallsyms]      [k] iput_final
>>>>>>>  0.79%  ceph-osd  libstdc++.so.6.0.16    [.] std::basic_string<char, std::char_traits<char>, std::a
>>>>>>>  0.79%  ceph-osd  [kernel.kallsyms]      [k] copy_user_generic_string
>>>>>>>  0.78%  ceph-osd  libc-2.15.so           [.] vfprintf
>>>>>>>  0.70%  ceph-osd  libtcmalloc.so.0.1.0   [.] tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc:
>>>>>>>  0.69%  ceph-osd  [kernel.kallsyms]      [k] __d_lookup_rcu
>>>>>>>  0.69%  ceph-osd  libtcmalloc.so.0.1.0   [.] tcmalloc::CentralFreeList::FetchFromSpans()
>>>>>>>  0.66%  ceph-osd  [kernel.kallsyms]      [k] igrab
>>>>>>>  0.63%  ceph-osd  [kernel.kallsyms]      [k] update_cfs_load
>>>>>>>  0.63%  ceph-osd  [kernel.kallsyms]      [k] link_path_walk
>>>>>>>
>>>>>>> If you'd like some more tests run, just let me know; I'm more
>>>>>>> than happy to help.
>>>>>>>
>>>>>>> Thanks
>>>>>>> -Matt
>>>>>
>>>>>
>>>>
>>>
>> 
>


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



