Oliver,

Could you please have perf generate a call graph? That would be very
interesting as we look deeper into this.
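Something along these lines should do it (pick the PID of a busy
ceph-osd and adjust the sampling window to taste):

  perf record -g -p <osd-pid> -- sleep 60
  perf report --stdio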
Thanks!

Ian

On 8/27/13 8:55 AM, "Oliver Daudey" <oliver@xxxxxxxxx> wrote:

>Hey Samuel,
>
>These:
>  osd op threads = 8
>  osd disk threads = 2
>  filestore op threads = 8
>
>They increased performance on my test-cluster, but seemed to have the
>opposite effect on the much more heavily loaded production environment.
>
>Regards,
>
>  Oliver
>
>On 27-08-13 16:37, Samuel Just wrote:
>> What options were you using?
>> -Sam
>>
>> On Tue, Aug 27, 2013 at 7:35 AM, Oliver Daudey <oliver@xxxxxxxxx> wrote:
>>> Hey Ian, Samuel,
>>>
>>> FYI: I still had some attempted optimization options in place on the
>>> production cluster, which might have skewed my results a bit. OSD
>>> version 0.67.2-16-geeb1f86 seems to be a lot less hard on the CPU in
>>> the configuration I ran all the other tests in. I haven't yet
>>> sufficiently verified whether it brings a speed increase as well. On
>>> the test-cluster I didn't see any difference in speed, but that may
>>> not mean much, as the load pattern on production is totally
>>> different. Sorry for that mixup.
>>>
>>> Updated `perf top' output, with the extra options removed, under
>>> current load, which should be higher than in my previous mail:
>>>
>>>  18.08%  [kernel]              [k] intel_idle
>>>   5.87%  [kernel]              [k] find_busiest_group
>>>   4.92%  kvm                   [.] 0xcefe2
>>>   3.24%  [kernel]              [k] native_write_msr_safe
>>>   2.92%  [kernel]              [k] default_send_IPI_mask_sequence_phys
>>>   2.66%  [kernel]              [k] _raw_spin_lock
>>>   1.50%  [kernel]              [k] native_write_cr0
>>>   1.36%  libleveldb.so.1.9     [.] 0x3cebc
>>>   1.27%  [kernel]              [k] __hrtimer_start_range_ns
>>>   1.17%  [kernel]              [k] hrtimer_interrupt
>>>   1.10%  libc-2.11.3.so        [.] memcmp
>>>   1.07%  [kernel]              [k] apic_timer_interrupt
>>>   1.00%  [kernel]              [k] find_next_bit
>>>   0.99%  [kernel]              [k] cpumask_next_and
>>>   0.99%  [kernel]              [k] __schedule
>>>   0.97%  [kernel]              [k] clockevents_program_event
>>>   0.97%  [kernel]              [k] _raw_spin_unlock_irqrestore
>>>   0.90%  [kernel]              [k] fget_light
>>>   0.85%  [kernel]              [k] do_select
>>>   0.84%  [kernel]              [k] reschedule_interrupt
>>>   0.83%  [kvm_intel]           [k] vmx_vcpu_run
>>>   0.79%  [kernel]              [k] _raw_spin_lock_irqsave
>>>   0.78%  [kernel]              [k] try_to_wake_up
>>>   0.70%  libc-2.11.3.so        [.] memcpy
>>>   0.66%  [kernel]              [k] copy_user_generic_string
>>>   0.63%  [kernel]              [k] sync_inodes_sb
>>>   0.61%  [kernel]              [k] load_balance
>>>   0.61%  [kernel]              [k] tg_load_down
>>>   0.56%  [kernel]              [k] irq_entries_start
>>>   0.56%  libc-2.11.3.so        [.] 0x73612
>>>   0.54%  libpthread-2.11.3.so  [.] pthread_mutex_lock
>>>   0.51%  [kernel]              [k] rcu_needs_cpu
>>>
>>> Regards,
>>>
>>>   Oliver
>>>
>>> On 27-08-13 16:04, Ian Colle wrote:
>>>>
>>>> On Aug 27, 2013, at 2:08, Oliver Daudey <oliver@xxxxxxxxx> wrote:
>>>>
>>>>> Hey Samuel,
>>>>>
>>>>> The "PGLog::check()" is now no longer visible in profiling, so it
>>>>> helped for that. Unfortunately, it doesn't seem to have brought
>>>>> the OSD's CPU load down much. Leveldb still uses much more CPU
>>>>> than under Cuttlefish. On my test-cluster I didn't notice any
>>>>> difference in the RBD bench results either, so I have to assume it
>>>>> didn't help performance much.
>>>>>
>>>>> Here's the `perf top' I took just now on my production cluster
>>>>> with your new version, under regular load. Also note the "memcmp"
>>>>> and "memcpy", which don't show up when running a Cuttlefish OSD:
>>>>
>>>> memcpy is in fact also present in your Cuttlefish OSD, just a bit
>>>> further down the list (it increased from 0.7% to 1.4%).
>>>>
>>>> memcmp definitely looks suspicious and is something we're looking
>>>> into.
>>>>
>>>>>  15.65%  [kernel]              [k] intel_idle
>>>>>   7.20%  libleveldb.so.1.9     [.] 0x3ceae
>>>>>   6.28%  libc-2.11.3.so        [.] memcmp
>>>>>   5.22%  [kernel]              [k] find_busiest_group
>>>>>   3.92%  kvm                   [.] 0x2cf006
>>>>>   2.40%  libleveldb.so.1.9     [.] leveldb::InternalKeyComparator::Compar
>>>>>   1.95%  [kernel]              [k] _raw_spin_lock
>>>>>   1.69%  [kernel]              [k] default_send_IPI_mask_sequence_phys
>>>>>   1.46%  libc-2.11.3.so        [.] memcpy
>>>>>   1.17%  libleveldb.so.1.9     [.] leveldb::Block::Iter::Next()
>>>>>   1.16%  [kernel]              [k] hrtimer_interrupt
>>>>>   1.07%  [kernel]              [k] native_write_cr0
>>>>>   1.01%  [kernel]              [k] __hrtimer_start_range_ns
>>>>>   1.00%  [kernel]              [k] clockevents_program_event
>>>>>   0.93%  [kernel]              [k] find_next_bit
>>>>>   0.93%  libstdc++.so.6.0.13   [.] std::string::_M_mutate(unsigned long,
>>>>>   0.89%  [kernel]              [k] cpumask_next_and
>>>>>   0.87%  [kernel]              [k] __schedule
>>>>>   0.85%  [kernel]              [k] _raw_spin_unlock_irqrestore
>>>>>   0.85%  [kernel]              [k] do_select
>>>>>   0.84%  [kernel]              [k] apic_timer_interrupt
>>>>>   0.80%  [kernel]              [k] fget_light
>>>>>   0.79%  [kernel]              [k] native_write_msr_safe
>>>>>   0.76%  [kernel]              [k] _raw_spin_lock_irqsave
>>>>>   0.66%  libc-2.11.3.so        [.] 0xdc6d8
>>>>>   0.61%  libpthread-2.11.3.so  [.] pthread_mutex_lock
>>>>>   0.61%  [kernel]              [k] tg_load_down
>>>>>   0.59%  [kernel]              [k] reschedule_interrupt
>>>>>   0.59%  libsnappy.so.1.1.2    [.] snappy::RawUncompress(snappy::Source*,
>>>>>   0.56%  libstdc++.so.6.0.13   [.] std::string::append(char const*, unsig
>>>>>   0.54%  [kvm_intel]           [k] vmx_vcpu_run
>>>>>   0.53%  [kernel]              [k] copy_user_generic_string
>>>>>   0.53%  [kernel]              [k] load_balance
>>>>>   0.50%  [kernel]              [k] rcu_needs_cpu
>>>>>   0.45%  [kernel]              [k] fput
>>>>>
>>>>> Regards,
>>>>>
>>>>>   Oliver
>>>>>
>>>>> On ma, 2013-08-26 at 23:33 -0700, Samuel Just wrote:
>>>>>> I just pushed a patch to wip-dumpling-log-assert (based on
>>>>>> current dumpling head). I had disabled most of the code in
>>>>>> PGLog::check(), but left an (I thought) innocuous assert. It
>>>>>> seems that with (at least) g++ 4.6.3, STL list::size() is linear
>>>>>> in the size of the list, so that assert actually traverses the
>>>>>> entire pg log on each operation.
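>>>>>>
>>>>>> For anyone curious, the anti-pattern boils down to this (just a
>>>>>> sketch, not the actual PGLog code):
>>>>>>
>>>>>>   #include <cassert>
>>>>>>   #include <cstddef>
>>>>>>   #include <list>
>>>>>>
>>>>>>   // With g++ 4.6's libstdc++, std::list<T>::size() counts the
>>>>>>   // nodes one by one, so it is O(n) in the length of the list.
>>>>>>   void check_log(const std::list<int>& pg_log, std::size_t expected)
>>>>>>   {
>>>>>>       // Looks cheap, but traverses the whole log on every call.
>>>>>>       assert(pg_log.size() == expected);
>>>>>>   }
>>>>>>
>>>>>> (Tracking the count in a separate counter, or calling empty()
>>>>>> where only emptiness matters, avoids the traversal.)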
>>>>>> The patch in wip-dumpling-log-assert should disable that assert
>>>>>> as well by default. Let me know if it helps.
>>>>>>
>>>>>> It should be built within an hour of this email.
>>>>>> -Sam
>>>>>>
>>>>>> On Mon, Aug 26, 2013 at 10:46 PM, Matthew Anderson
>>>>>> <manderson8787@xxxxxxxxx> wrote:
>>>>>>> Hi Guys,
>>>>>>>
>>>>>>> I'm having the same problem as Oliver with 0.67.2. CPU usage is
>>>>>>> around double that of the 0.61.8 OSDs in the same cluster, which
>>>>>>> appears to be causing the performance decrease.
>>>>>>>
>>>>>>> I did a perf comparison (not sure if I did it right, but it
>>>>>>> seems OK). Both hosts are the same spec running Ubuntu 12.04.1
>>>>>>> (3.2 kernel), the journal and OSD data are on an SSD, the OSDs
>>>>>>> are in the same pool with the same weight, and the perf tests
>>>>>>> were run at the same time under a real-world load consisting of
>>>>>>> RBD traffic only.
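>>>>>>>
>>>>>>> (Both captures below are plain `perf top' output, left running
>>>>>>> on each host while the RBD load was active.)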
>>>>>>>
>>>>>>> Dumpling -
>>>>>>>
>>>>>>> Events: 332K cycles
>>>>>>>  17.93%  ceph-osd  libc-2.15.so          [.] 0x15d523
>>>>>>>  17.03%  ceph-osd  ceph-osd              [.] 0x5c2897
>>>>>>>   4.66%  ceph-osd  ceph-osd              [.] leveldb::InternalKeyComparator::Compare(leveldb::Slice const&, level
>>>>>>>   3.46%  ceph-osd  ceph-osd              [.] leveldb::Block::Iter::Next()
>>>>>>>   2.70%  ceph-osd  libstdc++.so.6.0.16   [.] std::string::_M_mutate(unsigned long, unsigned long, unsigned long)
>>>>>>>   2.60%  ceph-osd  ceph-osd              [.] PGLog::check()
>>>>>>>   2.57%  ceph-osd  [kernel.kallsyms]     [k] __ticket_spin_lock
>>>>>>>   2.49%  ceph-osd  ceph-osd              [.] ceph_crc32c_le_intel
>>>>>>>   1.93%  ceph-osd  libsnappy.so.1.1.2    [.] snappy::RawUncompress(snappy::Source*, char*)
>>>>>>>   1.53%  ceph-osd  libstdc++.so.6.0.16   [.] std::string::append(char const*, unsigned long)
>>>>>>>   1.47%  ceph-osd  libtcmalloc.so.0.1.0  [.] operator new(unsigned long)
>>>>>>>   1.33%  ceph-osd  [kernel.kallsyms]     [k] copy_user_generic_string
>>>>>>>   0.98%  ceph-osd  libtcmalloc.so.0.1.0  [.] operator delete(void*)
>>>>>>>   0.90%  ceph-osd  libstdc++.so.6.0.16   [.] std::string::assign(char const*, unsigned long)
>>>>>>>   0.75%  ceph-osd  libstdc++.so.6.0.16   [.] std::string::_M_replace_safe(unsigned long, unsigned long, char cons
>>>>>>>   0.58%  ceph-osd  [kernel.kallsyms]     [k] wait_sb_inodes
>>>>>>>   0.55%  ceph-osd  ceph-osd              [.] leveldb::Block::Iter::Valid() const
>>>>>>>   0.51%  ceph-osd  libtcmalloc.so.0.1.0  [.] tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::
>>>>>>>   0.50%  ceph-osd  libtcmalloc.so.0.1.0  [.] tcmalloc::CentralFreeList::FetchFromSpans()
>>>>>>>   0.47%  ceph-osd  libstdc++.so.6.0.16   [.] 0x9ebc8
>>>>>>>   0.46%  ceph-osd  libc-2.15.so          [.] vfprintf
>>>>>>>   0.45%  ceph-osd  [kernel.kallsyms]     [k] find_busiest_group
>>>>>>>   0.45%  ceph-osd  libstdc++.so.6.0.16   [.] std::string::resize(unsigned long, char)
>>>>>>>   0.43%  ceph-osd  libpthread-2.15.so    [.] pthread_mutex_unlock
>>>>>>>   0.41%  ceph-osd  [kernel.kallsyms]     [k] iput_final
>>>>>>>   0.40%  ceph-osd  ceph-osd              [.] leveldb::Block::Iter::Seek(leveldb::Slice const&)
>>>>>>>   0.39%  ceph-osd  libc-2.15.so          [.] _IO_vfscanf
>>>>>>>   0.39%  ceph-osd  ceph-osd              [.] leveldb::Block::Iter::key() const
>>>>>>>   0.39%  ceph-osd  libtcmalloc.so.0.1.0  [.] tcmalloc::CentralFreeList::ReleaseToSpans(void*)
>>>>>>>   0.37%  ceph-osd  libstdc++.so.6.0.16   [.] std::basic_ostream<char, std::char_traits<char> >& std::__ostream_in
>>>>>>>
>>>>>>> Cuttlefish -
>>>>>>>
>>>>>>> Events: 160K cycles
>>>>>>>   7.53%  ceph-osd  [kernel.kallsyms]     [k] __ticket_spin_lock
>>>>>>>   6.26%  ceph-osd  libc-2.15.so          [.] 0x89115
>>>>>>>   3.06%  ceph-osd  ceph-osd              [.] ceph_crc32c_le
>>>>>>>   2.66%  ceph-osd  libtcmalloc.so.0.1.0  [.] operator new(unsigned long)
>>>>>>>   2.46%  ceph-osd  [kernel.kallsyms]     [k] find_busiest_group
>>>>>>>   1.80%  ceph-osd  libtcmalloc.so.0.1.0  [.] operator delete(void*)
>>>>>>>   1.42%  ceph-osd  [kernel.kallsyms]     [k] try_to_wake_up
>>>>>>>   1.27%  ceph-osd  ceph-osd              [.] 0x531fb6
>>>>>>>   1.21%  ceph-osd  libstdc++.so.6.0.16   [.] 0x9ebc8
>>>>>>>   1.14%  ceph-osd  [kernel.kallsyms]     [k] wait_sb_inodes
>>>>>>>   1.02%  ceph-osd  libc-2.15.so          [.] _IO_vfscanf
>>>>>>>   1.01%  ceph-osd  [kernel.kallsyms]     [k] update_shares
>>>>>>>   0.98%  ceph-osd  [kernel.kallsyms]     [k] filemap_fdatawait_range
>>>>>>>   0.90%  ceph-osd  libstdc++.so.6.0.16   [.] std::basic_ostream<char, std::char_traits<char> >& std
>>>>>>>   0.89%  ceph-osd  [kernel.kallsyms]     [k] iput_final
>>>>>>>   0.79%  ceph-osd  libstdc++.so.6.0.16   [.] std::basic_string<char, std::char_traits<char>, std::a
>>>>>>>   0.79%  ceph-osd  [kernel.kallsyms]     [k] copy_user_generic_string
>>>>>>>   0.78%  ceph-osd  libc-2.15.so          [.] vfprintf
>>>>>>>   0.70%  ceph-osd  libtcmalloc.so.0.1.0  [.] tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc:
>>>>>>>   0.69%  ceph-osd  [kernel.kallsyms]     [k] __d_lookup_rcu
>>>>>>>   0.69%  ceph-osd  libtcmalloc.so.0.1.0  [.] tcmalloc::CentralFreeList::FetchFromSpans()
>>>>>>>   0.66%  ceph-osd  [kernel.kallsyms]     [k] igrab
>>>>>>>   0.63%  ceph-osd  [kernel.kallsyms]     [k] update_cfs_load
>>>>>>>   0.63%  ceph-osd  [kernel.kallsyms]     [k] link_path_walk
>>>>>>>
>>>>>>> If you'd like some more tests run, just let me know - more than
>>>>>>> happy to help.
>>>>>>>
>>>>>>> Thanks
>>>>>>> -Matt