What options were you using?
-Sam

On Tue, Aug 27, 2013 at 7:35 AM, Oliver Daudey <oliver@xxxxxxxxx> wrote:
> Hey Ian, Samuel,
>
> FYI: I still had some attempted optimization options in place on the
> production cluster, which might have skewed my results a bit. OSD
> version 0.67.2-16-geeb1f86 seems to be a lot less hard on the CPU in the
> configuration that I did all other tests in. I haven't yet verified
> sufficiently whether this is accompanied by a speed increase as well. On the
> test cluster, I didn't see any difference in speed, but that may not
> mean much, as the load pattern on production is totally different.
> Sorry for that mixup.
>
> Updated `perf top' output, extra options removed, under current load,
> which should be higher than in my previous mail:
>
>  18.08%  [kernel]              [k] intel_idle
>   5.87%  [kernel]              [k] find_busiest_group
>   4.92%  kvm                   [.] 0xcefe2
>   3.24%  [kernel]              [k] native_write_msr_safe
>   2.92%  [kernel]              [k] default_send_IPI_mask_sequence_phys
>   2.66%  [kernel]              [k] _raw_spin_lock
>   1.50%  [kernel]              [k] native_write_cr0
>   1.36%  libleveldb.so.1.9     [.] 0x3cebc
>   1.27%  [kernel]              [k] __hrtimer_start_range_ns
>   1.17%  [kernel]              [k] hrtimer_interrupt
>   1.10%  libc-2.11.3.so        [.] memcmp
>   1.07%  [kernel]              [k] apic_timer_interrupt
>   1.00%  [kernel]              [k] find_next_bit
>   0.99%  [kernel]              [k] cpumask_next_and
>   0.99%  [kernel]              [k] __schedule
>   0.97%  [kernel]              [k] clockevents_program_event
>   0.97%  [kernel]              [k] _raw_spin_unlock_irqrestore
>   0.90%  [kernel]              [k] fget_light
>   0.85%  [kernel]              [k] do_select
>   0.84%  [kernel]              [k] reschedule_interrupt
>   0.83%  [kvm_intel]           [k] vmx_vcpu_run
>   0.79%  [kernel]              [k] _raw_spin_lock_irqsave
>   0.78%  [kernel]              [k] try_to_wake_up
>   0.70%  libc-2.11.3.so        [.] memcpy
>   0.66%  [kernel]              [k] copy_user_generic_string
>   0.63%  [kernel]              [k] sync_inodes_sb
>   0.61%  [kernel]              [k] load_balance
>   0.61%  [kernel]              [k] tg_load_down
>   0.56%  [kernel]              [k] irq_entries_start
>   0.56%  libc-2.11.3.so        [.] 0x73612
>   0.54%  libpthread-2.11.3.so  [.] pthread_mutex_lock
>   0.51%  [kernel]              [k] rcu_needs_cpu
>
> Regards,
>
>   Oliver
>
> On 27-08-13 16:04, Ian Colle wrote:
>>
>> On Aug 27, 2013, at 2:08, Oliver Daudey <oliver@xxxxxxxxx> wrote:
>>
>>> Hey Samuel,
>>>
>>> The "PGLog::check()" is now no longer visible in profiling, so it helped
>>> for that. Unfortunately, it doesn't seem to have helped to bring down
>>> the OSD's CPU load much. Leveldb still uses much more than in
>>> Cuttlefish. On my test cluster, I didn't notice any difference in the
>>> RBD bench results, either, so I have to assume that it didn't help
>>> performance much.
>>>
>>> Here's the `perf top' I took just now on my production cluster with your
>>> new version, under regular load. Also note the "memcmp" and "memcpy",
>>> which also don't show up when running a Cuttlefish OSD:
>>
>> memcpy is in fact also present in your Cuttlefish OSD, just a bit further
>> down the list (increased from 0.7% to 1.4%).
>>
>> memcmp definitely looks suspicious and is something we're looking into.
>>
>>>  15.65%  [kernel]              [k] intel_idle
>>>   7.20%  libleveldb.so.1.9     [.] 0x3ceae
>>>   6.28%  libc-2.11.3.so        [.] memcmp
>>>   5.22%  [kernel]              [k] find_busiest_group
>>>   3.92%  kvm                   [.] 0x2cf006
>>>   2.40%  libleveldb.so.1.9     [.] leveldb::InternalKeyComparator::Compar
>>>   1.95%  [kernel]              [k] _raw_spin_lock
>>>   1.69%  [kernel]              [k] default_send_IPI_mask_sequence_phys
>>>   1.46%  libc-2.11.3.so        [.] memcpy
>>>   1.17%  libleveldb.so.1.9     [.] leveldb::Block::Iter::Next()
>>>   1.16%  [kernel]              [k] hrtimer_interrupt
>>>   1.07%  [kernel]              [k] native_write_cr0
>>>   1.01%  [kernel]              [k] __hrtimer_start_range_ns
>>>   1.00%  [kernel]              [k] clockevents_program_event
>>>   0.93%  [kernel]              [k] find_next_bit
>>>   0.93%  libstdc++.so.6.0.13   [.] std::string::_M_mutate(unsigned long,
>>>   0.89%  [kernel]              [k] cpumask_next_and
>>>   0.87%  [kernel]              [k] __schedule
>>>   0.85%  [kernel]              [k] _raw_spin_unlock_irqrestore
>>>   0.85%  [kernel]              [k] do_select
>>>   0.84%  [kernel]              [k] apic_timer_interrupt
>>>   0.80%  [kernel]              [k] fget_light
>>>   0.79%  [kernel]              [k] native_write_msr_safe
>>>   0.76%  [kernel]              [k] _raw_spin_lock_irqsave
>>>   0.66%  libc-2.11.3.so        [.] 0xdc6d8
>>>   0.61%  libpthread-2.11.3.so  [.] pthread_mutex_lock
>>>   0.61%  [kernel]              [k] tg_load_down
>>>   0.59%  [kernel]              [k] reschedule_interrupt
>>>   0.59%  libsnappy.so.1.1.2    [.] snappy::RawUncompress(snappy::Source*,
>>>   0.56%  libstdc++.so.6.0.13   [.] std::string::append(char const*, unsig
>>>   0.54%  [kvm_intel]           [k] vmx_vcpu_run
>>>   0.53%  [kernel]              [k] copy_user_generic_string
>>>   0.53%  [kernel]              [k] load_balance
>>>   0.50%  [kernel]              [k] rcu_needs_cpu
>>>   0.45%  [kernel]              [k] fput
>>>
>>> Regards,
>>>
>>>   Oliver
>>>
>>> On ma, 2013-08-26 at 23:33 -0700, Samuel Just wrote:
>>>> I just pushed a patch to wip-dumpling-log-assert (based on current
>>>> dumpling head). I had disabled most of the code in PGLog::check() but
>>>> left an (I thought) innocuous assert. It seems that with (at least)
>>>> g++ 4.6.3, STL list::size() is linear in the size of the list, so that
>>>> assert actually traverses the pg log on each operation. The patch in
>>>> wip-dumpling-log-assert should disable that assert as well by default.
>>>> Let me know if it helps.
>>>>
>>>> It should be built within an hour of this email.
>>>> -Sam
>>>>
>>>> On Mon, Aug 26, 2013 at 10:46 PM, Matthew Anderson
>>>> <manderson8787@xxxxxxxxx> wrote:
>>>>> Hi Guys,
>>>>>
>>>>> I'm having the same problem as Oliver with 0.67.2. CPU usage is around
>>>>> double that of the 0.61.8 OSDs in the same cluster, which appears to
>>>>> be causing the performance decrease.
>>>>>
>>>>> I did a perf comparison (not sure if I did it right, but it seems OK).
>>>>> Both hosts are the same spec running Ubuntu 12.04.1 (3.2 kernel),
>>>>> journal and OSD data are on an SSD, the OSDs are in the same pool with
>>>>> the same weight, and the perf tests were run at the same time on a
>>>>> real-world load consisting of RBD traffic only.
>>>>>
>>>>> Dumpling -
>>>>>
>>>>> Events: 332K cycles
>>>>>  17.93%  ceph-osd  libc-2.15.so          [.] 0x15d523
>>>>>  17.03%  ceph-osd  ceph-osd              [.] 0x5c2897
>>>>>   4.66%  ceph-osd  ceph-osd              [.] leveldb::InternalKeyComparator::Compare(leveldb::Slice const&, level
>>>>>   3.46%  ceph-osd  ceph-osd              [.] leveldb::Block::Iter::Next()
>>>>>   2.70%  ceph-osd  libstdc++.so.6.0.16   [.] std::string::_M_mutate(unsigned long, unsigned long, unsigned long)
>>>>>   2.60%  ceph-osd  ceph-osd              [.] PGLog::check()
>>>>>   2.57%  ceph-osd  [kernel.kallsyms]     [k] __ticket_spin_lock
>>>>>   2.49%  ceph-osd  ceph-osd              [.] ceph_crc32c_le_intel
>>>>>   1.93%  ceph-osd  libsnappy.so.1.1.2    [.] snappy::RawUncompress(snappy::Source*, char*)
>>>>>   1.53%  ceph-osd  libstdc++.so.6.0.16   [.] std::string::append(char const*, unsigned long)
>>>>>   1.47%  ceph-osd  libtcmalloc.so.0.1.0  [.] operator new(unsigned long)
>>>>>   1.33%  ceph-osd  [kernel.kallsyms]     [k] copy_user_generic_string
>>>>>   0.98%  ceph-osd  libtcmalloc.so.0.1.0  [.] operator delete(void*)
>>>>>   0.90%  ceph-osd  libstdc++.so.6.0.16   [.] std::string::assign(char const*, unsigned long)
>>>>>   0.75%  ceph-osd  libstdc++.so.6.0.16   [.] std::string::_M_replace_safe(unsigned long, unsigned long, char cons
>>>>>   0.58%  ceph-osd  [kernel.kallsyms]     [k] wait_sb_inodes
>>>>>   0.55%  ceph-osd  ceph-osd              [.] leveldb::Block::Iter::Valid() const
>>>>>   0.51%  ceph-osd  libtcmalloc.so.0.1.0  [.] tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::
>>>>>   0.50%  ceph-osd  libtcmalloc.so.0.1.0  [.] tcmalloc::CentralFreeList::FetchFromSpans()
>>>>>   0.47%  ceph-osd  libstdc++.so.6.0.16   [.] 0x9ebc8
>>>>>   0.46%  ceph-osd  libc-2.15.so          [.] vfprintf
>>>>>   0.45%  ceph-osd  [kernel.kallsyms]     [k] find_busiest_group
>>>>>   0.45%  ceph-osd  libstdc++.so.6.0.16   [.] std::string::resize(unsigned long, char)
>>>>>   0.43%  ceph-osd  libpthread-2.15.so    [.] pthread_mutex_unlock
>>>>>   0.41%  ceph-osd  [kernel.kallsyms]     [k] iput_final
>>>>>   0.40%  ceph-osd  ceph-osd              [.] leveldb::Block::Iter::Seek(leveldb::Slice const&)
>>>>>   0.39%  ceph-osd  libc-2.15.so          [.] _IO_vfscanf
>>>>>   0.39%  ceph-osd  ceph-osd              [.] leveldb::Block::Iter::key() const
>>>>>   0.39%  ceph-osd  libtcmalloc.so.0.1.0  [.] tcmalloc::CentralFreeList::ReleaseToSpans(void*)
>>>>>   0.37%  ceph-osd  libstdc++.so.6.0.16   [.] std::basic_ostream<char, std::char_traits<char> >& std::__ostream_in
>>>>>
>>>>>
>>>>> Cuttlefish -
>>>>>
>>>>> Events: 160K cycles
>>>>>   7.53%  ceph-osd  [kernel.kallsyms]     [k] __ticket_spin_lock
>>>>>   6.26%  ceph-osd  libc-2.15.so          [.] 0x89115
>>>>>   3.06%  ceph-osd  ceph-osd              [.] ceph_crc32c_le
>>>>>   2.66%  ceph-osd  libtcmalloc.so.0.1.0  [.] operator new(unsigned long)
>>>>>   2.46%  ceph-osd  [kernel.kallsyms]     [k] find_busiest_group
>>>>>   1.80%  ceph-osd  libtcmalloc.so.0.1.0  [.] operator delete(void*)
>>>>>   1.42%  ceph-osd  [kernel.kallsyms]     [k] try_to_wake_up
>>>>>   1.27%  ceph-osd  ceph-osd              [.] 0x531fb6
>>>>>   1.21%  ceph-osd  libstdc++.so.6.0.16   [.] 0x9ebc8
>>>>>   1.14%  ceph-osd  [kernel.kallsyms]     [k] wait_sb_inodes
>>>>>   1.02%  ceph-osd  libc-2.15.so          [.] _IO_vfscanf
>>>>>   1.01%  ceph-osd  [kernel.kallsyms]     [k] update_shares
>>>>>   0.98%  ceph-osd  [kernel.kallsyms]     [k] filemap_fdatawait_range
>>>>>   0.90%  ceph-osd  libstdc++.so.6.0.16   [.] std::basic_ostream<char, std::char_traits<char> >& std
>>>>>   0.89%  ceph-osd  [kernel.kallsyms]     [k] iput_final
>>>>>   0.79%  ceph-osd  libstdc++.so.6.0.16   [.] std::basic_string<char, std::char_traits<char>, std::a
>>>>>   0.79%  ceph-osd  [kernel.kallsyms]     [k] copy_user_generic_string
>>>>>   0.78%  ceph-osd  libc-2.15.so          [.] vfprintf
>>>>>   0.70%  ceph-osd  libtcmalloc.so.0.1.0  [.] tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc:
>>>>>   0.69%  ceph-osd  [kernel.kallsyms]     [k] __d_lookup_rcu
>>>>>   0.69%  ceph-osd  libtcmalloc.so.0.1.0  [.] tcmalloc::CentralFreeList::FetchFromSpans()
>>>>>   0.66%  ceph-osd  [kernel.kallsyms]     [k] igrab
>>>>>   0.63%  ceph-osd  [kernel.kallsyms]     [k] update_cfs_load
>>>>>   0.63%  ceph-osd  [kernel.kallsyms]     [k] link_path_walk
>>>>>
>>>>> If you'd like some more tests run just let me know, more than happy to help
>>>>>
>>>>> Thanks
>>>>> -Matt
>>>
>>>
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users@xxxxxxxxxxxxxx
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
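
For readers curious about the list::size() behaviour Sam describes above, here is a minimal, self-contained sketch. It is hypothetical illustration code, not Ceph's actual PGLog implementation; the list contents, sizes, and the assert itself are made up. The point it demonstrates: with the libstdc++ that ships with g++ 4.6, std::list::size() counts nodes one by one, so an assert on the length of a long pg log does a full traversal on every call, while on a libstdc++ where size() is constant time the same loop is essentially free.

// size_check_sketch.cc -- illustrative only, not Ceph code.
// Build without -DNDEBUG so the assert is actually evaluated.
#include <cassert>
#include <cstdio>
#include <ctime>
#include <list>

int main() {
    std::list<long> pg_log;                 // stand-in for a long pg log
    for (long i = 0; i < 100000; ++i)
        pg_log.push_back(i);

    const int ops = 10000;                  // pretend each iteration is one OSD op
    std::clock_t t0 = std::clock();
    for (int i = 0; i < ops; ++i) {
        // The "innocuous" check: constant time if size() is O(1),
        // a 100000-node walk per call if size() is O(n).
        assert(pg_log.size() >= 1);
    }
    std::clock_t t1 = std::clock();

    std::printf("%d size() checks over a %lu-entry list took %.0f ms of CPU\n",
                ops, (unsigned long)pg_log.size(),
                1000.0 * (double)(t1 - t0) / CLOCKS_PER_SEC);
    return 0;
}

Which is why, as Sam says, the wip-dumpling-log-assert branch simply disables that assert by default rather than paying a log traversal per operation.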