Hey Samuel,

These:

  osd op threads = 8
  osd disk threads = 2
  filestore op threads = 8

They increased performance on my test-cluster, but seemed to have the
opposite effect on the much heavier loaded production-environment.

Regards,

   Oliver
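For reference, these options normally live in the [osd] section of ceph.conf
on the OSD hosts. A minimal sketch, using exactly the values quoted above
(these are the values under test in this thread, not recommendations):

    # /etc/ceph/ceph.conf  (sketch only; values taken from the mail above)
    [osd]
        osd op threads = 8
        osd disk threads = 2
        filestore op threads = 8

Restarting the OSDs picks up the changed values.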
On 27-08-13 16:37, Samuel Just wrote:
> What options were you using?
> -Sam
>
> On Tue, Aug 27, 2013 at 7:35 AM, Oliver Daudey <oliver@xxxxxxxxx> wrote:
>> Hey Ian, Samuel,
>>
>> FYI: I still had some attempted optimization-options in place on the
>> production-cluster, which might have skewed my results a bit.  OSD
>> version 0.67.2-16-geeb1f86 seems to be a lot less hard on the CPU in the
>> configuration that I did all other tests in.  I haven't yet verified
>> sufficiently if this is accompanied by a speed-increase as well.  On the
>> test-cluster, I didn't see any difference in speed, but that may not
>> mean much, as the load-pattern on production is totally different.
>> Sorry for that mixup.
>>
>> Updated `perf top'-output, extra options removed, under current load,
>> which should be higher than in my previous mail:
>>
>>   18.08%  [kernel]              [k] intel_idle
>>    5.87%  [kernel]              [k] find_busiest_group
>>    4.92%  kvm                   [.] 0xcefe2
>>    3.24%  [kernel]              [k] native_write_msr_safe
>>    2.92%  [kernel]              [k] default_send_IPI_mask_sequence_phys
>>    2.66%  [kernel]              [k] _raw_spin_lock
>>    1.50%  [kernel]              [k] native_write_cr0
>>    1.36%  libleveldb.so.1.9     [.] 0x3cebc
>>    1.27%  [kernel]              [k] __hrtimer_start_range_ns
>>    1.17%  [kernel]              [k] hrtimer_interrupt
>>    1.10%  libc-2.11.3.so        [.] memcmp
>>    1.07%  [kernel]              [k] apic_timer_interrupt
>>    1.00%  [kernel]              [k] find_next_bit
>>    0.99%  [kernel]              [k] cpumask_next_and
>>    0.99%  [kernel]              [k] __schedule
>>    0.97%  [kernel]              [k] clockevents_program_event
>>    0.97%  [kernel]              [k] _raw_spin_unlock_irqrestore
>>    0.90%  [kernel]              [k] fget_light
>>    0.85%  [kernel]              [k] do_select
>>    0.84%  [kernel]              [k] reschedule_interrupt
>>    0.83%  [kvm_intel]           [k] vmx_vcpu_run
>>    0.79%  [kernel]              [k] _raw_spin_lock_irqsave
>>    0.78%  [kernel]              [k] try_to_wake_up
>>    0.70%  libc-2.11.3.so        [.] memcpy
>>    0.66%  [kernel]              [k] copy_user_generic_string
>>    0.63%  [kernel]              [k] sync_inodes_sb
>>    0.61%  [kernel]              [k] load_balance
>>    0.61%  [kernel]              [k] tg_load_down
>>    0.56%  [kernel]              [k] irq_entries_start
>>    0.56%  libc-2.11.3.so        [.] 0x73612
>>    0.54%  libpthread-2.11.3.so  [.] pthread_mutex_lock
>>    0.51%  [kernel]              [k] rcu_needs_cpu
>>
>> Regards,
>>
>>    Oliver
>>
>> On 27-08-13 16:04, Ian Colle wrote:
>>>
>>> On Aug 27, 2013, at 2:08, Oliver Daudey <oliver@xxxxxxxxx> wrote:
>>>
>>>> Hey Samuel,
>>>>
>>>> The "PGLog::check()" is now no longer visible in profiling, so it helped
>>>> for that.  Unfortunately, it doesn't seem to have helped to bring down
>>>> the OSD's CPU-loading much.  Leveldb still uses much more than in
>>>> Cuttlefish.  On my test-cluster, I didn't notice any difference in the
>>>> RBD bench-results, either, so I have to assume that it didn't help
>>>> performance much.
>>>>
>>>> Here's the `perf top' I took just now on my production-cluster with your
>>>> new version, under regular load.  Also note the "memcmp" and "memcpy",
>>>> which also don't show up when running a Cuttlefish-OSD:
>>>
>>> memcpy is in fact also present in your Cuttlefish OSD, just a bit further
>>> down the list (increased from .7% to 1.4%).
>>>
>>> memcmp definitely looks suspicious and is something we're looking into.
>>>
>>>>   15.65%  [kernel]              [k] intel_idle
>>>>    7.20%  libleveldb.so.1.9     [.] 0x3ceae
>>>>    6.28%  libc-2.11.3.so        [.] memcmp
>>>>    5.22%  [kernel]              [k] find_busiest_group
>>>>    3.92%  kvm                   [.] 0x2cf006
>>>>    2.40%  libleveldb.so.1.9     [.] leveldb::InternalKeyComparator::Compar
>>>>    1.95%  [kernel]              [k] _raw_spin_lock
>>>>    1.69%  [kernel]              [k] default_send_IPI_mask_sequence_phys
>>>>    1.46%  libc-2.11.3.so        [.] memcpy
>>>>    1.17%  libleveldb.so.1.9     [.] leveldb::Block::Iter::Next()
>>>>    1.16%  [kernel]              [k] hrtimer_interrupt
>>>>    1.07%  [kernel]              [k] native_write_cr0
>>>>    1.01%  [kernel]              [k] __hrtimer_start_range_ns
>>>>    1.00%  [kernel]              [k] clockevents_program_event
>>>>    0.93%  [kernel]              [k] find_next_bit
>>>>    0.93%  libstdc++.so.6.0.13   [.] std::string::_M_mutate(unsigned long,
>>>>    0.89%  [kernel]              [k] cpumask_next_and
>>>>    0.87%  [kernel]              [k] __schedule
>>>>    0.85%  [kernel]              [k] _raw_spin_unlock_irqrestore
>>>>    0.85%  [kernel]              [k] do_select
>>>>    0.84%  [kernel]              [k] apic_timer_interrupt
>>>>    0.80%  [kernel]              [k] fget_light
>>>>    0.79%  [kernel]              [k] native_write_msr_safe
>>>>    0.76%  [kernel]              [k] _raw_spin_lock_irqsave
>>>>    0.66%  libc-2.11.3.so        [.] 0xdc6d8
>>>>    0.61%  libpthread-2.11.3.so  [.] pthread_mutex_lock
>>>>    0.61%  [kernel]              [k] tg_load_down
>>>>    0.59%  [kernel]              [k] reschedule_interrupt
>>>>    0.59%  libsnappy.so.1.1.2    [.] snappy::RawUncompress(snappy::Source*,
>>>>    0.56%  libstdc++.so.6.0.13   [.] std::string::append(char const*, unsig
>>>>    0.54%  [kvm_intel]           [k] vmx_vcpu_run
>>>>    0.53%  [kernel]              [k] copy_user_generic_string
>>>>    0.53%  [kernel]              [k] load_balance
>>>>    0.50%  [kernel]              [k] rcu_needs_cpu
>>>>    0.45%  [kernel]              [k] fput
>>>>
>>>> Regards,
>>>>
>>>>    Oliver
>>>>
>>>> On ma, 2013-08-26 at 23:33 -0700, Samuel Just wrote:
>>>>> I just pushed a patch to wip-dumpling-log-assert (based on current
>>>>> dumpling head).  I had disabled most of the code in PGLog::check() but
>>>>> left an (I thought) innocuous assert.  It seems that with (at least)
>>>>> g++ 4.6.3, stl list::size() is linear in the size of the list, so that
>>>>> assert actually traverses the pg log on each operation.  The patch in
>>>>> wip-dumpling-log-assert should disable that assert as well by default.
>>>>> Let me know if it helps.
>>>>>
>>>>> It should be built within an hour of this email.
>>>>> -Sam
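To make Sam's point concrete: the pre-C++11 libstdc++ shipped with g++ 4.6.3
implements std::list::size() by walking every node rather than reading a
counter, so even an "innocuous" assert on it touches the whole pg log on
every operation. The names below (Entry, append, cached_size) are made up
for illustration; this is not the actual PGLog code, only a minimal sketch
of the pattern:

    // list_size_assert.cc: hypothetical sketch, not Ceph source.
    // Build with: g++ -O2 list_size_assert.cc
    #include <cassert>
    #include <cstddef>
    #include <list>

    struct Entry { int op; };               // stand-in for a pg log entry

    static void append(std::list<Entry> &log, const Entry &e,
                       std::size_t &cached_size)
    {
        log.push_back(e);
        ++cached_size;
        // On g++ 4.6's libstdc++, size() is O(n), so this seemingly cheap
        // sanity check traverses the entire list on every single append.
        assert(log.size() == cached_size);
    }

    int main()
    {
        std::list<Entry> log;
        std::size_t cached = 0;
        for (int i = 0; i < 100000; ++i) {
            Entry e;
            e.op = i;
            append(log, e, cached);         // quadratic overall here
        }
        return 0;
    }

Disabling or removing the assert (as the wip-dumpling-log-assert branch is
said to do by default) avoids the traversal entirely.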
>>>>>
>>>>> On Mon, Aug 26, 2013 at 10:46 PM, Matthew Anderson
>>>>> <manderson8787@xxxxxxxxx> wrote:
>>>>>> Hi Guys,
>>>>>>
>>>>>> I'm having the same problem as Oliver with 0.67.2.  CPU usage is around
>>>>>> double that of the 0.61.8 OSDs in the same cluster, which appears to
>>>>>> be causing the performance decrease.
>>>>>>
>>>>>> I did a perf comparison (not sure if I did it right but it seems ok).
>>>>>> Both hosts are the same spec running Ubuntu 12.04.1 (3.2 kernel),
>>>>>> journal and osd data are on an SSD, the OSDs are in the same pool with
>>>>>> the same weight, and the perf tests were run at the same time on a
>>>>>> real-world load consisting of RBD traffic only.
>>>>>>
>>>>>> Dumpling -
>>>>>>
>>>>>> Events: 332K cycles
>>>>>>  17.93%  ceph-osd  libc-2.15.so          [.] 0x15d523
>>>>>>  17.03%  ceph-osd  ceph-osd              [.] 0x5c2897
>>>>>>   4.66%  ceph-osd  ceph-osd              [.] leveldb::InternalKeyComparator::Compare(leveldb::Slice const&, level
>>>>>>   3.46%  ceph-osd  ceph-osd              [.] leveldb::Block::Iter::Next()
>>>>>>   2.70%  ceph-osd  libstdc++.so.6.0.16   [.] std::string::_M_mutate(unsigned long, unsigned long, unsigned long)
>>>>>>   2.60%  ceph-osd  ceph-osd              [.] PGLog::check()
>>>>>>   2.57%  ceph-osd  [kernel.kallsyms]     [k] __ticket_spin_lock
>>>>>>   2.49%  ceph-osd  ceph-osd              [.] ceph_crc32c_le_intel
>>>>>>   1.93%  ceph-osd  libsnappy.so.1.1.2    [.] snappy::RawUncompress(snappy::Source*, char*)
>>>>>>   1.53%  ceph-osd  libstdc++.so.6.0.16   [.] std::string::append(char const*, unsigned long)
>>>>>>   1.47%  ceph-osd  libtcmalloc.so.0.1.0  [.] operator new(unsigned long)
>>>>>>   1.33%  ceph-osd  [kernel.kallsyms]     [k] copy_user_generic_string
>>>>>>   0.98%  ceph-osd  libtcmalloc.so.0.1.0  [.] operator delete(void*)
>>>>>>   0.90%  ceph-osd  libstdc++.so.6.0.16   [.] std::string::assign(char const*, unsigned long)
>>>>>>   0.75%  ceph-osd  libstdc++.so.6.0.16   [.] std::string::_M_replace_safe(unsigned long, unsigned long, char cons
>>>>>>   0.58%  ceph-osd  [kernel.kallsyms]     [k] wait_sb_inodes
>>>>>>   0.55%  ceph-osd  ceph-osd              [.] leveldb::Block::Iter::Valid() const
>>>>>>   0.51%  ceph-osd  libtcmalloc.so.0.1.0  [.] tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::
>>>>>>   0.50%  ceph-osd  libtcmalloc.so.0.1.0  [.] tcmalloc::CentralFreeList::FetchFromSpans()
>>>>>>   0.47%  ceph-osd  libstdc++.so.6.0.16   [.] 0x9ebc8
>>>>>>   0.46%  ceph-osd  libc-2.15.so          [.] vfprintf
>>>>>>   0.45%  ceph-osd  [kernel.kallsyms]     [k] find_busiest_group
>>>>>>   0.45%  ceph-osd  libstdc++.so.6.0.16   [.] std::string::resize(unsigned long, char)
>>>>>>   0.43%  ceph-osd  libpthread-2.15.so    [.] pthread_mutex_unlock
>>>>>>   0.41%  ceph-osd  [kernel.kallsyms]     [k] iput_final
>>>>>>   0.40%  ceph-osd  ceph-osd              [.] leveldb::Block::Iter::Seek(leveldb::Slice const&)
>>>>>>   0.39%  ceph-osd  libc-2.15.so          [.] _IO_vfscanf
>>>>>>   0.39%  ceph-osd  ceph-osd              [.] leveldb::Block::Iter::key() const
>>>>>>   0.39%  ceph-osd  libtcmalloc.so.0.1.0  [.] tcmalloc::CentralFreeList::ReleaseToSpans(void*)
>>>>>>   0.37%  ceph-osd  libstdc++.so.6.0.16   [.] std::basic_ostream<char, std::char_traits<char> >& std::__ostream_in
>>>>>>
>>>>>>
>>>>>> Cuttlefish -
>>>>>>
>>>>>> Events: 160K cycles
>>>>>>   7.53%  ceph-osd  [kernel.kallsyms]     [k] __ticket_spin_lock
>>>>>>   6.26%  ceph-osd  libc-2.15.so          [.] 0x89115
>>>>>>   3.06%  ceph-osd  ceph-osd              [.] ceph_crc32c_le
>>>>>>   2.66%  ceph-osd  libtcmalloc.so.0.1.0  [.] operator new(unsigned long)
>>>>>>   2.46%  ceph-osd  [kernel.kallsyms]     [k] find_busiest_group
>>>>>>   1.80%  ceph-osd  libtcmalloc.so.0.1.0  [.] operator delete(void*)
>>>>>>   1.42%  ceph-osd  [kernel.kallsyms]     [k] try_to_wake_up
>>>>>>   1.27%  ceph-osd  ceph-osd              [.] 0x531fb6
>>>>>>   1.21%  ceph-osd  libstdc++.so.6.0.16   [.] 0x9ebc8
>>>>>>   1.14%  ceph-osd  [kernel.kallsyms]     [k] wait_sb_inodes
>>>>>>   1.02%  ceph-osd  libc-2.15.so          [.] _IO_vfscanf
>>>>>>   1.01%  ceph-osd  [kernel.kallsyms]     [k] update_shares
>>>>>>   0.98%  ceph-osd  [kernel.kallsyms]     [k] filemap_fdatawait_range
>>>>>>   0.90%  ceph-osd  libstdc++.so.6.0.16   [.] std::basic_ostream<char, std::char_traits<char> >& std
>>>>>>   0.89%  ceph-osd  [kernel.kallsyms]     [k] iput_final
>>>>>>   0.79%  ceph-osd  libstdc++.so.6.0.16   [.] std::basic_string<char, std::char_traits<char>, std::a
>>>>>>   0.79%  ceph-osd  [kernel.kallsyms]     [k] copy_user_generic_string
>>>>>>   0.78%  ceph-osd  libc-2.15.so          [.] vfprintf
>>>>>>   0.70%  ceph-osd  libtcmalloc.so.0.1.0  [.] tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc:
>>>>>>   0.69%  ceph-osd  [kernel.kallsyms]     [k] __d_lookup_rcu
>>>>>>   0.69%  ceph-osd  libtcmalloc.so.0.1.0  [.] tcmalloc::CentralFreeList::FetchFromSpans()
>>>>>>   0.66%  ceph-osd  [kernel.kallsyms]     [k] igrab
>>>>>>   0.63%  ceph-osd  [kernel.kallsyms]     [k] update_cfs_load
>>>>>>   0.63%  ceph-osd  [kernel.kallsyms]     [k] link_path_walk
>>>>>>
>>>>>> If you'd like some more tests run just let me know, more than happy to help
>>>>>>
>>>>>> Thanks
>>>>>> -Matt
>>>>
>>>>
>>>
>>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com