On 08/27/2013 09:55 AM, Oliver Daudey wrote:
Hey Samuel,
These:
osd op threads = 8
osd disk threads = 2
filestore op threads = 8
They increased performance on my test-cluster, but seemed to have the
opposite effect on the much more heavily loaded production environment.
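(For reference, a minimal ceph.conf sketch of how these settings are
typically applied, assuming they go in the [osd] section on the OSD hosts
and the daemons are restarted afterwards; the values are the ones quoted
above, not stock defaults:)

  [osd]
    # thread-count tuning quoted above (Oliver's values, not defaults)
    osd op threads = 8
    osd disk threads = 2
    filestore op threads = 8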
Interesting. We did some testing of a number of these options back in
bobtail and sometimes saw different effects depending on the type of IO
and filesystem:
http://ceph.com/community/ceph-bobtail-jbod-performance-tuning/#summary
Notice specifically what happens with OSD op threads as you increase the
number of threads at different read sizes.
That testing was all done on a single node. It's entirely possible that
other multi-node workloads could have different effects too.
Regards,
Oliver
On 27-08-13 16:37, Samuel Just wrote:
What options were you using?
-Sam
On Tue, Aug 27, 2013 at 7:35 AM, Oliver Daudey <oliver@xxxxxxxxx> wrote:
Hey Ian, Samuel,
FYI: I still had some attempted optimization options in place on the
production-cluster, which might have skewed my results a bit. OSD
version 0.67.2-16-geeb1f86 seems to be a lot less hard on the CPU in the
configuration that I did all the other tests in. I haven't yet
sufficiently verified whether this is accompanied by a speed increase as
well. On the test-cluster, I didn't see any difference in speed, but that
may not mean much, as the load pattern on production is totally different.
Sorry for the mixup.
Updated `perf top' output, with the extra options removed, under current
load, which should be higher than in my previous mail:
18.08% [kernel] [k] intel_idle
5.87% [kernel] [k] find_busiest_group
4.92% kvm [.] 0xcefe2
3.24% [kernel] [k] native_write_msr_safe
2.92% [kernel] [k] default_send_IPI_mask_sequence_phys
2.66% [kernel] [k] _raw_spin_lock
1.50% [kernel] [k] native_write_cr0
1.36% libleveldb.so.1.9 [.] 0x3cebc
1.27% [kernel] [k] __hrtimer_start_range_ns
1.17% [kernel] [k] hrtimer_interrupt
1.10% libc-2.11.3.so [.] memcmp
1.07% [kernel] [k] apic_timer_interrupt
1.00% [kernel] [k] find_next_bit
0.99% [kernel] [k] cpumask_next_and
0.99% [kernel] [k] __schedule
0.97% [kernel] [k] clockevents_program_event
0.97% [kernel] [k] _raw_spin_unlock_irqrestore
0.90% [kernel] [k] fget_light
0.85% [kernel] [k] do_select
0.84% [kernel] [k] reschedule_interrupt
0.83% [kvm_intel] [k] vmx_vcpu_run
0.79% [kernel] [k] _raw_spin_lock_irqsave
0.78% [kernel] [k] try_to_wake_up
0.70% libc-2.11.3.so [.] memcpy
0.66% [kernel] [k] copy_user_generic_string
0.63% [kernel] [k] sync_inodes_sb
0.61% [kernel] [k] load_balance
0.61% [kernel] [k] tg_load_down
0.56% [kernel] [k] irq_entries_start
0.56% libc-2.11.3.so [.] 0x73612
0.54% libpthread-2.11.3.so [.] pthread_mutex_lock
0.51% [kernel] [k] rcu_needs_cpu
Regards,
Oliver
On 27-08-13 16:04, Ian Colle wrote:
On Aug 27, 2013, at 2:08, Oliver Daudey <oliver@xxxxxxxxx> wrote:
Hey Samuel,
The "PGLog::check()" is now no longer visible in profiling, so it helped
for that. Unfortunately, it doesn't seem to have helped to bring down
the OSD's CPU-loading much. Leveldb still uses much more than in
Cuttlefish. On my test-cluster, I didn't notice any difference in the
RBD bench-results, either, so I have to assume that it didn't help
performance much.
Here's the `perf top' I took just now on my production-cluster with your
new version, under regular load. Also note "memcmp" and "memcpy",
which don't show up when running a Cuttlefish OSD:
memcpy is in fact also present in your Cuttlefish OSD, just a bit further down the list (increased from .7% to 1.4%).
memcmp definitely looks suspicious and is something we're looking into.
15.65% [kernel] [k] intel_idle
7.20% libleveldb.so.1.9 [.] 0x3ceae
6.28% libc-2.11.3.so [.] memcmp
5.22% [kernel] [k] find_busiest_group
3.92% kvm [.] 0x2cf006
2.40% libleveldb.so.1.9 [.] leveldb::InternalKeyComparator::Compar
1.95% [kernel] [k] _raw_spin_lock
1.69% [kernel] [k] default_send_IPI_mask_sequence_phys
1.46% libc-2.11.3.so [.] memcpy
1.17% libleveldb.so.1.9 [.] leveldb::Block::Iter::Next()
1.16% [kernel] [k] hrtimer_interrupt
1.07% [kernel] [k] native_write_cr0
1.01% [kernel] [k] __hrtimer_start_range_ns
1.00% [kernel] [k] clockevents_program_event
0.93% [kernel] [k] find_next_bit
0.93% libstdc++.so.6.0.13 [.] std::string::_M_mutate(unsigned long,
0.89% [kernel] [k] cpumask_next_and
0.87% [kernel] [k] __schedule
0.85% [kernel] [k] _raw_spin_unlock_irqrestore
0.85% [kernel] [k] do_select
0.84% [kernel] [k] apic_timer_interrupt
0.80% [kernel] [k] fget_light
0.79% [kernel] [k] native_write_msr_safe
0.76% [kernel] [k] _raw_spin_lock_irqsave
0.66% libc-2.11.3.so [.] 0xdc6d8
0.61% libpthread-2.11.3.so [.] pthread_mutex_lock
0.61% [kernel] [k] tg_load_down
0.59% [kernel] [k] reschedule_interrupt
0.59% libsnappy.so.1.1.2 [.] snappy::RawUncompress(snappy::Source*,
0.56% libstdc++.so.6.0.13 [.] std::string::append(char const*, unsig
0.54% [kvm_intel] [k] vmx_vcpu_run
0.53% [kernel] [k] copy_user_generic_string
0.53% [kernel] [k] load_balance
0.50% [kernel] [k] rcu_needs_cpu
0.45% [kernel] [k] fput
Regards,
Oliver
On ma, 2013-08-26 at 23:33 -0700, Samuel Just wrote:
I just pushed a patch to wip-dumpling-log-assert (based on current
dumpling head). I had disabled most of the code in PGLog::check() but
left in an assert I thought was innocuous. It seems that with (at least)
g++ 4.6.3, STL list::size() is linear in the size of the list, so that
assert actually traverses the pg log on each operation. The patch in
wip-dumpling-log-assert should disable that assert as well by default.
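To make the cost concrete, here is a minimal standalone C++ sketch (not
the actual Ceph code; the names and sizes are made up) of why an assert
like that is expensive when built against the pre-C++11 libstdc++ that
g++ 4.6.3 uses:

  #include <cassert>
  #include <list>

  // With the old (pre-C++11) libstdc++, std::list::size() walks every
  // node, so a "cheap-looking" size check costs O(n) on each call.
  static void check_log(const std::list<int> &log, std::size_t expected)
  {
      assert(log.size() == expected);  // traverses the whole list here
  }

  int main()
  {
      std::list<int> pg_log;
      for (int i = 0; i < 100000; ++i)
          pg_log.push_back(i);

      // Called once per operation in a hot path, this adds up quickly;
      // tracking the length separately (or disabling the assert) avoids it.
      check_log(pg_log, 100000);
      return 0;
  }

(Newer libstdc++ releases implement the C++11 requirement that size() be
constant time, which is why the cost is easy to miss with newer
toolchains.)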
Let me know if it helps.
It should be built within an hour of this email.
-Sam
On Mon, Aug 26, 2013 at 10:46 PM, Matthew Anderson
<manderson8787@xxxxxxxxx> wrote:
Hi Guys,
I'm having the same problem as Oliver with 0.67.2. CPU usage is around
double that of the 0.61.8 OSDs in the same cluster, which appears to
be causing the performance decrease.
I did a perf comparison (not sure if I did it right, but it seems OK).
Both hosts are the same spec, running Ubuntu 12.04.1 (3.2 kernel);
journal and OSD data are on an SSD, the OSDs are in the same pool with
the same weight, and the perf tests were run at the same time on a
real-world load consisting of RBD traffic only.
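(As a rough sketch of how this kind of side-by-side capture is usually
done, sampling a single ceph-osd process on each host over the same
interval; these are illustrative commands, not the exact ones used here:)

  # capture ~60 seconds of samples from one OSD, then inspect the report
  perf record -g -p <ceph-osd PID> -- sleep 60
  perf report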
Dumpling -
Events: 332K cycles
17.93% ceph-osd libc-2.15.so [.] 0x15d523
17.03% ceph-osd ceph-osd [.] 0x5c2897
4.66% ceph-osd ceph-osd [.] leveldb::InternalKeyComparator::Compare(leveldb::Slice const&, level
3.46% ceph-osd ceph-osd [.] leveldb::Block::Iter::Next()
2.70% ceph-osd libstdc++.so.6.0.16 [.] std::string::_M_mutate(unsigned long, unsigned long, unsigned long)
2.60% ceph-osd ceph-osd [.] PGLog::check()
2.57% ceph-osd [kernel.kallsyms] [k] __ticket_spin_lock
2.49% ceph-osd ceph-osd [.] ceph_crc32c_le_intel
1.93% ceph-osd libsnappy.so.1.1.2 [.] snappy::RawUncompress(snappy::Source*, char*)
1.53% ceph-osd libstdc++.so.6.0.16 [.] std::string::append(char const*, unsigned long)
1.47% ceph-osd libtcmalloc.so.0.1.0 [.] operator new(unsigned long)
1.33% ceph-osd [kernel.kallsyms] [k] copy_user_generic_string
0.98% ceph-osd libtcmalloc.so.0.1.0 [.] operator delete(void*)
0.90% ceph-osd libstdc++.so.6.0.16 [.] std::string::assign(char const*, unsigned long)
0.75% ceph-osd libstdc++.so.6.0.16 [.] std::string::_M_replace_safe(unsigned long, unsigned long, char cons
0.58% ceph-osd [kernel.kallsyms] [k] wait_sb_inodes
0.55% ceph-osd ceph-osd [.] leveldb::Block::Iter::Valid() const
0.51% ceph-osd libtcmalloc.so.0.1.0 [.] tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::
0.50% ceph-osd libtcmalloc.so.0.1.0 [.] tcmalloc::CentralFreeList::FetchFromSpans()
0.47% ceph-osd libstdc++.so.6.0.16 [.] 0x9ebc8
0.46% ceph-osd libc-2.15.so [.] vfprintf
0.45% ceph-osd [kernel.kallsyms] [k] find_busiest_group
0.45% ceph-osd libstdc++.so.6.0.16 [.] std::string::resize(unsigned long, char)
0.43% ceph-osd libpthread-2.15.so [.] pthread_mutex_unlock
0.41% ceph-osd [kernel.kallsyms] [k] iput_final
0.40% ceph-osd ceph-osd [.] leveldb::Block::Iter::Seek(leveldb::Slice const&)
0.39% ceph-osd libc-2.15.so [.] _IO_vfscanf
0.39% ceph-osd ceph-osd [.] leveldb::Block::Iter::key() const
0.39% ceph-osd libtcmalloc.so.0.1.0 [.] tcmalloc::CentralFreeList::ReleaseToSpans(void*)
0.37% ceph-osd libstdc++.so.6.0.16 [.] std::basic_ostream<char, std::char_traits<char> >& std::__ostream_in
Cuttlefish -
Events: 160K cycles
7.53% ceph-osd [kernel.kallsyms] [k] __ticket_spin_lock
6.26% ceph-osd libc-2.15.so [.] 0x89115
3.06% ceph-osd ceph-osd [.] ceph_crc32c_le
2.66% ceph-osd libtcmalloc.so.0.1.0 [.] operator new(unsigned long)
2.46% ceph-osd [kernel.kallsyms] [k] find_busiest_group
1.80% ceph-osd libtcmalloc.so.0.1.0 [.] operator delete(void*)
1.42% ceph-osd [kernel.kallsyms] [k] try_to_wake_up
1.27% ceph-osd ceph-osd [.] 0x531fb6
1.21% ceph-osd libstdc++.so.6.0.16 [.] 0x9ebc8
1.14% ceph-osd [kernel.kallsyms] [k] wait_sb_inodes
1.02% ceph-osd libc-2.15.so [.] _IO_vfscanf
1.01% ceph-osd [kernel.kallsyms] [k] update_shares
0.98% ceph-osd [kernel.kallsyms] [k] filemap_fdatawait_range
0.90% ceph-osd libstdc++.so.6.0.16 [.] std::basic_ostream<char, std::char_traits<char> >& std
0.89% ceph-osd [kernel.kallsyms] [k] iput_final
0.79% ceph-osd libstdc++.so.6.0.16 [.] std::basic_string<char, std::char_traits<char>, std::a
0.79% ceph-osd [kernel.kallsyms] [k] copy_user_generic_string
0.78% ceph-osd libc-2.15.so [.] vfprintf
0.70% ceph-osd libtcmalloc.so.0.1.0 [.] tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc:
0.69% ceph-osd [kernel.kallsyms] [k] __d_lookup_rcu
0.69% ceph-osd libtcmalloc.so.0.1.0 [.] tcmalloc::CentralFreeList::FetchFromSpans()
0.66% ceph-osd [kernel.kallsyms] [k] igrab
0.63% ceph-osd [kernel.kallsyms] [k] update_cfs_load
0.63% ceph-osd [kernel.kallsyms] [k] link_path_walk
If you'd like some more tests run, just let me know; I'm more than happy to help.
Thanks
-Matt
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com