Can you attach a log from the startup of one of the dumpling osds on your
production machine (no need for logging, just need some of the information
dumped on every boot)?

libleveldb is leveldb. We've used leveldb for a few things since bobtail.
If anything, the load on leveldb should be lighter in dumpling, I would
think... I'll have to try to reproduce it locally. I'll keep you posted.
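If it's easier, the lines I need should already be in the existing log from
the last restart. A rough sketch of how to pull them out, assuming the
default log location and osd.0 (adjust the osd id and paths for your setup):

# grep -m1 -A20 'ceph version' /var/log/ceph/ceph-osd.0.log
# ldd /usr/bin/ceph-osd | grep -i leveldb

The ldd line is just to double-check which leveldb the osd binary is
actually linked against.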
-Sam

On Sat, Aug 24, 2013 at 10:11 AM, Oliver Daudey <oliver@xxxxxxxxx> wrote:
> Hey Samuel,
>
> Unfortunately, disabling "wbthrottle" made almost no difference on my
> production-cluster. OSD-load was still much higher on Dumpling.
>
> I've mentioned this several times already, but when profiling with `perf
> top' on my production-cluster (rough invocation sketched below), any time
> I'm running a Dumpling-OSD, several "libleveldb"-related entries come up
> near the top that don't show up when running the Cuttlefish-OSD at all.
> Let's concentrate on that for a moment, as it's a clearly visible
> difference on my production-cluster, which shows the actual problem.
>
> Dumpling OSDs:
>  17.23%  [kernel]               [k] intel_idle
>   6.35%  [kernel]               [k] find_busiest_group
>   4.36%  kvm                    [.] 0x2cdbb0
>   3.38%  libleveldb.so.1.9      [.] 0x22821
>   2.40%  libc-2.11.3.so         [.] memcmp
>   2.04%  ceph-osd               [.] ceph_crc32c_le_intel
>   1.90%  [kernel]               [k] _raw_spin_lock
>   1.87%  [kernel]               [k] copy_user_generic_string
>   1.35%  [kernel]               [k] default_send_IPI_mask_sequence_phys
>   1.34%  [kernel]               [k] __hrtimer_start_range_ns
>   1.14%  libc-2.11.3.so         [.] memcpy
>   1.03%  [kernel]               [k] hrtimer_interrupt
>   1.01%  [kernel]               [k] do_select
>   1.00%  [kernel]               [k] __schedule
>   0.99%  [kernel]               [k] _raw_spin_unlock_irqrestore
>   0.97%  [kernel]               [k] cpumask_next_and
>   0.97%  [kernel]               [k] find_next_bit
>   0.96%  libleveldb.so.1.9      [.] leveldb::InternalKeyComparator::Compar
>   0.91%  [kernel]               [k] _raw_spin_lock_irqsave
>   0.91%  [kernel]               [k] fget_light
>   0.89%  [kernel]               [k] clockevents_program_event
>   0.79%  [kernel]               [k] sync_inodes_sb
>   0.78%  libleveldb.so.1.9      [.] leveldb::Block::Iter::Next()
>   0.75%  [kernel]               [k] apic_timer_interrupt
>   0.70%  [kernel]               [k] native_write_cr0
>   0.60%  [kvm_intel]            [k] vmx_vcpu_run
>   0.58%  [kernel]               [k] load_balance
>   0.57%  [kernel]               [k] rcu_needs_cpu
>   0.56%  ceph-osd               [.] PGLog::undirty()
>   0.51%  libpthread-2.11.3.so   [.] pthread_mutex_lock
>   0.50%  [vdso]                 [.] 0x7fff6dbff6ce
>
> Same load, but with Cuttlefish-OSDs:
>  19.23%  [kernel]               [k] intel_idle
>   6.43%  [kernel]               [k] find_busiest_group
>   5.25%  kvm                    [.] 0x152a75
>   2.70%  ceph-osd               [.] ceph_crc32c_le
>   2.44%  [kernel]               [k] _raw_spin_lock
>   1.95%  [kernel]               [k] copy_user_generic_string
>   1.53%  [kernel]               [k] default_send_IPI_mask_sequence_phys
>   1.28%  [kernel]               [k] __hrtimer_start_range_ns
>   1.21%  [kernel]               [k] do_select
>   1.19%  [kernel]               [k] hrtimer_interrupt
>   1.19%  [kernel]               [k] _raw_spin_unlock_irqrestore
>   1.16%  [kernel]               [k] fget_light
>   1.12%  [kernel]               [k] cpumask_next_and
>   1.11%  [kernel]               [k] clockevents_program_event
>   1.08%  [kernel]               [k] __schedule
>   1.08%  [kernel]               [k] find_next_bit
>   0.99%  [kernel]               [k] _raw_spin_lock_irqsave
>   0.90%  [kernel]               [k] native_write_cr0
>   0.83%  [kernel]               [k] native_write_msr_safe
>   0.82%  [kernel]               [k] apic_timer_interrupt
>   0.70%  libc-2.11.3.so         [.] memcpy
>   0.68%  [kernel]               [k] sync_inodes_sb
>   0.63%  [kernel]               [k] tg_load_down
>   0.63%  [kernel]               [k] load_balance
>   0.61%  libpthread-2.11.3.so   [.] pthread_mutex_lock
>   0.58%  [kernel]               [k] rcu_needs_cpu
>   0.57%  [kernel]               [k] fput
>   0.56%  libc-2.11.3.so         [.] 0x7fb29
>   0.54%  [vdso]                 [.] 0x7fff2afb873a
>   0.50%  [kernel]               [k] iput
>   0.50%  [kernel]               [k] reschedule_interrupt
>
> It seems to me like "libleveldb" is accounting for significant extra
> CPU-loading on Dumpling. Another interesting fact is that I only see it
> use so much CPU on my production-cluster, which is serving around 80
> KVMs over RBD, over 2x10Gbit/s ethernet per node, and so has a much
> higher and more diverse load than what I can generate on my
> test-cluster.
>
> Any ideas about this particular one? Did anything change between
> Cuttlefish and Dumpling that would account for higher CPU-usage by
> "libleveldb", particularly under this client-load? Anything I can try
> to bring it down? Thanks!
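>
> For reference, the profiles above come from something along these lines
> (a sketch; the exact invocation may have differed, and with more than
> one OSD per host you'd want to pass the pid of a specific ceph-osd):
>
> # perf top -p $(pidof ceph-osd)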
>
> Regards,
>
> Oliver
>
> On Fri, 2013-08-23 at 13:55 -0700, Samuel Just wrote:
>> Ok, can you try setting filestore_op_threads to 1 on both cuttlefish
>> and wip-dumpling-perf (with and without wbthrottle, default wbthrottle
>> settings)? I suspect I created contention in the filestore op threads
>> (FileStore::lfn_open specifically), and if so, setting it to only use 1
>> thread should even out the performance.
>> -Sam
>>
>> On Fri, Aug 23, 2013 at 1:44 PM, Oliver Daudey <oliver@xxxxxxxxx> wrote:
>> > Hey Samuel,
>> >
>> > I commented the earlier settings out, so it was with defaults.
>> >
>> > Regards,
>> >
>> > Oliver
>> >
>> > On Fri, 2013-08-23 at 13:35 -0700, Samuel Just wrote:
>> >> When you were running with the wbthrottle on, did you have the
>> >> settings I gave you earlier set, or was it using the defaults?
>> >> -Sam
>> >>
>> >> On Fri, Aug 23, 2013 at 12:48 PM, Oliver Daudey <oliver@xxxxxxxxx> wrote:
>> >> > Hey Samuel,
>> >> >
>> >> > That changed something, for the better. :-)
>> >> >
>> >> > Your test-version, with wbthrottle off:
>> >> > # ceph-osd --version
>> >> > ceph version 0.67.1-18-g3fe3368 (3fe3368ac7178dcd312e89d264d8d81307e582d8)
>> >> > # ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok config show | grep wbthrottle_enable
>> >> >   "filestore_wbthrottle_enable": "false",
>> >> > # rbd bench-write test --io-pattern=rand
>> >> > bench-write  io_size 4096 io_threads 16 bytes 1073741824 pattern rand
>> >> >   SEC       OPS   OPS/SEC   BYTES/SEC
>> >> >     1       838    837.36   2283825.44
>> >> >     2      1679    827.66   2309860.21
>> >> >     3      2488    829.30   2320967.64
>> >> >     4      2812    655.55   1847713.58
>> >> >     5      3301    659.64   1849006.69
>> >> > <...>
>> >> >   120     54113    450.83   1271297.14
>> >> >   121     54847    453.13   1277957.21
>> >> >   122     55502    454.93   1281769.31
>> >> >   123     55768    451.09   1271512.03
>> >> >   124     55853    450.42   1269452.15
>> >> >   126     55889    442.22   1246494.89
>> >> >   127     56351    443.71   1250308.66
>> >> >   128     57135    446.36   1257746.91
>> >> >   129     57445    444.03   1251566.31
>> >> >   130     57944    445.72   1256426.99
>> >> >
>> >> > Now with the wbthrottle-option enabled:
>> >> > # ceph-osd --version
>> >> > ceph version 0.67.1-18-g3fe3368 (3fe3368ac7178dcd312e89d264d8d81307e582d8)
>> >> > # ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok config show | grep wbthrottle_enable
>> >> >   "filestore_wbthrottle_enable": "true",
>> >> > # rbd bench-write test --io-pattern=rand
>> >> > bench-write  io_size 4096 io_threads 16 bytes 1073741824 pattern rand
>> >> >   SEC       OPS   OPS/SEC   BYTES/SEC
>> >> >     1       822    821.23   2303875.45
>> >> >     2      1510    738.54   2021390.37
>> >> >     3      2224    740.29   1989216.54
>> >> >     4      2843    708.01   1912598.24
>> >> >     5      3294    652.35   1774048.26
>> >> > <...>
>> >> >   120     59470    495.58   1388776.41
>> >> >   122     60067    489.29   1370954.96
>> >> >   123     60083    487.74   1366434.49
>> >> >   124     60449    487.49   1366477.77
>> >> >   125     61122    488.97   1370480.73
>> >> >   126     61679    489.52   1372290.03
>> >> >   127     62195    489.69   1372523.64
>> >> >   128     62608    489.02   1370226.98
>> >> >   129     62655    485.68   1360719.66
>> >> >   130     62688    482.01   1350560.76
>> >> >
>> >> > Slightly faster than with wbthrottle disabled. For comparison, one
>> >> > more time with Cuttlefish-OSDs:
>> >> > # ceph-osd --version
>> >> > ceph version 0.61.7 (8f010aff684e820ecc837c25ac77c7a05d7191ff)
>> >> > # rbd bench-write test --io-pattern=rand
>> >> > bench-write  io_size 4096 io_threads 16 bytes 1073741824 pattern rand
>> >> >   SEC       OPS   OPS/SEC   BYTES/SEC
>> >> >     1      1074   1073.80   3079631.51
>> >> >     2      2121   1060.21   3001542.58
>> >> >     3      3111    845.59   2422586.94
>> >> >     4      3132    745.40   2132915.48
>> >> >     5      3336    665.33   1904205.15
>> >> > <...>
>> >> >   120     63700    530.81   1530767.77
>> >> >   121     64781    535.37   1544861.37
>> >> >   122     64944    532.10   1535398.54
>> >> >   123     65312    527.70   1522971.49
>> >> >   124     65526    528.42   1525134.71
>> >> >   125     66160    529.12   1527158.94
>> >> >   126     67214    533.44   1539955.47
>> >> >   127     67855    534.29   1541965.13
>> >> >   128     67939    528.79   1526270.55
>> >> >   129     68178    528.44   1525239.57
>> >> >   130     68811    529.24   1527681.84
>> >> >
>> >> > Cuttlefish still wins, but by a much smaller margin now. Looks like
>> >> > we're onto something. The fdatasync seems to be the key here, rather
>> >> > than disabling the wbthrottle.
>> >> >
>> >> > Regards,
>> >> >
>> >> > Oliver
>> >> >
>> >> > On 23-08-13 19:53, Samuel Just wrote:
>> >> >> I pushed a branch, wip-dumpling-perf. It does two things:
>> >> >> 1) adds a config option, filestore_wbthrottle_enable (defaults to
>> >> >> true), to allow you to disable the wbthrottle altogether
>> >> >> 2) causes the wbthrottle, when enabled, to fdatasync rather than
>> >> >> fsync.
>> >> >>
>> >> >> Can you rerun the random workload with that branch with
>> >> >> filestore_wbthrottle_enable on and then off?
>> >> >> -Sam
>> >> >>
>> >> >> On Fri, Aug 23, 2013 at 10:26 AM, Oliver Daudey <oliver@xxxxxxxxx> wrote:
>> >> >>> Hey Sage,
>> >> >>>
>> >> >>> I'm all for it and will help testing.
>> >> >>>
>> >> >>> Regards,
>> >> >>>
>> >> >>> Oliver
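PS: For the filestore_op_threads run I asked about above, something like
this in ceph.conf on the osd nodes is what I had in mind; a sketch, and
note the wbthrottle option only exists in the wip-dumpling-perf build
(restart the osds after changing either setting):

[osd]
        # run a single filestore op thread instead of the default 2
        filestore op threads = 1
        # only recognized by wip-dumpling-perf builds
        filestore wbthrottle enable = false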