Hey Samuel,

Nope, "PGLog::undirty()" doesn't use as much CPU as before, but I found it
curious that it still showed up, as I thought you had disabled it.  As long
as you can reproduce the abnormal leveldb CPU-usage.  Let me know if I can
help with anything.


   Regards,

     Oliver

On ma, 2013-08-26 at 14:23 -0700, Samuel Just wrote:
> Saw your logs.  I thought it might be enabling
> filestore_xattr_use_omap, but it isn't.  PGLog::undirty() doesn't seem
> to be using very much cpu.
> -Sam
>
> On Mon, Aug 26, 2013 at 2:04 PM, Oliver Daudey <oliver@xxxxxxxxx> wrote:
> > Hey Samuel,
> >
> > I have been trying to get it reproduced on my test-cluster and seem to
> > have found a way.  Try: `rbd bench-write test --io-threads 80
> > --io-pattern=rand'.  On my test-cluster, this closely replicates what I
> > see during profiling on my production-cluster, including the extra
> > CPU-usage by leveldb, which doesn't show up on Cuttlefish.  It's very
> > curious that "PGLog::undirty()" is also still showing up near the top,
> > even in 0.67.2.
> >
> > I'll send you the logs by private mail.
> >
> >
> > Regards,
> >
> >   Oliver
> >
> > On ma, 2013-08-26 at 13:35 -0700, Samuel Just wrote:
> >> Can you attach a log from the startup of one of the dumpling osds on
> >> your production machine (no need for logging, just need some of the
> >> information dumped on every boot)?
> >>
> >> libleveldb is leveldb.  We've used leveldb for a few things since
> >> bobtail.  If anything, the load on leveldb should be lighter in
> >> dumpling, I would think...  I'll have to try to reproduce it locally.
> >> I'll keep you posted.
> >> -Sam
> >>
> >> On Sat, Aug 24, 2013 at 10:11 AM, Oliver Daudey <oliver@xxxxxxxxx> wrote:
> >> > Hey Samuel,
> >> >
> >> > Unfortunately, disabling "wbthrottle" made almost no difference on my
> >> > production-cluster.  OSD-load was still much higher on Dumpling.
> >> >
> >> > I've mentioned this several times already, but when profiling with `perf
> >> > top' on my production-cluster, any time I'm running a Dumpling-OSD,
> >> > several "libleveldb"-related entries come up near the top that don't
> >> > show up when running the Cuttlefish-OSD at all.  Let's concentrate on
> >> > that for a moment, as it's a clearly visible difference on my
> >> > production-cluster, which shows the actual problem.
> >> >
> >> > Dumpling OSDs:
> >> >  17.23%  [kernel]              [k] intel_idle
> >> >   6.35%  [kernel]              [k] find_busiest_group
> >> >   4.36%  kvm                   [.] 0x2cdbb0
> >> >   3.38%  libleveldb.so.1.9     [.] 0x22821
> >> >   2.40%  libc-2.11.3.so        [.] memcmp
> >> >   2.04%  ceph-osd              [.] ceph_crc32c_le_intel
> >> >   1.90%  [kernel]              [k] _raw_spin_lock
> >> >   1.87%  [kernel]              [k] copy_user_generic_string
> >> >   1.35%  [kernel]              [k] default_send_IPI_mask_sequence_phys
> >> >   1.34%  [kernel]              [k] __hrtimer_start_range_ns
> >> >   1.14%  libc-2.11.3.so        [.] memcpy
> >> >   1.03%  [kernel]              [k] hrtimer_interrupt
> >> >   1.01%  [kernel]              [k] do_select
> >> >   1.00%  [kernel]              [k] __schedule
> >> >   0.99%  [kernel]              [k] _raw_spin_unlock_irqrestore
> >> >   0.97%  [kernel]              [k] cpumask_next_and
> >> >   0.97%  [kernel]              [k] find_next_bit
> >> >   0.96%  libleveldb.so.1.9     [.] leveldb::InternalKeyComparator::Compar
> >> >   0.91%  [kernel]              [k] _raw_spin_lock_irqsave
> >> >   0.91%  [kernel]              [k] fget_light
> >> >   0.89%  [kernel]              [k] clockevents_program_event
> >> >   0.79%  [kernel]              [k] sync_inodes_sb
> >> >   0.78%  libleveldb.so.1.9     [.] leveldb::Block::Iter::Next()
> >> >   0.75%  [kernel]              [k] apic_timer_interrupt
> >> >   0.70%  [kernel]              [k] native_write_cr0
> >> >   0.60%  [kvm_intel]           [k] vmx_vcpu_run
> >> >   0.58%  [kernel]              [k] load_balance
> >> >   0.57%  [kernel]              [k] rcu_needs_cpu
> >> >   0.56%  ceph-osd              [.] PGLog::undirty()
> >> >   0.51%  libpthread-2.11.3.so  [.] pthread_mutex_lock
> >> >   0.50%  [vdso]                [.] 0x7fff6dbff6ce
> >> >
> >> > Same load, but with Cuttlefish-OSDs:
> >> >  19.23%  [kernel]              [k] intel_idle
> >> >   6.43%  [kernel]              [k] find_busiest_group
> >> >   5.25%  kvm                   [.] 0x152a75
> >> >   2.70%  ceph-osd              [.] ceph_crc32c_le
> >> >   2.44%  [kernel]              [k] _raw_spin_lock
> >> >   1.95%  [kernel]              [k] copy_user_generic_string
> >> >   1.53%  [kernel]              [k] default_send_IPI_mask_sequence_phys
> >> >   1.28%  [kernel]              [k] __hrtimer_start_range_ns
> >> >   1.21%  [kernel]              [k] do_select
> >> >   1.19%  [kernel]              [k] hrtimer_interrupt
> >> >   1.19%  [kernel]              [k] _raw_spin_unlock_irqrestore
> >> >   1.16%  [kernel]              [k] fget_light
> >> >   1.12%  [kernel]              [k] cpumask_next_and
> >> >   1.11%  [kernel]              [k] clockevents_program_event
> >> >   1.08%  [kernel]              [k] __schedule
> >> >   1.08%  [kernel]              [k] find_next_bit
> >> >   0.99%  [kernel]              [k] _raw_spin_lock_irqsave
> >> >   0.90%  [kernel]              [k] native_write_cr0
> >> >   0.83%  [kernel]              [k] native_write_msr_safe
> >> >   0.82%  [kernel]              [k] apic_timer_interrupt
> >> >   0.70%  libc-2.11.3.so        [.] memcpy
> >> >   0.68%  [kernel]              [k] sync_inodes_sb
> >> >   0.63%  [kernel]              [k] tg_load_down
> >> >   0.63%  [kernel]              [k] load_balance
> >> >   0.61%  libpthread-2.11.3.so  [.] pthread_mutex_lock
> >> >   0.58%  [kernel]              [k] rcu_needs_cpu
> >> >   0.57%  [kernel]              [k] fput
> >> >   0.56%  libc-2.11.3.so        [.] 0x7fb29
> >> >   0.54%  [vdso]                [.] 0x7fff2afb873a
> >> >   0.50%  [kernel]              [k] iput
> >> >   0.50%  [kernel]              [k] reschedule_interrupt
> >> >
> >> > It seems to me like "libleveldb" is accounting for significant extra
> >> > CPU-loading on Dumpling.  Another interesting fact is that I only see
> >> > it use so much CPU on my production-cluster, which is serving around 80
> >> > KVMs over RBD, over 2x10Gbit/s ethernet per node, and so has a much
> >> > higher and more diverse load than what I can generate on my
> >> > test-cluster.
> >> >
> >> > Any ideas about this particular one?  Did anything change between
> >> > Cuttlefish and Dumpling that would account for higher CPU-usage by
> >> > "libleveldb", particularly under this client-load?  Anything I can try
> >> > to bring it down?  Thanks!
> >> >
> >> >
> >> > Regards,
> >> >
> >> >   Oliver
> >> >
> >> > On vr, 2013-08-23 at 13:55 -0700, Samuel Just wrote:
> >> >> Ok, can you try setting filestore_op_threads to 1 on both cuttlefish
> >> >> and wip-dumpling-perf (with and without wbthrottle, default wbthrottle
> >> >> settings).  I suspect I created contention in the filestore op threads
> >> >> (FileStore::lfn_open specifically), and if so, setting it to only use 1
> >> >> thread should even out the performance.
> >> >> -Sam
> >> >>
> >> >> On Fri, Aug 23, 2013 at 1:44 PM, Oliver Daudey <oliver@xxxxxxxxx> wrote:
> >> >> > Hey Samuel,
> >> >> >
> >> >> > I commented the earlier settings out, so it was with defaults.
> >> >> >
> >> >> >
> >> >> > Regards,
> >> >> >
> >> >> >   Oliver
> >> >> >
> >> >> > On vr, 2013-08-23 at 13:35 -0700, Samuel Just wrote:
> >> >> >> When you were running with the wbthrottle on, did you have the
> >> >> >> settings I gave you earlier set, or was it using the defaults?
> >> >> >> -Sam
> >> >> >>
> >> >> >> On Fri, Aug 23, 2013 at 12:48 PM, Oliver Daudey <oliver@xxxxxxxxx> wrote:
> >> >> >> > Hey Samuel,
> >> >> >> >
> >> >> >> > That changed something, for the better. :-)
> >> >> >> >
> >> >> >> > Your test-version, with wbthrottle off:
> >> >> >> > # ceph-osd --version
> >> >> >> > ceph version 0.67.1-18-g3fe3368
> >> >> >> > (3fe3368ac7178dcd312e89d264d8d81307e582d8)
> >> >> >> > # ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok config show | grep
> >> >> >> > wbthrottle_enable
> >> >> >> >   "filestore_wbthrottle_enable": "false",
> >> >> >> > # rbd bench-write test --io-pattern=rand
> >> >> >> > bench-write  io_size 4096 io_threads 16 bytes 1073741824 pattern rand
> >> >> >> >   SEC       OPS   OPS/SEC   BYTES/SEC
> >> >> >> >     1       838    837.36  2283825.44
> >> >> >> >     2      1679    827.66  2309860.21
> >> >> >> >     3      2488    829.30  2320967.64
> >> >> >> >     4      2812    655.55  1847713.58
> >> >> >> >     5      3301    659.64  1849006.69
> >> >> >> > <...>
> >> >> >> >   120     54113    450.83  1271297.14
> >> >> >> >   121     54847    453.13  1277957.21
> >> >> >> >   122     55502    454.93  1281769.31
> >> >> >> >   123     55768    451.09  1271512.03
> >> >> >> >   124     55853    450.42  1269452.15
> >> >> >> >   126     55889    442.22  1246494.89
> >> >> >> >   127     56351    443.71  1250308.66
> >> >> >> >   128     57135    446.36  1257746.91
> >> >> >> >   129     57445    444.03  1251566.31
> >> >> >> >   130     57944    445.72  1256426.99
> >> >> >> >
> >> >> >> > Now with the wbthrottle-option enabled:
> >> >> >> > # ceph-osd --version
> >> >> >> > ceph version 0.67.1-18-g3fe3368
> >> >> >> > (3fe3368ac7178dcd312e89d264d8d81307e582d8)
> >> >> >> > # ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok config show | grep
> >> >> >> > wbthrottle_enable
> >> >> >> >   "filestore_wbthrottle_enable": "true",
> >> >> >> > # rbd bench-write test --io-pattern=rand
> >> >> >> > bench-write  io_size 4096 io_threads 16 bytes 1073741824 pattern rand
> >> >> >> >   SEC       OPS   OPS/SEC   BYTES/SEC
> >> >> >> >     1       822    821.23  2303875.45
> >> >> >> >     2      1510    738.54  2021390.37
> >> >> >> >     3      2224    740.29  1989216.54
> >> >> >> >     4      2843    708.01  1912598.24
> >> >> >> >     5      3294    652.35  1774048.26
> >> >> >> > <...>
> >> >> >> >   120     59470    495.58  1388776.41
> >> >> >> >   122     60067    489.29  1370954.96
> >> >> >> >   123     60083    487.74  1366434.49
> >> >> >> >   124     60449    487.49  1366477.77
> >> >> >> >   125     61122    488.97  1370480.73
> >> >> >> >   126     61679    489.52  1372290.03
> >> >> >> >   127     62195    489.69  1372523.64
> >> >> >> >   128     62608    489.02  1370226.98
> >> >> >> >   129     62655    485.68  1360719.66
> >> >> >> >   130     62688    482.01  1350560.76
> >> >> >> >
> >> >> >> > Slightly faster than with wbthrottle disabled..
> >> >> >> > For comparison, one more time with Cuttlefish-OSDs:
> >> >> >> > # ceph-osd --version
> >> >> >> > ceph version 0.61.7 (8f010aff684e820ecc837c25ac77c7a05d7191ff)
> >> >> >> > # rbd bench-write test --io-pattern=rand
> >> >> >> > bench-write  io_size 4096 io_threads 16 bytes 1073741824 pattern rand
> >> >> >> >   SEC       OPS   OPS/SEC   BYTES/SEC
> >> >> >> >     1      1074   1073.80  3079631.51
> >> >> >> >     2      2121   1060.21  3001542.58
> >> >> >> >     3      3111    845.59  2422586.94
> >> >> >> >     4      3132    745.40  2132915.48
> >> >> >> >     5      3336    665.33  1904205.15
> >> >> >> > <...>
> >> >> >> >   120     63700    530.81  1530767.77
> >> >> >> >   121     64781    535.37  1544861.37
> >> >> >> >   122     64944    532.10  1535398.54
> >> >> >> >   123     65312    527.70  1522971.49
> >> >> >> >   124     65526    528.42  1525134.71
> >> >> >> >   125     66160    529.12  1527158.94
> >> >> >> >   126     67214    533.44  1539955.47
> >> >> >> >   127     67855    534.29  1541965.13
> >> >> >> >   128     67939    528.79  1526270.55
> >> >> >> >   129     68178    528.44  1525239.57
> >> >> >> >   130     68811    529.24  1527681.84
> >> >> >> >
> >> >> >> > Cuttlefish still wins, but by a much smaller margin now.  Looks like
> >> >> >> > we're onto something.  The fdatasync seems to be the key here, rather
> >> >> >> > than disabling wbthrottle.
> >> >> >> >
> >> >> >> >
> >> >> >> > Regards,
> >> >> >> >
> >> >> >> >   Oliver
> >> >> >> >
> >> >> >> > On 23-08-13 19:53, Samuel Just wrote:
> >> >> >> >> I pushed a branch, wip-dumpling-perf.  It does two things:
> >> >> >> >> 1) adds a config filestore_wbthrottle_enable (defaults to true) to
> >> >> >> >> allow you to disable the wbthrottle altogether
> >> >> >> >> 2) causes the wbthrottle when enabled to fdatasync rather than fsync.
> >> >> >> >>
> >> >> >> >> Can you rerun the random workload with that branch with
> >> >> >> >> filestore_wbthrottle_enable on and then off?
> >> >> >> >> -Sam
> >> >> >> >>
> >> >> >> >> On Fri, Aug 23, 2013 at 10:26 AM, Oliver Daudey <oliver@xxxxxxxxx> wrote:
> >> >> >> >>> Hey Sage,
> >> >> >> >>>
> >> >> >> >>> I'm all for it and will help testing.
> >> >> >> >>>
> >> >> >> >>>
> >> >> >> >>> Regards,
> >> >> >> >>>
> >> >> >> >>>   Oliver

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
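
Condensed for reference, the reproduction recipe that emerges from the thread
above looks roughly like the following.  This is only a sketch: the
filestore_wbthrottle_enable option exists only in Sam's wip-dumpling-perf
builds, filestore_op_threads = 1 is the optional extra experiment Sam
suggested, and the image name "test" and the osd.1 admin-socket path are
simply the examples used in the thread.

Add to ceph.conf on the OSD-nodes and restart the OSDs:

[osd]
        filestore_wbthrottle_enable = false   ; set to true for the comparison run
        ;filestore_op_threads = 1             ; optional: test the filestore op-thread contention theory

Confirm the running value through the admin socket (adjust the socket path for
each OSD):

# ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok config show | grep wbthrottle_enable

Generate the random-write load and profile the OSD-node while it runs, once on
the Dumpling/wip-dumpling-perf OSDs and once on Cuttlefish:

# rbd bench-write test --io-threads 80 --io-pattern=rand
# perf top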