Good to hear.
-Sam

On Tue, Aug 20, 2013 at 2:55 PM, Oliver Daudey <oliver@xxxxxxxxx> wrote:
> Hey Samuel,
>
> I picked up 0.67.1-10-g47c8949 from the GIT-builder and the osd from that seems to work great so far. I'll have to let it run for a while longer to really be sure it fixed the problem, but it looks promising, not taking any more CPU than the Cuttlefish-osds. Thanks! I'll get back to you.
>
>
> Regards,
>
>   Oliver
>
> On di, 2013-08-20 at 10:40 -0700, Samuel Just wrote:
>> Can you try dumpling head without the option?
>> -Sam
>>
>> On Tue, Aug 20, 2013 at 1:44 AM, Oliver Daudey <oliver@xxxxxxxxx> wrote:
>> > Hey Mark,
>> >
>> > Sorry, but after some more tests I have to report that it only worked partly. The load seems lower with "wip-dumpling-pglog-undirty" in place, but the Cuttlefish-osd still seems significantly faster, and even with "wip-dumpling-pglog-undirty" in place, things slow down way too much under load. Unfortunately, only my production-cluster seems busy enough to actually show the problem clearly by slowing down. Below is `perf top'-output, fresh from my production-cluster under its regular load:
>> >
>> > First, the 0.67.1-6-g0c4f2f3 osd with "osd debug pg log writeout = false":
>> >
>> >  16.53%  [kernel]              [k] intel_idle
>> >   6.47%  libleveldb.so.1.9     [.] 0x380a1
>> >   5.76%  [kernel]              [k] find_busiest_group
>> >   4.11%  libc-2.11.3.so        [.] memcmp
>> >   3.95%  kvm                   [.] 0x1f6f31
>> >   2.05%  [kernel]              [k] default_send_IPI_mask_sequence_phys
>> >   2.03%  [kernel]              [k] _raw_spin_lock
>> >   1.87%  libleveldb.so.1.9     [.] leveldb::InternalKeyComparator::Compar
>> >   1.57%  libc-2.11.3.so        [.] memcpy
>> >   1.37%  libleveldb.so.1.9     [.] leveldb::Block::Iter::Next()
>> >   1.26%  [kernel]              [k] hrtimer_interrupt
>> >   1.12%  [kernel]              [k] __hrtimer_start_range_ns
>> >   1.09%  [kernel]              [k] native_write_cr0
>> >   1.05%  libstdc++.so.6.0.13   [.] std::string::_M_mutate(unsigned long,
>> >   1.00%  [kernel]              [k] native_write_msr_safe
>> >   0.99%  [kernel]              [k] apic_timer_interrupt
>> >   0.98%  [kernel]              [k] clockevents_program_event
>> >   0.96%  [kernel]              [k] _raw_spin_unlock_irqrestore
>> >   0.95%  ceph-osd              [.] PGLog::undirty()
>> >   0.92%  [kernel]              [k] find_next_bit
>> >   0.91%  libsnappy.so.1.1.2    [.] snappy::RawUncompress(snappy::Source*,
>> >   0.88%  [kernel]              [k] __schedule
>> >   0.87%  [kernel]              [k] cpumask_next_and
>> >   0.84%  [kernel]              [k] do_select
>> >   0.80%  [kernel]              [k] fget_light
>> >   0.77%  [kernel]              [k] reschedule_interrupt
>> >   0.75%  [kernel]              [k] _raw_spin_lock_irqsave
>> >   0.62%  libstdc++.so.6.0.13   [.] std::string::append(char const*, unsig
>> >   0.59%  [kvm_intel]           [k] vmx_vcpu_run
>> >   0.58%  [kernel]              [k] copy_user_generic_string
>> >   0.56%  [kernel]              [k] load_balance
>> >   0.54%  [kernel]              [k] tg_load_down
>> >   0.53%  libpthread-2.11.3.so  [.] pthread_mutex_lock
>> >   0.52%  [kernel]              [k] sync_inodes_sb
>> >
>> > Second, the 0.61.8 osd, under identical load:
>> >
>> >  21.51%  [kernel]              [k] intel_idle
>> >   6.66%  [kernel]              [k] find_busiest_group
>> >   6.25%  kvm                   [.] 0x2d214b
>> >   1.97%  [kernel]              [k] _raw_spin_lock
>> >   1.47%  [kernel]              [k] native_write_msr_safe
>> >   1.44%  [kernel]              [k] hrtimer_interrupt
>> >   1.37%  [kernel]              [k] __hrtimer_start_range_ns
>> >   1.34%  [kernel]              [k] do_select
>> >   1.29%  [kernel]              [k] fget_light
>> >   1.24%  [kernel]              [k] clockevents_program_event
>> >   1.21%  [kernel]              [k] default_send_IPI_mask_sequence_phys
>> >   1.18%  [kernel]              [k] cpumask_next_and
>> >   1.18%  [kernel]              [k] _raw_spin_unlock_irqrestore
>> >   1.15%  [kernel]              [k] find_next_bit
>> >   1.14%  [kernel]              [k] __schedule
>> >   1.11%  [kernel]              [k] _raw_spin_lock_irqsave
>> >   0.98%  [kernel]              [k] apic_timer_interrupt
>> >   0.86%  [kernel]              [k] copy_user_generic_string
>> >   0.83%  [kernel]              [k] native_write_cr0
>> >   0.76%  [kernel]              [k] sync_inodes_sb
>> >   0.71%  [kernel]              [k] rcu_needs_cpu
>> >   0.69%  libpthread-2.11.3.so  [.] pthread_mutex_lock
>> >   0.66%  [kernel]              [k] fput
>> >   0.62%  [kernel]              [k] load_balance
>> >   0.57%  [vdso]                [.] 0x7fff3a976700
>> >   0.56%  libc-2.11.3.so        [.] memcpy
>> >   0.56%  [kernel]              [k] reschedule_interrupt
>> >   0.56%  [kernel]              [k] tg_load_down
>> >   0.50%  [kernel]              [k] iput
>> >
>> > I see lots of new differences, but again I don't know what to make of them or which ones might be related or significant. LevelDB seems to jump out this time, amongst others. Let me know if you need more info.
>> >
>> >
>> > Regards,
>> >
>> >   Oliver
>> >
>> > On ma, 2013-08-19 at 15:21 -0500, Mark Nelson wrote:
>> >> Hi Oliver,
>> >>
>> >> Glad that helped! How much more efficient do the cuttlefish OSDs seem at this point (with wip-dumpling-pglog-undirty)? On modern Intel platforms we were actually hoping to see CPU usage go down in many cases due to the use of hardware CRC32 instructions.
>> >>
>> >> Mark
>> >>
>> >> On 08/19/2013 03:06 PM, Oliver Daudey wrote:
>> >> > Hey Samuel,
>> >> >
>> >> > Thanks! I installed your version, repeated the same tests on my test-cluster and the extra CPU-loading seems to have disappeared. Then I replaced one osd of my production-cluster with your modified version and its config-option, and it seems to be a lot less CPU-hungry now. Although the Cuttlefish-osds still seem to be even more CPU-efficient, your changes have definitely helped a lot. We seem to be looking in the right direction, at least for this part of the problem.
>> >> >
>> >> > BTW, I ran `perf top' on the production-node with your modified osd and didn't see anything osd-related stand out on top. "PGLog::undirty()" was in there, but with much lower usage, right at the bottom of the green part of the output.
>> >> >
>> >> > Many thanks for your help so far!
>> >> >
>> >> >
>> >> > Regards,
>> >> >
>> >> >   Oliver
>> >> >
>> >> > On ma, 2013-08-19 at 00:29 -0700, Samuel Just wrote:
>> >> >> You're right, PGLog::undirty() looks suspicious. I just pushed a branch, wip-dumpling-pglog-undirty, with a new config option (osd_debug_pg_log_writeout) which, if set to false, will disable some strictly-debugging checks that occur in PGLog::undirty(). We haven't actually seen these checks causing excessive cpu usage, so this may be a red herring.
>> >> >> -Sam
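For readers following along: the option Sam describes here is the same one quoted further up as "osd debug pg log writeout = false". A minimal sketch of how one would normally set it, assuming a build that actually carries the option (i.e. the wip-dumpling-pglog-undirty / 0.67.1 test packages discussed in this thread) and using osd.0 as a placeholder ID; the injectargs spelling is an assumption and may differ slightly between builds:

    # ceph.conf on each osd host, followed by an osd restart:
    [osd]
        osd debug pg log writeout = false

    # or, without a restart, inject it into one running osd:
    ceph tell osd.0 injectargs '--osd_debug_pg_log_writeout=false'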
>> >> >>
>> >> >> On Sat, Aug 17, 2013 at 2:48 PM, Oliver Daudey <oliver@xxxxxxxxx> wrote:
>> >> >>> Hey Mark,
>> >> >>>
>> >> >>> On za, 2013-08-17 at 08:16 -0500, Mark Nelson wrote:
>> >> >>>> On 08/17/2013 06:13 AM, Oliver Daudey wrote:
>> >> >>>>> Hey all,
>> >> >>>>>
>> >> >>>>> This is a copy of Bug #6040 (http://tracker.ceph.com/issues/6040) I created in the tracker. Thought I would pass it through the list as well, to get an idea if anyone else is running into it. It may only show under higher loads. More info about my setup is in the bug-report above. Here goes:
>> >> >>>>>
>> >> >>>>> I'm running a Ceph-cluster with 3 nodes, each of which runs a mon, osd and mds. I'm using RBD on this cluster as storage for KVM; CephFS is unused at this time. While still on v0.61.7 Cuttlefish, I got 70-100+ MB/sec on simple linear writes to a file with `dd' inside a VM on this cluster under regular load, and the osds usually averaged 20-100% CPU-utilisation in `top'. After the upgrade to Dumpling, CPU-usage for the osds shot up to 100-400% in `top' (multi-core system) and the speed of my writes with `dd' inside a VM dropped to 20-40 MB/sec. Users complained that disk-access inside the VMs was significantly slower, and the backups of the RBD-store I was running also got behind quickly.
>> >> >>>>>
>> >> >>>>> After downgrading only the osds to v0.61.7 Cuttlefish and leaving the rest at 0.67 Dumpling, speed and load returned to normal. I have repeated this performance-hit upon upgrade on a similar test-cluster under no additional load at all. Although CPU-usage for the osds wasn't as dramatic during these tests, because there was no base-load from other VMs, I/O-performance dropped significantly after upgrading during these tests as well, and returned to normal after downgrading the osds.
>> >> >>>>>
>> >> >>>>> I'm not sure what to make of it. There are no visible errors in the logs and everything runs and reports good health; it's just a lot slower, with a lot more CPU-usage.
>> >> >>>>
>> >> >>>> Hi Oliver,
>> >> >>>>
>> >> >>>> If you have access to the perf command on this system, could you try running:
>> >> >>>>
>> >> >>>> "sudo perf top"
>> >> >>>>
>> >> >>>> And if that doesn't give you much,
>> >> >>>>
>> >> >>>> "sudo perf record -g"
>> >> >>>>
>> >> >>>> then:
>> >> >>>>
>> >> >>>> "sudo perf report | less"
>> >> >>>>
>> >> >>>> during the period of high CPU usage? This will give you a call graph. There may be symbols missing, but it might help track down what the OSDs are doing.
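As a side note, the same perf workflow can be narrowed to a single ceph-osd process, so the output isn't dominated by kvm and kernel idle time. A rough sketch; the PID selection and the 30-second recording window are purely illustrative:

    # live view, restricted to one ceph-osd process
    sudo perf top -p $(pidof ceph-osd | awk '{print $1}')

    # record a call graph for ~30 seconds, then browse it
    sudo perf record -g -p $(pidof ceph-osd | awk '{print $1}') -- sleep 30
    sudo perf report | less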
>> >> >>>
>> >> >>> Thanks for your help! I did a couple of runs on my test-cluster, loading it with writes from 3 VMs concurrently and measuring the results at the first node, both with all 0.67 Dumpling-components and with the osds replaced by 0.61.7 Cuttlefish. I let `perf top' run and settle for a while, then copied anything that showed in red and green into this post. Here are the results (sorry for the word-wraps):
>> >> >>>
>> >> >>> First, with 0.61.7 osds:
>> >> >>>
>> >> >>>  19.91%  [kernel]            [k] intel_idle
>> >> >>>  10.18%  [kernel]            [k] _raw_spin_lock_irqsave
>> >> >>>   6.79%  ceph-osd            [.] ceph_crc32c_le
>> >> >>>   4.93%  [kernel]            [k] default_send_IPI_mask_sequence_phys
>> >> >>>   2.71%  [kernel]            [k] copy_user_generic_string
>> >> >>>   1.42%  libc-2.11.3.so      [.] memcpy
>> >> >>>   1.23%  [kernel]            [k] find_busiest_group
>> >> >>>   1.13%  librados.so.2.0.0   [.] ceph_crc32c_le_intel
>> >> >>>   1.11%  [kernel]            [k] _raw_spin_lock
>> >> >>>   0.99%  kvm                 [.] 0x1931f8
>> >> >>>   0.92%  [igb]               [k] igb_poll
>> >> >>>   0.87%  [kernel]            [k] native_write_cr0
>> >> >>>   0.80%  [kernel]            [k] csum_partial
>> >> >>>   0.78%  [kernel]            [k] __do_softirq
>> >> >>>   0.63%  [kernel]            [k] hpet_legacy_next_event
>> >> >>>   0.53%  [ip_tables]         [k] ipt_do_table
>> >> >>>   0.50%  libc-2.11.3.so      [.] 0x74433
>> >> >>>
>> >> >>> Second test, with 0.67 osds:
>> >> >>>
>> >> >>>  18.32%  [kernel]            [k] intel_idle
>> >> >>>   7.58%  [kernel]            [k] _raw_spin_lock_irqsave
>> >> >>>   7.04%  ceph-osd            [.] PGLog::undirty()
>> >> >>>   4.39%  ceph-osd            [.] ceph_crc32c_le_intel
>> >> >>>   3.92%  [kernel]            [k] default_send_IPI_mask_sequence_phys
>> >> >>>   2.25%  [kernel]            [k] copy_user_generic_string
>> >> >>>   1.76%  libc-2.11.3.so      [.] memcpy
>> >> >>>   1.56%  librados.so.2.0.0   [.] ceph_crc32c_le_intel
>> >> >>>   1.40%  libc-2.11.3.so      [.] vfprintf
>> >> >>>   1.12%  libc-2.11.3.so      [.] 0x7217b
>> >> >>>   1.05%  [kernel]            [k] _raw_spin_lock
>> >> >>>   1.01%  [kernel]            [k] find_busiest_group
>> >> >>>   0.83%  kvm                 [.] 0x193ab8
>> >> >>>   0.80%  [kernel]            [k] native_write_cr0
>> >> >>>   0.76%  [kernel]            [k] __do_softirq
>> >> >>>   0.73%  libc-2.11.3.so      [.] _IO_default_xsputn
>> >> >>>   0.70%  [kernel]            [k] csum_partial
>> >> >>>   0.68%  [igb]               [k] igb_poll
>> >> >>>   0.58%  [kernel]            [k] hpet_legacy_next_event
>> >> >>>   0.54%  [kernel]            [k] __schedule
>> >> >>>
>> >> >>> What jumps right out is "PGLog::undirty()", which doesn't show up with 0.61.7 at all, but sits as an extra drag right near the top of the usage in 0.67. Note that I didn't manage to fully load the test-cluster CPU-wise because of network-constraints, and I don't want to take any extra risks by testing on the production-cluster, but it seems we have found a possible culprit.
>> >> >>>
>> >> >>> Any ideas? Thanks again!
>> >> >>>
>> >> >>>
>> >> >>> Regards,
>> >> >>>
>> >> >>>   Oliver
>> >> >>>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
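A closing footnote on Mark's remark about hardware CRC32 earlier in the thread: the ceph_crc32c_le_intel symbol in the profiles above is presumably the SSE4.2-accelerated CRC32C path, which can only be used when the CPU advertises that instruction set. A quick, purely illustrative check on an osd host:

    grep -m1 -o sse4_2 /proc/cpuinfo || echo 'no SSE4.2: CRC32C falls back to a software implementation'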