Re: Significant slowdown of osds since v0.67 Dumpling

Samuel Just <sam.just@xxxxxxxxxxx> · Wed, 21 Aug 2013 18:33:42 -0700

I am dumb.  There *has* been a change in the osd which can account for
this: the wbthrottle limits.  We added some logic to force the kernel
to start flushing writes out earlier, normally a good thing.  In this
case, it's probably doing an fsync every 500 writes.

Can you run 3 tests?
1) rerun with --rbd-cache (rbd bench-write test --rbd-cache)
2) rerun without rbd-cache, but add the following configs to the osds:
filestore_wbthrottle_xfs_ios_start_flusher = 10000
filestore_wbthrottle_xfs_ios_hard_limit = 10000
filestore_wbthrottle_btrfs_ios_start_flusher = 10000
filestore_wbthrottle_btrfs_ios_hard_limit = 10000
3) rerun with the random pattern (rbd bench-write test --io-pattern=rand)

I expect 1) to largely eliminate the difference (these are sequential
4k ios and should be coalesced into much bigger writes).  2) should
basically restore the same filestore behavior as in cuttlefish.  3)
should also show little difference since the io will be random.

In normal use, your vms should probably be using rbd caching anyway
(in addition to the write coalescing already occurring in the guest
os), so this may not be what you are seeing on your cluster.
-Sam

On Wed, Aug 21, 2013 at 6:00 PM, Samuel Just <sam.just@xxxxxxxxxxx> wrote:
> You'd need to unmount the fs to actually clear the cache.  Did you see
> a significant difference in load between the runs?  To confirm, the
> rbd client is dumpling the entire time?
> -Sam
>
> On Wed, Aug 21, 2013 at 2:28 PM, Oliver Daudey <oliver@xxxxxxxxx> wrote:
>> Hey Samuel,
>>
>> I repeated the same test several times before my post and just now 2
>> more times.  It holds up and is also repeatable in reverse order, with
>> the same results.  Remember, I restart all OSDs between tests, so any
>> caches should get destroyed and besides, I'm writing.  That shouldn't
>> involve very large-scale caching, right?  I also waited for the cluster
>> to come to a healthy state after restarting the OSDs, so it's not
>> related to rebalancing or peering-activity, either.
>>
>>
>>    Regards,
>>
>>       Oliver
>>
>> On wo, 2013-08-21 at 14:07 -0700, Samuel Just wrote:
>>> Try it again in the reverse order, I strongly suspect caching effects.
>>> -Sam
>>>
>>> On Wed, Aug 21, 2013 at 1:34 PM, Oliver Daudey <oliver@xxxxxxxxx> wrote:
>>> > Hey Samuel,
>>> >
>>> > Finally got it reproduced on my test-cluster, which was otherwise
>>> > unloaded at the time.  First, with Dumpling:
>>> >
>>> > # rbd create --size 102400 test
>>> > # ceph-osd --version
>>> > ceph version 0.67.1-6-g0c4f2f3
>>> > (0c4f2f34b78b634efe7f4d56694e2edeeda5a130)
>>> > # rbd bench-write test
>>> > bench-write  io_size 4096 io_threads 16 bytes 1073741824 pattern seq
>>> >   SEC       OPS   OPS/SEC   BYTES/SEC
>>> >     1       204    200.67  592272.58
>>> >     2       401    200.24  607462.05
>>> >     3       625    204.62  639655.53
>>> >     4       824    205.20  636488.76
>>> >     5      1026    204.08  630587.77
>>> >     6      1217    202.69  628965.39
>>> >     7      1429    203.19  633671.84
>>> >     8      1626    202.93  635419.90
>>> >     9      1837    203.11  630421.40
>>> >    10      2045    204.16  631364.11
>>> >    11      2247    204.12  625833.91
>>> >    12      2456    204.29  629952.57
>>> >    13      2655    204.15  634639.37
>>> >    14      2888    206.15  637373.53
>>> >    15      3115    207.12  640826.18
>>> >    16      3330    207.75  640120.38
>>> >    17      3529    207.57  636277.18
>>> >    18      3750    208.24  635036.47
>>> >    19      3965    208.65  631118.13
>>> >    20      4216    210.75  638197.93
>>> >    21      4534    215.83  656890.46
>>> >    22      4809    218.50  666612.75
>>> >    23      5076    220.63  678678.36
>>> >    24      5320    221.39  686760.90
>>> >    25      5651    226.02  701345.26
>>> >    26      5967    229.44  706839.16
>>> >    27      6271    232.05  707958.62
>>> >    28      6522    232.82  709890.37
>>> >    29      6773    233.37  711718.21
>>> >    30      7151    238.23  722924.18
>>> >
>>> > Then I replaced the OSD with Cuttlefish on all nodes and restarted it
>>> > everywhere.
>>> >
>>> > # ceph-osd --version
>>> > ceph version 0.61.7 (8f010aff684e820ecc837c25ac77c7a05d7191ff)
>>> > # rbd bench-write test
>>> > bench-write  io_size 4096 io_threads 16 bytes 1073741824 pattern seq
>>> >   SEC       OPS   OPS/SEC   BYTES/SEC
>>> >     1       556    555.63  1915665.58
>>> >     2      1095    545.08  1839139.42
>>> >     3      1696    564.45  1840316.22
>>> >     4      2312    577.25  1832641.60
>>> >     5      2870    573.46  1800558.64
>>> >     6      3461    576.81  1780993.33
>>> >     7      4062    579.26  1821246.36
>>> >     8      4678    581.51  1862003.10
>>> >     9      5269    585.12  1902219.78
>>> >    10      5883    587.11  1914700.43
>>> >    11      6471    587.59  1913208.96
>>> >    12      7022    585.12  1914375.79
>>> >    13      7568    580.15  1899650.34
>>> >    14      8122    579.87  1864572.53
>>> >    15      8659    576.18  1826927.83
>>> >    16      9259    576.81  1796653.40
>>> >    17      9826    577.60  1785328.57
>>> >    18     10401    576.30  1777479.51
>>> >    19     10962    576.94  1777431.02
>>> >    20     11399    569.84  1756633.76
>>> >    21     11937    567.45  1740147.43
>>> >    22     12491    567.47  1726288.46
>>> >    23     13109    569.92  1727347.20
>>> >    24     13629    567.64  1721996.79
>>> >    25     14153    566.11  1720782.06
>>> >    26     14669    563.23  1716275.34
>>> >    27     15191    562.61  1717677.74
>>> >    28     15768    562.49  1713059.35
>>> >    29     16281    561.40  1707289.23
>>> >    31     16719    536.96  1628452.43
>>> > ^C
>>> >
>>> > As you can see, the difference is quite dramatic and this confirms it's
>>> > probably something RBD-related.  The test-cluster is a standard 3-node
>>> > cluster with 3xMON, 3xMDS, 3xOSD, tied together with 2x1Gbit/s ethernet.
>>> > I replaced only the OSD-binary for these tests.  The rest of the
>>> > Ceph-installation is at Dumpling.
>>> >
>>> >
>>> >    Regards,
>>> >
>>> >      Oliver
>>> >
>>> > On wo, 2013-08-21 at 12:05 -0700, Samuel Just wrote:
>>> >> There haven't been any significant osd side changes that I can think
>>> >> of.  Is cpu usage still high?  If so, can you post the profiler
>>> >> results again?
>>> >> -Sam
>>> >>
>>> >> On Wed, Aug 21, 2013 at 12:02 PM, Oliver Daudey <oliver@xxxxxxxxx> wrote:
>>> >> > Hey Samuel,
>>> >> >
>>> >> > I had a good run on the production-cluster with it and unfortunately, it
>>> >> > still doesn't seem to have solved the problem.  It seemed OK for a while
>>> >> > and individual OSD CPU-usage seemed quite low, but as the cluster's load
>>> >> > increased during the day, things got slower again.  Write-performance
>>> >> > within a VM crawled to 30MB/sec and at some point, I got only 10MB/sec
>>> >> > on reads in that same VM.  I also did RADOS bench-tests with `rados
>>> >> > --pool rbd bench 120 write' and those got several hundreds of MB's/sec
>>> >> > on the same cluster at the same time of day, so maybe the problem is
>>> >> > RBD-related.  Is there any code in the OSD that could influence
>>> >> > RBD-performance alone?  Do you know of any other significant changes to
>>> >> > the OSD between Cuttlefish and Dumpling that could result in this?
>>> >> >
>>> >> > PS: I also did the same RADOS bench-tests on my test-cluster, both with
>>> >> > Cuttlefish and Dumpling without your fix and got almost identical
>>> >> > results.  This confirms that the problem might be in RBD, as Mark suggested.
>>> >> >
>>> >> >
>>> >> >     Regards,
>>> >> >
>>> >> >        Oliver
>>> >> >
>>> >> > On 20-08-13 19:40, Samuel Just wrote:
>>> >> >> Can you try dumpling head without the option?
>>> >> >> -Sam
>>> >> >>
>>> >> >> On Tue, Aug 20, 2013 at 1:44 AM, Oliver Daudey <oliver@xxxxxxxxx> wrote:
>>> >> >>> Hey Mark,
>>> >> >>>
>>> >> >>> Sorry, but after some more tests I have to report that it only worked
>>> >> >>> partly.  The load seems lower with "wip-dumpling-pglog-undirty" in
>>> >> >>> place, but the Cuttlefish-osd still seems significantly faster and even
>>> >> >>> with "wip-dumpling-pglog-undirty" in place, things slow down way too
>>> >> >>> much under load.  Unfortunately, only my production-cluster seems busy
>>> >> >>> enough to actually show the problem clearly by slowing down.  Below is
>>> >> >>> `perf top'-output, fresh from my production-cluster under it's regular
>>> >> >>> load:
>>> >> >>>
>>> >> >>> First, the 0.67.1-6-g0c4f2f3 osd with "osd debug pg log writeout =
>>> >> >>> false":
>>> >> >>>  16.53%  [kernel]                     [k]
>>> >> >>> intel_idle
>>> >> >>>   6.47%  libleveldb.so.1.9            [.]
>>> >> >>> 0x380a1
>>> >> >>>   5.76%  [kernel]                     [k]
>>> >> >>> find_busiest_group
>>> >> >>>   4.11%  libc-2.11.3.so               [.]
>>> >> >>> memcmp
>>> >> >>>   3.95%  kvm                          [.]
>>> >> >>> 0x1f6f31
>>> >> >>>   2.05%  [kernel]                     [k]
>>> >> >>> default_send_IPI_mask_sequence_phys
>>> >> >>>   2.03%  [kernel]                     [k]
>>> >> >>> _raw_spin_lock
>>> >> >>>   1.87%  libleveldb.so.1.9            [.]
>>> >> >>> leveldb::InternalKeyComparator::Compar
>>> >> >>>   1.57%  libc-2.11.3.so               [.]
>>> >> >>> memcpy
>>> >> >>>   1.37%  libleveldb.so.1.9            [.]
>>> >> >>> leveldb::Block::Iter::Next()
>>> >> >>>   1.26%  [kernel]                     [k]
>>> >> >>> hrtimer_interrupt
>>> >> >>>   1.12%  [kernel]                     [k]
>>> >> >>> __hrtimer_start_range_ns
>>> >> >>>   1.09%  [kernel]                     [k]
>>> >> >>> native_write_cr0
>>> >> >>>   1.05%  libstdc++.so.6.0.13          [.]
>>> >> >>> std::string::_M_mutate(unsigned long,
>>> >> >>>   1.00%  [kernel]                     [k]
>>> >> >>> native_write_msr_safe
>>> >> >>>   0.99%  [kernel]                     [k]
>>> >> >>> apic_timer_interrupt
>>> >> >>>   0.98%  [kernel]                     [k]
>>> >> >>> clockevents_program_event
>>> >> >>>   0.96%  [kernel]                     [k]
>>> >> >>> _raw_spin_unlock_irqrestore
>>> >> >>>   0.95%  ceph-osd                     [.]
>>> >> >>> PGLog::undirty()
>>> >> >>>   0.92%  [kernel]                     [k]
>>> >> >>> find_next_bit
>>> >> >>>   0.91%  libsnappy.so.1.1.2           [.]
>>> >> >>> snappy::RawUncompress(snappy::Source*,
>>> >> >>>   0.88%  [kernel]                     [k]
>>> >> >>> __schedule
>>> >> >>>   0.87%  [kernel]                     [k]
>>> >> >>> cpumask_next_and
>>> >> >>>   0.84%  [kernel]                     [k]
>>> >> >>> do_select
>>> >> >>>   0.80%  [kernel]                     [k]
>>> >> >>> fget_light
>>> >> >>>   0.77%  [kernel]                     [k]
>>> >> >>> reschedule_interrupt
>>> >> >>>   0.75%  [kernel]                     [k]
>>> >> >>> _raw_spin_lock_irqsave
>>> >> >>>   0.62%  libstdc++.so.6.0.13          [.] std::string::append(char
>>> >> >>> const*, unsig
>>> >> >>>   0.59%  [kvm_intel]                  [k]
>>> >> >>> vmx_vcpu_run
>>> >> >>>   0.58%  [kernel]                     [k]
>>> >> >>> copy_user_generic_string
>>> >> >>>   0.56%  [kernel]                     [k]
>>> >> >>> load_balance
>>> >> >>>   0.54%  [kernel]                     [k]
>>> >> >>> tg_load_down
>>> >> >>>   0.53%  libpthread-2.11.3.so         [.]
>>> >> >>> pthread_mutex_lock
>>> >> >>>   0.52%  [kernel]                     [k] sync_inodes_sb
>>> >> >>>
>>> >> >>> Second, the 0.61.8 osd, under identical load:
>>> >> >>>  21.51%  [kernel]                     [k]
>>> >> >>> intel_idle
>>> >> >>>   6.66%  [kernel]                     [k]
>>> >> >>> find_busiest_group
>>> >> >>>   6.25%  kvm                          [.]
>>> >> >>> 0x2d214b
>>> >> >>>   1.97%  [kernel]                     [k]
>>> >> >>> _raw_spin_lock
>>> >> >>>   1.47%  [kernel]                     [k]
>>> >> >>> native_write_msr_safe
>>> >> >>>   1.44%  [kernel]                     [k]
>>> >> >>> hrtimer_interrupt
>>> >> >>>   1.37%  [kernel]                     [k]
>>> >> >>> __hrtimer_start_range_ns
>>> >> >>>   1.34%  [kernel]                     [k]
>>> >> >>> do_select
>>> >> >>>   1.29%  [kernel]                     [k]
>>> >> >>> fget_light
>>> >> >>>   1.24%  [kernel]                     [k]
>>> >> >>> clockevents_program_event
>>> >> >>>   1.21%  [kernel]                     [k]
>>> >> >>> default_send_IPI_mask_sequence_phys
>>> >> >>>   1.18%  [kernel]                     [k]
>>> >> >>> cpumask_next_and
>>> >> >>>   1.18%  [kernel]                     [k]
>>> >> >>> _raw_spin_unlock_irqrestore
>>> >> >>>   1.15%  [kernel]                     [k]
>>> >> >>> find_next_bit
>>> >> >>>   1.14%  [kernel]                     [k]
>>> >> >>> __schedule
>>> >> >>>   1.11%  [kernel]                     [k]
>>> >> >>> _raw_spin_lock_irqsave
>>> >> >>>   0.98%  [kernel]                     [k]
>>> >> >>> apic_timer_interrupt
>>> >> >>>   0.86%  [kernel]                     [k]
>>> >> >>> copy_user_generic_string
>>> >> >>>   0.83%  [kernel]                     [k]
>>> >> >>> native_write_cr0
>>> >> >>>   0.76%  [kernel]                     [k]
>>> >> >>> sync_inodes_sb
>>> >> >>>   0.71%  [kernel]                     [k]
>>> >> >>> rcu_needs_cpu
>>> >> >>>   0.69%  libpthread-2.11.3.so         [.]
>>> >> >>> pthread_mutex_lock
>>> >> >>>   0.66%  [kernel]                     [k]
>>> >> >>> fput
>>> >> >>>   0.62%  [kernel]                     [k]
>>> >> >>> load_balance
>>> >> >>>   0.57%  [vdso]                       [.]
>>> >> >>> 0x7fff3a976700
>>> >> >>>   0.56%  libc-2.11.3.so               [.]
>>> >> >>> memcpy
>>> >> >>>   0.56%  [kernel]                     [k]
>>> >> >>> reschedule_interrupt
>>> >> >>>   0.56%  [kernel]                     [k]
>>> >> >>> tg_load_down
>>> >> >>>   0.50%  [kernel]                     [k] iput
>>> >> >>>
>>> >> >>> I see lots of new differences, but again don't know what to make of it
>>> >> >>> and what might be related or significant.  LevelDB seems to jump out
>>> >> >>> this time, amongst others.  Let me know if you need more info.
>>> >> >>>
>>> >> >>>
>>> >> >>>    Regards,
>>> >> >>>
>>> >> >>>      Oliver
>>> >> >>>
>>> >> >>> On ma, 2013-08-19 at 15:21 -0500, Mark Nelson wrote:
>>> >> >>>> Hi Oliver,
>>> >> >>>>
>>> >> >>>> Glad that helped!  How much more efficient do the cuttlefish OSDs seem
>>> >> >>>> at this point (with wip-dumpling-pglog-undirty)?  On modern Intel
>>> >> >>>> platforms we were actually hoping to see CPU usage go down in many cases
>>> >> >>>> due to the use of hardware CRC32 instructions.
>>> >> >>>>
>>> >> >>>> Mark
>>> >> >>>>
>>> >> >>>> On 08/19/2013 03:06 PM, Oliver Daudey wrote:
>>> >> >>>>> Hey Samuel,
>>> >> >>>>>
>>> >> >>>>> Thanks!  I installed your version, repeated the same tests on my
>>> >> >>>>> test-cluster and the extra CPU-loading seems to have disappeared.  Then
>>> >> >>>>> I replaced one osd of my production-cluster with your modified version
>>> >> >>>>> and it's config-option and it seems to be a lot less CPU-hungry now.
>>> >> >>>>> Although the Cuttlefish-osds still seem to be even more CPU-efficient,
>>> >> >>>>> your changes have definitely helped a lot.  We seem to be looking in the
>>> >> >>>>> right direction, at least for this part of the problem.
>>> >> >>>>>
>>> >> >>>>> BTW, I ran `perf top' on the production-node with your modified osd and
>>> >> >>>>> didn't see anything osd-related stand out on top.  "PGLog::undirty()"
>>> >> >>>>> was in there, but with much lower usage, right at the bottom of the
>>> >> >>>>> green part of the output.
>>> >> >>>>>
>>> >> >>>>> Many thanks for your help so far!
>>> >> >>>>>
>>> >> >>>>>
>>> >> >>>>>     Regards,
>>> >> >>>>>
>>> >> >>>>>       Oliver
>>> >> >>>>>
>>> >> >>>>> On ma, 2013-08-19 at 00:29 -0700, Samuel Just wrote:
>>> >> >>>>>> You're right, PGLog::undirty() looks suspicious.  I just pushed a
>>> >> >>>>>> branch wip-dumpling-pglog-undirty with a new config
>>> >> >>>>>> (osd_debug_pg_log_writeout) which if set to false will disable some
>>> >> >>>>>> strictly debugging checks which occur in PGLog::undirty().  We haven't
>>> >> >>>>>> actually seen these checks causing excessive cpu usage, so this may be
>>> >> >>>>>> a red herring.
>>> >> >>>>>> -Sam
>>> >> >>>>>>
>>> >> >>>>>> On Sat, Aug 17, 2013 at 2:48 PM, Oliver Daudey <oliver@xxxxxxxxx> wrote:
>>> >> >>>>>>> Hey Mark,
>>> >> >>>>>>>
>>> >> >>>>>>> On za, 2013-08-17 at 08:16 -0500, Mark Nelson wrote:
>>> >> >>>>>>>> On 08/17/2013 06:13 AM, Oliver Daudey wrote:
>>> >> >>>>>>>>> Hey all,
>>> >> >>>>>>>>>
>>> >> >>>>>>>>> This is a copy of Bug #6040 (http://tracker.ceph.com/issues/6040) I
>>> >> >>>>>>>>> created in the tracker.  Thought I would pass it through the list as
>>> >> >>>>>>>>> well, to get an idea if anyone else is running into it.  It may only
>>> >> >>>>>>>>> show under higher loads.  More info about my setup is in the bug-report
>>> >> >>>>>>>>> above.  Here goes:
>>> >> >>>>>>>>>
>>> >> >>>>>>>>>
>>> >> >>>>>>>>> I'm running a Ceph-cluster with 3 nodes, each of which runs a mon, osd
>>> >> >>>>>>>>> and mds. I'm using RBD on this cluster as storage for KVM, CephFS is
>>> >> >>>>>>>>> unused at this time. While still on v0.61.7 Cuttlefish, I got 70-100
>>> >> >>>>>>>>> +MB/sec on simple linear writes to a file with `dd' inside a VM on this
>>> >> >>>>>>>>> cluster under regular load and the osds usually averaged 20-100%
>>> >> >>>>>>>>> CPU-utilisation in `top'. After the upgrade to Dumpling, CPU-usage for
>>> >> >>>>>>>>> the osds shot up to 100% to 400% in `top' (multi-core system) and the
>>> >> >>>>>>>>> speed for my writes with `dd' inside a VM dropped to 20-40MB/sec. Users
>>> >> >>>>>>>>> complained that disk-access inside the VMs was significantly slower and
>>> >> >>>>>>>>> the backups of the RBD-store I was running, also got behind quickly.
>>> >> >>>>>>>>>
>>> >> >>>>>>>>> After downgrading only the osds to v0.61.7 Cuttlefish and leaving the
>>> >> >>>>>>>>> rest at 0.67 Dumpling, speed and load returned to normal. I have
>>> >> >>>>>>>>> repeated this performance-hit upon upgrade on a similar test-cluster
>>> >> >>>>>>>>> under no additional load at all. Although CPU-usage for the osds wasn't
>>> >> >>>>>>>>> as dramatic during these tests because there was no base-load from other
>>> >> >>>>>>>>> VMs, I/O-performance dropped significantly after upgrading during these
>>> >> >>>>>>>>> tests as well, and returned to normal after downgrading the osds.
>>> >> >>>>>>>>>
>>> >> >>>>>>>>> I'm not sure what to make of it. There are no visible errors in the logs
>>> >> >>>>>>>>> and everything runs and reports good health, it's just a lot slower,
>>> >> >>>>>>>>> with a lot more CPU-usage.
>>> >> >>>>>>>>
>>> >> >>>>>>>> Hi Oliver,
>>> >> >>>>>>>>
>>> >> >>>>>>>> If you have access to the perf command on this system, could you try
>>> >> >>>>>>>> running:
>>> >> >>>>>>>>
>>> >> >>>>>>>> "sudo perf top"
>>> >> >>>>>>>>
>>> >> >>>>>>>> And if that doesn't give you much,
>>> >> >>>>>>>>
>>> >> >>>>>>>> "sudo perf record -g"
>>> >> >>>>>>>>
>>> >> >>>>>>>> then:
>>> >> >>>>>>>>
>>> >> >>>>>>>> "sudo perf report | less"
>>> >> >>>>>>>>
>>> >> >>>>>>>> during the period of high CPU usage?  This will give you a call graph.
>>> >> >>>>>>>> There may be symbols missing, but it might help track down what the OSDs
>>> >> >>>>>>>> are doing.
>>> >> >>>>>>>
>>> >> >>>>>>> Thanks for your help!  I did a couple of runs on my test-cluster,
>>> >> >>>>>>> loading it with writes from 3 VMs concurrently and measuring the results
>>> >> >>>>>>> at the first node with all 0.67 Dumpling-components and with the osds
>>> >> >>>>>>> replaced by 0.61.7 Cuttlefish.  I let `perf top' run and settle for a
>>> >> >>>>>>> while, then copied anything that showed in red and green into this post.
>>> >> >>>>>>> Here are the results (sorry for the word-wraps):
>>> >> >>>>>>>
>>> >> >>>>>>> First, with 0.61.7 osds:
>>> >> >>>>>>>
>>> >> >>>>>>>   19.91%  [kernel]                    [k] intel_idle
>>> >> >>>>>>>   10.18%  [kernel]                    [k] _raw_spin_lock_irqsave
>>> >> >>>>>>>    6.79%  ceph-osd                    [.] ceph_crc32c_le
>>> >> >>>>>>>    4.93%  [kernel]                    [k]
>>> >> >>>>>>> default_send_IPI_mask_sequence_phys
>>> >> >>>>>>>    2.71%  [kernel]                    [k] copy_user_generic_string
>>> >> >>>>>>>    1.42%  libc-2.11.3.so              [.] memcpy
>>> >> >>>>>>>    1.23%  [kernel]                    [k] find_busiest_group
>>> >> >>>>>>>    1.13%  librados.so.2.0.0           [.] ceph_crc32c_le_intel
>>> >> >>>>>>>    1.11%  [kernel]                    [k] _raw_spin_lock
>>> >> >>>>>>>    0.99%  kvm                         [.] 0x1931f8
>>> >> >>>>>>>    0.92%  [igb]                       [k] igb_poll
>>> >> >>>>>>>    0.87%  [kernel]                    [k] native_write_cr0
>>> >> >>>>>>>    0.80%  [kernel]                    [k] csum_partial
>>> >> >>>>>>>    0.78%  [kernel]                    [k] __do_softirq
>>> >> >>>>>>>    0.63%  [kernel]                    [k] hpet_legacy_next_event
>>> >> >>>>>>>    0.53%  [ip_tables]                 [k] ipt_do_table
>>> >> >>>>>>>    0.50%  libc-2.11.3.so              [.] 0x74433
>>> >> >>>>>>>
>>> >> >>>>>>> Second test, with 0.67 osds:
>>> >> >>>>>>>
>>> >> >>>>>>>   18.32%  [kernel]                      [k] intel_idle
>>> >> >>>>>>>    7.58%  [kernel]                      [k] _raw_spin_lock_irqsave
>>> >> >>>>>>>    7.04%  ceph-osd                      [.] PGLog::undirty()
>>> >> >>>>>>>    4.39%  ceph-osd                      [.] ceph_crc32c_le_intel
>>> >> >>>>>>>    3.92%  [kernel]                      [k]
>>> >> >>>>>>> default_send_IPI_mask_sequence_phys
>>> >> >>>>>>>    2.25%  [kernel]                      [k] copy_user_generic_string
>>> >> >>>>>>>    1.76%  libc-2.11.3.so                [.] memcpy
>>> >> >>>>>>>    1.56%  librados.so.2.0.0             [.] ceph_crc32c_le_intel
>>> >> >>>>>>>    1.40%  libc-2.11.3.so                [.] vfprintf
>>> >> >>>>>>>    1.12%  libc-2.11.3.so                [.] 0x7217b
>>> >> >>>>>>>    1.05%  [kernel]                      [k] _raw_spin_lock
>>> >> >>>>>>>    1.01%  [kernel]                      [k] find_busiest_group
>>> >> >>>>>>>    0.83%  kvm                           [.] 0x193ab8
>>> >> >>>>>>>    0.80%  [kernel]                      [k] native_write_cr0
>>> >> >>>>>>>    0.76%  [kernel]                      [k] __do_softirq
>>> >> >>>>>>>    0.73%  libc-2.11.3.so                [.] _IO_default_xsputn
>>> >> >>>>>>>    0.70%  [kernel]                      [k] csum_partial
>>> >> >>>>>>>    0.68%  [igb]                         [k] igb_poll
>>> >> >>>>>>>    0.58%  [kernel]                      [k] hpet_legacy_next_event
>>> >> >>>>>>>    0.54%  [kernel]                      [k] __schedule
>>> >> >>>>>>>
>>> >> >>>>>>>
>>> >> >>>>>>> What jumps right out, is the "PGLog::undirty()", which doesn't show up
>>> >> >>>>>>> with 0.61.7 at all, but is an extra drag right at top-usage in 0.67.
>>> >> >>>>>>> Note that I didn't manage to fully load the test-cluster CPU-wise,
>>> >> >>>>>>> because of network-constraints and I don't want to take any extra risks
>>> >> >>>>>>> on the production-cluster and test it there, but it seems we found a
>>> >> >>>>>>> possible culprit.
>>> >> >>>>>>>
>>> >> >>>>>>> Any ideas?  Thanks again!
>>> >> >>>>>>>
>>> >> >>>>>>>
>>> >> >>>>>>>
>>> >> >>>>>>>     Regards,
>>> >> >>>>>>>
>>> >> >>>>>>>        Oliver
>>> >> >>>>>>>
>>> >> >>>>>>> _______________________________________________
>>> >> >>>>>>> ceph-users mailing list
>>> >> >>>>>>> ceph-users@xxxxxxxxxxxxxx
>>> >> >>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> >> >>>>>>
>>> >> >>>>>
>>> >> >>>>>
>>> >> >>>>> _______________________________________________
>>> >> >>>>> ceph-users mailing list
>>> >> >>>>> ceph-users@xxxxxxxxxxxxxx
>>> >> >>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> >> >>>>>
>>> >> >>>>
>>> >> >>>> _______________________________________________
>>> >> >>>> ceph-users mailing list
>>> >> >>>> ceph-users@xxxxxxxxxxxxxx
>>> >> >>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> >> >>>>
>>> >> >>>
>>> >> >>>
>>> >> >>> _______________________________________________
>>> >> >>> ceph-users mailing list
>>> >> >>> ceph-users@xxxxxxxxxxxxxx
>>> >> >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> >> >>
>>> >> >
>>> >>
>>> >
>>> >
>>>
>>
>>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com