Re: Significant slowdown of osds since v0.67 Dumpling

You'd need to unmount the fs to actually clear the cache.  Did you see
a significant difference in load between the runs?  To confirm, the
rbd client is on dumpling the entire time?
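
A rough sketch of what fully clearing the client-side cache between runs
might look like, assuming the image is mapped and mounted as a filesystem
on the client (device and mount point are only illustrative):

# sync                                # flush dirty pages first
# umount /mnt/rbd-test                # drop the fs cache along with the mount
# echo 3 > /proc/sys/vm/drop_caches   # also drop page cache, dentries, inodes
# mount /dev/rbd0 /mnt/rbd-test       # remount for the next run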
-Sam

On Wed, Aug 21, 2013 at 2:28 PM, Oliver Daudey <oliver@xxxxxxxxx> wrote:
> Hey Samuel,
>
> I repeated the same test several times before my post and just now 2
> more times.  It holds up and is also repeatable in reverse order, with
> the same results.  Remember, I restart all OSDs between tests, so any
> caches should get destroyed and besides, I'm writing.  That shouldn't
> involve very large-scale caching, right?  I also waited for the cluster
> to come to a healthy state after restarting the OSDs, so it's not
> related to rebalancing or peering-activity, either.
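>
> For reference, the restarts between tests are just the stock init-script,
> roughly along these lines (sysvinit-style, so treat as a sketch):
>
> # service ceph -a restart osd    # restart the osds on all nodes
> # ceph -w                        # watch until the cluster reports HEALTH_OK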
>
>
>    Regards,
>
>       Oliver
>
> On wo, 2013-08-21 at 14:07 -0700, Samuel Just wrote:
>> Try it again in the reverse order, I strongly suspect caching effects.
>> -Sam
>>
>> On Wed, Aug 21, 2013 at 1:34 PM, Oliver Daudey <oliver@xxxxxxxxx> wrote:
>> > Hey Samuel,
>> >
>> > Finally got it reproduced on my test-cluster, which was otherwise
>> > unloaded at the time.  First, with Dumpling:
>> >
>> > # rbd create --size 102400 test
>> > # ceph-osd --version
>> > ceph version 0.67.1-6-g0c4f2f3
>> > (0c4f2f34b78b634efe7f4d56694e2edeeda5a130)
>> > # rbd bench-write test
>> > bench-write  io_size 4096 io_threads 16 bytes 1073741824 pattern seq
>> >   SEC       OPS   OPS/SEC   BYTES/SEC
>> >     1       204    200.67  592272.58
>> >     2       401    200.24  607462.05
>> >     3       625    204.62  639655.53
>> >     4       824    205.20  636488.76
>> >     5      1026    204.08  630587.77
>> >     6      1217    202.69  628965.39
>> >     7      1429    203.19  633671.84
>> >     8      1626    202.93  635419.90
>> >     9      1837    203.11  630421.40
>> >    10      2045    204.16  631364.11
>> >    11      2247    204.12  625833.91
>> >    12      2456    204.29  629952.57
>> >    13      2655    204.15  634639.37
>> >    14      2888    206.15  637373.53
>> >    15      3115    207.12  640826.18
>> >    16      3330    207.75  640120.38
>> >    17      3529    207.57  636277.18
>> >    18      3750    208.24  635036.47
>> >    19      3965    208.65  631118.13
>> >    20      4216    210.75  638197.93
>> >    21      4534    215.83  656890.46
>> >    22      4809    218.50  666612.75
>> >    23      5076    220.63  678678.36
>> >    24      5320    221.39  686760.90
>> >    25      5651    226.02  701345.26
>> >    26      5967    229.44  706839.16
>> >    27      6271    232.05  707958.62
>> >    28      6522    232.82  709890.37
>> >    29      6773    233.37  711718.21
>> >    30      7151    238.23  722924.18
>> >
>> > Then I replaced the OSD with Cuttlefish on all nodes and restarted it
>> > everywhere.
>> >
>> > # ceph-osd --version
>> > ceph version 0.61.7 (8f010aff684e820ecc837c25ac77c7a05d7191ff)
>> > # rbd bench-write test
>> > bench-write  io_size 4096 io_threads 16 bytes 1073741824 pattern seq
>> >   SEC       OPS   OPS/SEC   BYTES/SEC
>> >     1       556    555.63  1915665.58
>> >     2      1095    545.08  1839139.42
>> >     3      1696    564.45  1840316.22
>> >     4      2312    577.25  1832641.60
>> >     5      2870    573.46  1800558.64
>> >     6      3461    576.81  1780993.33
>> >     7      4062    579.26  1821246.36
>> >     8      4678    581.51  1862003.10
>> >     9      5269    585.12  1902219.78
>> >    10      5883    587.11  1914700.43
>> >    11      6471    587.59  1913208.96
>> >    12      7022    585.12  1914375.79
>> >    13      7568    580.15  1899650.34
>> >    14      8122    579.87  1864572.53
>> >    15      8659    576.18  1826927.83
>> >    16      9259    576.81  1796653.40
>> >    17      9826    577.60  1785328.57
>> >    18     10401    576.30  1777479.51
>> >    19     10962    576.94  1777431.02
>> >    20     11399    569.84  1756633.76
>> >    21     11937    567.45  1740147.43
>> >    22     12491    567.47  1726288.46
>> >    23     13109    569.92  1727347.20
>> >    24     13629    567.64  1721996.79
>> >    25     14153    566.11  1720782.06
>> >    26     14669    563.23  1716275.34
>> >    27     15191    562.61  1717677.74
>> >    28     15768    562.49  1713059.35
>> >    29     16281    561.40  1707289.23
>> >    31     16719    536.96  1628452.43
>> > ^C
>> >
>> > As you can see, the difference is quite dramatic and this confirms it's
>> > probably something RBD-related.  The test-cluster is a standard 3-node
>> > cluster with 3xMON, 3xMDS, 3xOSD, tied together with 2x1Gbit/s ethernet.
>> > I replaced only the OSD-binary for these tests.  The rest of the
>> > Ceph-installation is at Dumpling.
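>> >
>> > For completeness, the bench-write parameters shown in the header above can
>> > also be passed explicitly; from memory the option names look roughly like
>> > this, so treat it as a sketch:
>> >
>> > # rbd bench-write test --io-size 4096 --io-threads 16 \
>> >     --io-total 1073741824 --io-pattern seq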
>> >
>> >
>> >    Regards,
>> >
>> >      Oliver
>> >
>> > On wo, 2013-08-21 at 12:05 -0700, Samuel Just wrote:
>> >> There haven't been any significant osd side changes that I can think
>> >> of.  Is cpu usage still high?  If so, can you post the profiler
>> >> results again?
>> >> -Sam
>> >>
>> >> On Wed, Aug 21, 2013 at 12:02 PM, Oliver Daudey <oliver@xxxxxxxxx> wrote:
>> >> > Hey Samuel,
>> >> >
>> >> > I had a good run on the production-cluster with it and unfortunately, it
>> >> > still doesn't seem to have solved the problem.  It seemed OK for a while
>> >> > and individual OSD CPU-usage seemed quite low, but as the cluster's load
>> >> > increased during the day, things got slower again.  Write-performance
>> >> > within a VM crawled to 30MB/sec and at some point, I got only 10MB/sec
>> >> > on reads in that same VM.  I also did RADOS bench-tests with `rados
>> >> > --pool rbd bench 120 write' and those got several hundred MB/sec
>> >> > on the same cluster at the same time of day, so maybe the problem is
>> >> > RBD-related.  Is there any code in the OSD that could influence
>> >> > RBD-performance alone?  Do you know of any other significant changes to
>> >> > the OSD between Cuttlefish and Dumpling that could result in this?
>> >> >
>> >> > PS: I also did the same RADOS bench-tests on my test-cluster, both with
>> >> > Cuttlefish and Dumpling without your fix and got almost identical
>> >> > results.  This supports the idea that the problem is in RBD, as Mark suggested.
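>> >> >
>> >> > For reference, the write test was simply `rados --pool rbd bench 120
>> >> > write'; a read pass could be chained after it along these lines (the
>> >> > --no-cleanup flag keeps the benchmark objects around for the read pass --
>> >> > sketch only):
>> >> >
>> >> > # rados --pool rbd bench 120 write --no-cleanup
>> >> > # rados --pool rbd bench 120 seq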
>> >> >
>> >> >
>> >> >     Regards,
>> >> >
>> >> >        Oliver
>> >> >
>> >> > On 20-08-13 19:40, Samuel Just wrote:
>> >> >> Can you try dumpling head without the option?
>> >> >> -Sam
>> >> >>
>> >> >> On Tue, Aug 20, 2013 at 1:44 AM, Oliver Daudey <oliver@xxxxxxxxx> wrote:
>> >> >>> Hey Mark,
>> >> >>>
>> >> >>> Sorry, but after some more tests I have to report that it only worked
>> >> >>> partly.  The load seems lower with "wip-dumpling-pglog-undirty" in
>> >> >>> place, but the Cuttlefish-osd still seems significantly faster, and even
>> >> >>> with that branch things still slow down far too much under load.
>> >> >>> Unfortunately, only my production-cluster seems busy
>> >> >>> enough to actually show the problem clearly by slowing down.  Below is
>> >> >>> `perf top'-output, fresh from my production-cluster under its regular
>> >> >>> load:
>> >> >>>
>> >> >>> First, the 0.67.1-6-g0c4f2f3 osd with "osd debug pg log writeout =
>> >> >>> false":
>> >> >>>  16.53%  [kernel]                     [k] intel_idle
>> >> >>>   6.47%  libleveldb.so.1.9            [.] 0x380a1
>> >> >>>   5.76%  [kernel]                     [k] find_busiest_group
>> >> >>>   4.11%  libc-2.11.3.so               [.] memcmp
>> >> >>>   3.95%  kvm                          [.] 0x1f6f31
>> >> >>>   2.05%  [kernel]                     [k] default_send_IPI_mask_sequence_phys
>> >> >>>   2.03%  [kernel]                     [k] _raw_spin_lock
>> >> >>>   1.87%  libleveldb.so.1.9            [.] leveldb::InternalKeyComparator::Compar
>> >> >>>   1.57%  libc-2.11.3.so               [.] memcpy
>> >> >>>   1.37%  libleveldb.so.1.9            [.] leveldb::Block::Iter::Next()
>> >> >>>   1.26%  [kernel]                     [k] hrtimer_interrupt
>> >> >>>   1.12%  [kernel]                     [k] __hrtimer_start_range_ns
>> >> >>>   1.09%  [kernel]                     [k] native_write_cr0
>> >> >>>   1.05%  libstdc++.so.6.0.13          [.] std::string::_M_mutate(unsigned long,
>> >> >>>   1.00%  [kernel]                     [k] native_write_msr_safe
>> >> >>>   0.99%  [kernel]                     [k] apic_timer_interrupt
>> >> >>>   0.98%  [kernel]                     [k] clockevents_program_event
>> >> >>>   0.96%  [kernel]                     [k] _raw_spin_unlock_irqrestore
>> >> >>>   0.95%  ceph-osd                     [.] PGLog::undirty()
>> >> >>>   0.92%  [kernel]                     [k] find_next_bit
>> >> >>>   0.91%  libsnappy.so.1.1.2           [.] snappy::RawUncompress(snappy::Source*,
>> >> >>>   0.88%  [kernel]                     [k] __schedule
>> >> >>>   0.87%  [kernel]                     [k] cpumask_next_and
>> >> >>>   0.84%  [kernel]                     [k] do_select
>> >> >>>   0.80%  [kernel]                     [k] fget_light
>> >> >>>   0.77%  [kernel]                     [k] reschedule_interrupt
>> >> >>>   0.75%  [kernel]                     [k] _raw_spin_lock_irqsave
>> >> >>>   0.62%  libstdc++.so.6.0.13          [.] std::string::append(char const*, unsig
>> >> >>>   0.59%  [kvm_intel]                  [k] vmx_vcpu_run
>> >> >>>   0.58%  [kernel]                     [k] copy_user_generic_string
>> >> >>>   0.56%  [kernel]                     [k] load_balance
>> >> >>>   0.54%  [kernel]                     [k] tg_load_down
>> >> >>>   0.53%  libpthread-2.11.3.so         [.] pthread_mutex_lock
>> >> >>>   0.52%  [kernel]                     [k] sync_inodes_sb
>> >> >>>
>> >> >>> Second, the 0.61.8 osd, under identical load:
>> >> >>>  21.51%  [kernel]                     [k] intel_idle
>> >> >>>   6.66%  [kernel]                     [k] find_busiest_group
>> >> >>>   6.25%  kvm                          [.] 0x2d214b
>> >> >>>   1.97%  [kernel]                     [k] _raw_spin_lock
>> >> >>>   1.47%  [kernel]                     [k] native_write_msr_safe
>> >> >>>   1.44%  [kernel]                     [k] hrtimer_interrupt
>> >> >>>   1.37%  [kernel]                     [k] __hrtimer_start_range_ns
>> >> >>>   1.34%  [kernel]                     [k] do_select
>> >> >>>   1.29%  [kernel]                     [k] fget_light
>> >> >>>   1.24%  [kernel]                     [k] clockevents_program_event
>> >> >>>   1.21%  [kernel]                     [k] default_send_IPI_mask_sequence_phys
>> >> >>>   1.18%  [kernel]                     [k] cpumask_next_and
>> >> >>>   1.18%  [kernel]                     [k] _raw_spin_unlock_irqrestore
>> >> >>>   1.15%  [kernel]                     [k] find_next_bit
>> >> >>>   1.14%  [kernel]                     [k] __schedule
>> >> >>>   1.11%  [kernel]                     [k] _raw_spin_lock_irqsave
>> >> >>>   0.98%  [kernel]                     [k] apic_timer_interrupt
>> >> >>>   0.86%  [kernel]                     [k] copy_user_generic_string
>> >> >>>   0.83%  [kernel]                     [k] native_write_cr0
>> >> >>>   0.76%  [kernel]                     [k] sync_inodes_sb
>> >> >>>   0.71%  [kernel]                     [k] rcu_needs_cpu
>> >> >>>   0.69%  libpthread-2.11.3.so         [.] pthread_mutex_lock
>> >> >>>   0.66%  [kernel]                     [k] fput
>> >> >>>   0.62%  [kernel]                     [k] load_balance
>> >> >>>   0.57%  [vdso]                       [.] 0x7fff3a976700
>> >> >>>   0.56%  libc-2.11.3.so               [.] memcpy
>> >> >>>   0.56%  [kernel]                     [k] reschedule_interrupt
>> >> >>>   0.56%  [kernel]                     [k] tg_load_down
>> >> >>>   0.50%  [kernel]                     [k] iput
>> >> >>>
>> >> >>> I see lots of new differences, but again don't know what to make of it
>> >> >>> and what might be related or significant.  LevelDB seems to jump out
>> >> >>> this time, amongst others.  Let me know if you need more info.
>> >> >>>
>> >> >>>
>> >> >>>    Regards,
>> >> >>>
>> >> >>>      Oliver
>> >> >>>
>> >> >>> On ma, 2013-08-19 at 15:21 -0500, Mark Nelson wrote:
>> >> >>>> Hi Oliver,
>> >> >>>>
>> >> >>>> Glad that helped!  How much more efficient do the cuttlefish OSDs seem
>> >> >>>> at this point (with wip-dumpling-pglog-undirty)?  On modern Intel
>> >> >>>> platforms we were actually hoping to see CPU usage go down in many cases
>> >> >>>> due to the use of hardware CRC32 instructions.
>> >> >>>>
>> >> >>>> Mark
>> >> >>>>
>> >> >>>> On 08/19/2013 03:06 PM, Oliver Daudey wrote:
>> >> >>>>> Hey Samuel,
>> >> >>>>>
>> >> >>>>> Thanks!  I installed your version, repeated the same tests on my
>> >> >>>>> test-cluster and the extra CPU-loading seems to have disappeared.  Then
>> >> >>>>> I replaced one osd of my production-cluster with your modified version
>> >> >>>>> and it's config-option and it seems to be a lot less CPU-hungry now.
>> >> >>>>> Although the Cuttlefish-osds still seem to be even more CPU-efficient,
>> >> >>>>> your changes have definitely helped a lot.  We seem to be looking in the
>> >> >>>>> right direction, at least for this part of the problem.
>> >> >>>>>
>> >> >>>>> BTW, I ran `perf top' on the production-node with your modified osd and
>> >> >>>>> didn't see anything osd-related stand out on top.  "PGLog::undirty()"
>> >> >>>>> was in there, but with much lower usage, right at the bottom of the
>> >> >>>>> green part of the output.
>> >> >>>>>
>> >> >>>>> Many thanks for your help so far!
>> >> >>>>>
>> >> >>>>>
>> >> >>>>>     Regards,
>> >> >>>>>
>> >> >>>>>       Oliver
>> >> >>>>>
>> >> >>>>> On ma, 2013-08-19 at 00:29 -0700, Samuel Just wrote:
>> >> >>>>>> You're right, PGLog::undirty() looks suspicious.  I just pushed a
>> >> >>>>>> branch wip-dumpling-pglog-undirty with a new config
>> >> >>>>>> (osd_debug_pg_log_writeout) which if set to false will disable some
>> >> >>>>>> strictly debugging checks which occur in PGLog::undirty().  We haven't
>> >> >>>>>> actually seen these checks causing excessive cpu usage, so this may be
>> >> >>>>>> a red herring.
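>> >> >>>>>>
>> >> >>>>>> To give it a try, something along these lines in ceph.conf on the osd
>> >> >>>>>> nodes (followed by an osd restart) should be enough -- a sketch:
>> >> >>>>>>
>> >> >>>>>> [osd]
>> >> >>>>>>     osd debug pg log writeout = false
>> >> >>>>>>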
>> >> >>>>>> -Sam
>> >> >>>>>>
>> >> >>>>>> On Sat, Aug 17, 2013 at 2:48 PM, Oliver Daudey <oliver@xxxxxxxxx> wrote:
>> >> >>>>>>> Hey Mark,
>> >> >>>>>>>
>> >> >>>>>>> On za, 2013-08-17 at 08:16 -0500, Mark Nelson wrote:
>> >> >>>>>>>> On 08/17/2013 06:13 AM, Oliver Daudey wrote:
>> >> >>>>>>>>> Hey all,
>> >> >>>>>>>>>
>> >> >>>>>>>>> This is a copy of Bug #6040 (http://tracker.ceph.com/issues/6040) I
>> >> >>>>>>>>> created in the tracker.  Thought I would pass it through the list as
>> >> >>>>>>>>> well, to get an idea if anyone else is running into it.  It may only
>> >> >>>>>>>>> show under higher loads.  More info about my setup is in the bug-report
>> >> >>>>>>>>> above.  Here goes:
>> >> >>>>>>>>>
>> >> >>>>>>>>>
>> >> >>>>>>>>> I'm running a Ceph-cluster with 3 nodes, each of which runs a mon, osd
>> >> >>>>>>>>> and mds. I'm using RBD on this cluster as storage for KVM, CephFS is
>> >> >>>>>>>>> unused at this time. While still on v0.61.7 Cuttlefish, I got 70-100
>> >> >>>>>>>>> +MB/sec on simple linear writes to a file with `dd' inside a VM on this
>> >> >>>>>>>>> cluster under regular load and the osds usually averaged 20-100%
>> >> >>>>>>>>> CPU-utilisation in `top'. After the upgrade to Dumpling, CPU-usage for
>> >> >>>>>>>>> the osds shot up to 100% to 400% in `top' (multi-core system) and the
>> >> >>>>>>>>> speed for my writes with `dd' inside a VM dropped to 20-40MB/sec. Users
>> >> >>>>>>>>> complained that disk-access inside the VMs was significantly slower and
>> >> >>>>>>>>> the backups of the RBD-store I was running, also got behind quickly.
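>> >> >>>>>>>>>
>> >> >>>>>>>>> The dd test was a plain linear write to a file inside the VM, roughly
>> >> >>>>>>>>> along these lines (size and flags here are only illustrative):
>> >> >>>>>>>>>
>> >> >>>>>>>>> # dd if=/dev/zero of=/root/ddtest bs=1M count=4096 oflag=direct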
>> >> >>>>>>>>>
>> >> >>>>>>>>> After downgrading only the osds to v0.61.7 Cuttlefish and leaving the
>> >> >>>>>>>>> rest at 0.67 Dumpling, speed and load returned to normal. I have
>> >> >>>>>>>>> repeated this performance-hit upon upgrade on a similar test-cluster
>> >> >>>>>>>>> under no additional load at all. Although CPU-usage for the osds wasn't
>> >> >>>>>>>>> as dramatic during these tests because there was no base-load from other
>> >> >>>>>>>>> VMs, I/O-performance dropped significantly after upgrading during these
>> >> >>>>>>>>> tests as well, and returned to normal after downgrading the osds.
>> >> >>>>>>>>>
>> >> >>>>>>>>> I'm not sure what to make of it. There are no visible errors in the logs
>> >> >>>>>>>>> and everything runs and reports good health, it's just a lot slower,
>> >> >>>>>>>>> with a lot more CPU-usage.
>> >> >>>>>>>>
>> >> >>>>>>>> Hi Oliver,
>> >> >>>>>>>>
>> >> >>>>>>>> If you have access to the perf command on this system, could you try
>> >> >>>>>>>> running:
>> >> >>>>>>>>
>> >> >>>>>>>> "sudo perf top"
>> >> >>>>>>>>
>> >> >>>>>>>> And if that doesn't give you much,
>> >> >>>>>>>>
>> >> >>>>>>>> "sudo perf record -g"
>> >> >>>>>>>>
>> >> >>>>>>>> then:
>> >> >>>>>>>>
>> >> >>>>>>>> "sudo perf report | less"
>> >> >>>>>>>>
>> >> >>>>>>>> during the period of high CPU usage?  This will give you a call graph.
>> >> >>>>>>>> There may be symbols missing, but it might help track down what the OSDs
>> >> >>>>>>>> are doing.
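>> >> >>>>>>>>
>> >> >>>>>>>> If the node is also busy running kvm and other daemons, it may help to
>> >> >>>>>>>> limit perf to the OSD processes, roughly like this (pids are examples):
>> >> >>>>>>>>
>> >> >>>>>>>> "sudo perf top -p $(pidof ceph-osd | tr ' ' ',')"
>> >> >>>>>>>>
>> >> >>>>>>>> "sudo perf record -g -p <osd pid> -- sleep 30"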
>> >> >>>>>>>
>> >> >>>>>>> Thanks for your help!  I did a couple of runs on my test-cluster,
>> >> >>>>>>> loading it with writes from 3 VMs concurrently and measuring the results
>> >> >>>>>>> at the first node with all 0.67 Dumpling-components and with the osds
>> >> >>>>>>> replaced by 0.61.7 Cuttlefish.  I let `perf top' run and settle for a
>> >> >>>>>>> while, then copied anything that showed in red and green into this post.
>> >> >>>>>>> Here are the results (sorry for the word-wraps):
>> >> >>>>>>>
>> >> >>>>>>> First, with 0.61.7 osds:
>> >> >>>>>>>
>> >> >>>>>>>   19.91%  [kernel]                    [k] intel_idle
>> >> >>>>>>>   10.18%  [kernel]                    [k] _raw_spin_lock_irqsave
>> >> >>>>>>>    6.79%  ceph-osd                    [.] ceph_crc32c_le
>> >> >>>>>>>    4.93%  [kernel]                    [k] default_send_IPI_mask_sequence_phys
>> >> >>>>>>>    2.71%  [kernel]                    [k] copy_user_generic_string
>> >> >>>>>>>    1.42%  libc-2.11.3.so              [.] memcpy
>> >> >>>>>>>    1.23%  [kernel]                    [k] find_busiest_group
>> >> >>>>>>>    1.13%  librados.so.2.0.0           [.] ceph_crc32c_le_intel
>> >> >>>>>>>    1.11%  [kernel]                    [k] _raw_spin_lock
>> >> >>>>>>>    0.99%  kvm                         [.] 0x1931f8
>> >> >>>>>>>    0.92%  [igb]                       [k] igb_poll
>> >> >>>>>>>    0.87%  [kernel]                    [k] native_write_cr0
>> >> >>>>>>>    0.80%  [kernel]                    [k] csum_partial
>> >> >>>>>>>    0.78%  [kernel]                    [k] __do_softirq
>> >> >>>>>>>    0.63%  [kernel]                    [k] hpet_legacy_next_event
>> >> >>>>>>>    0.53%  [ip_tables]                 [k] ipt_do_table
>> >> >>>>>>>    0.50%  libc-2.11.3.so              [.] 0x74433
>> >> >>>>>>>
>> >> >>>>>>> Second test, with 0.67 osds:
>> >> >>>>>>>
>> >> >>>>>>>   18.32%  [kernel]                      [k] intel_idle
>> >> >>>>>>>    7.58%  [kernel]                      [k] _raw_spin_lock_irqsave
>> >> >>>>>>>    7.04%  ceph-osd                      [.] PGLog::undirty()
>> >> >>>>>>>    4.39%  ceph-osd                      [.] ceph_crc32c_le_intel
>> >> >>>>>>>    3.92%  [kernel]                      [k] default_send_IPI_mask_sequence_phys
>> >> >>>>>>>    2.25%  [kernel]                      [k] copy_user_generic_string
>> >> >>>>>>>    1.76%  libc-2.11.3.so                [.] memcpy
>> >> >>>>>>>    1.56%  librados.so.2.0.0             [.] ceph_crc32c_le_intel
>> >> >>>>>>>    1.40%  libc-2.11.3.so                [.] vfprintf
>> >> >>>>>>>    1.12%  libc-2.11.3.so                [.] 0x7217b
>> >> >>>>>>>    1.05%  [kernel]                      [k] _raw_spin_lock
>> >> >>>>>>>    1.01%  [kernel]                      [k] find_busiest_group
>> >> >>>>>>>    0.83%  kvm                           [.] 0x193ab8
>> >> >>>>>>>    0.80%  [kernel]                      [k] native_write_cr0
>> >> >>>>>>>    0.76%  [kernel]                      [k] __do_softirq
>> >> >>>>>>>    0.73%  libc-2.11.3.so                [.] _IO_default_xsputn
>> >> >>>>>>>    0.70%  [kernel]                      [k] csum_partial
>> >> >>>>>>>    0.68%  [igb]                         [k] igb_poll
>> >> >>>>>>>    0.58%  [kernel]                      [k] hpet_legacy_next_event
>> >> >>>>>>>    0.54%  [kernel]                      [k] __schedule
>> >> >>>>>>>
>> >> >>>>>>>
>> >> >>>>>>> What jumps right out is "PGLog::undirty()", which doesn't show up with
>> >> >>>>>>> 0.61.7 at all, but sits as an extra drag near the top of the profile in 0.67.
>> >> >>>>>>> Note that I didn't manage to fully load the test-cluster CPU-wise,
>> >> >>>>>>> because of network-constraints and I don't want to take any extra risks
>> >> >>>>>>> on the production-cluster and test it there, but it seems we found a
>> >> >>>>>>> possible culprit.
>> >> >>>>>>>
>> >> >>>>>>> Any ideas?  Thanks again!
>> >> >>>>>>>
>> >> >>>>>>>
>> >> >>>>>>>
>> >> >>>>>>>     Regards,
>> >> >>>>>>>
>> >> >>>>>>>        Oliver
>> >> >>>>>>>
>> >> >>>>>>
>> >> >>>>>
>> >> >>>>>
>> >> >>>>>
>> >> >>>>
>> >> >>>>
>> >> >>>
>> >> >>>
>> >> >>
>> >> >
>> >>
>> >
>> >
>>
>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



