Re: Significant slowdown of osds since v0.67 Dumpling

Hey Mark,

If I look at the "wip-dumpling-pglog-undirty"-version with regular top,
I see a slightly higher base-load on the osd than on the
Cuttlefish-osds, with significantly more and higher spikes.  Looking
with `perf top', "PGLog::undirty()" is still there, although pulling
significantly less CPU.  With the Cuttlefish-osds, I don't see it at
all, even under load.  That may account for the extra load I'm still
seeing, but I don't know what else is going on in it, or whether that,
too, can be safely disabled to save some more CPU.
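
For completeness, this is roughly how I'm sampling a single osd-process
(the pidof-invocation is from memory, so adjust as needed):

   sudo perf top -p $(pidof ceph-osd | awk '{print $1}')

which limits the sampling to one osd instead of the whole node.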

All in all, it's quite close and a bit difficult to measure.  I'd say
the CPU-usage with "wip-dumpling-pglog-undirty" is still a good 30%
higher than Cuttlefish on my production-cluster.  I have yet to upgrade
all osds and compare the performance of the cluster as a whole.  Is the
"wip-dumpling-pglog-undirty"-version considered safe enough to do so?
If you have any tips for other safe benchmarks, I'll try those as well.
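For instance, I could run a plain RADOS-bench against the rbd-pool from
one of the nodes, along these lines (syntax from memory, so adjust as
needed):

   rados -p rbd bench 60 write -t 16

and compare the numbers with all osds on Cuttlefish versus all of them
on "wip-dumpling-pglog-undirty".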
Thanks!


   Regards,

      Oliver

On Mon, 2013-08-19 at 15:21 -0500, Mark Nelson wrote:
> Hi Oliver,
> 
> Glad that helped!  How much more efficient do the cuttlefish OSDs seem 
> at this point (with wip-dumpling-pglog-undirty)?  On modern Intel 
> platforms we were actually hoping to see CPU usage go down in many cases 
> due to the use of hardware CRC32 instructions.
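> 
> The hardware CRC32 instruction is part of SSE 4.2, so a quick
> sanity-check for whether your CPUs expose it is something like:
> 
>    grep -c sse4_2 /proc/cpuinfo
> 
> which counts how many cores advertise the flag (0 means no hardware
> CRC32, so we fall back to the software implementation).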
> 
> Mark
> 
> On 08/19/2013 03:06 PM, Oliver Daudey wrote:
> > Hey Samuel,
> >
> > Thanks!  I installed your version, repeated the same tests on my
> > test-cluster and the extra CPU-loading seems to have disappeared.  Then
> > I replaced one osd of my production-cluster with your modified version
> > and its config-option, and it seems to be a lot less CPU-hungry now.
> > Although the Cuttlefish-osds still seem to be even more CPU-efficient,
> > your changes have definitely helped a lot.  We seem to be looking in the
> > right direction, at least for this part of the problem.
> >
> > BTW, I ran `perf top' on the production-node with your modified osd and
> > didn't see anything osd-related stand out on top.  "PGLog::undirty()"
> > was in there, but with much lower usage, right at the bottom of the
> > green part of the output.
> >
> > Many thanks for your help so far!
> >
> >
> >     Regards,
> >
> >       Oliver
> >
> > On Mon, 2013-08-19 at 00:29 -0700, Samuel Just wrote:
> >> You're right, PGLog::undirty() looks suspicious.  I just pushed a
> >> branch wip-dumpling-pglog-undirty with a new config
> >> (osd_debug_pg_log_writeout) which if set to false will disable some
> >> strictly debugging checks which occur in PGLog::undirty().  We haven't
> >> actually seen these checks causing excessive cpu usage, so this may be
> >> a red herring.
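> >> If you want to try it, something like this in ceph.conf on the node
> >> running the test osd should be enough (restart the osd afterwards):
> >>
> >>    [osd]
> >>        osd debug pg log writeout = false
> >>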
> >> -Sam
> >>
> >> On Sat, Aug 17, 2013 at 2:48 PM, Oliver Daudey <oliver@xxxxxxxxx> wrote:
> >>> Hey Mark,
> >>>
> >>> On Sat, 2013-08-17 at 08:16 -0500, Mark Nelson wrote:
> >>>> On 08/17/2013 06:13 AM, Oliver Daudey wrote:
> >>>>> Hey all,
> >>>>>
> >>>>> This is a copy of Bug #6040 (http://tracker.ceph.com/issues/6040) I
> >>>>> created in the tracker.  Thought I would pass it through the list as
> >>>>> well, to get an idea if anyone else is running into it.  It may only
> >>>>> show under higher loads.  More info about my setup is in the bug-report
> >>>>> above.  Here goes:
> >>>>>
> >>>>>
> >>>>> I'm running a Ceph-cluster with 3 nodes, each of which runs a mon, osd
> >>>>> and mds. I'm using RBD on this cluster as storage for KVM, CephFS is
> >>>>> unused at this time. While still on v0.61.7 Cuttlefish, I got 70-100+
> >>>>> MB/sec on simple linear writes to a file with `dd' inside a VM on this
> >>>>> cluster under regular load and the osds usually averaged 20-100%
> >>>>> CPU-utilisation in `top'. After the upgrade to Dumpling, CPU-usage for
> >>>>> the osds shot up to 100% to 400% in `top' (multi-core system) and the
> >>>>> speed for my writes with `dd' inside a VM dropped to 20-40MB/sec. Users
> >>>>> complained that disk-access inside the VMs was significantly slower and
> >>>>> the backups of the RBD-store I was running also fell behind quickly.
> >>>>>
> >>>>> After downgrading only the osds to v0.61.7 Cuttlefish and leaving the
> >>>>> rest at 0.67 Dumpling, speed and load returned to normal. I have
> >>>>> repeated this performance-hit upon upgrade on a similar test-cluster
> >>>>> under no additional load at all. Although CPU-usage for the osds wasn't
> >>>>> as dramatic during these tests because there was no base-load from other
> >>>>> VMs, I/O-performance dropped significantly after upgrading during these
> >>>>> tests as well, and returned to normal after downgrading the osds.
> >>>>>
> >>>>> I'm not sure what to make of it. There are no visible errors in the logs
> >>>>> and everything runs and reports good health, it's just a lot slower,
> >>>>> with a lot more CPU-usage.
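> >>>>>
> >>>>> For reference, the write-test inside the VMs is nothing more
> >>>>> sophisticated than something along the lines of:
> >>>>>
> >>>>>    dd if=/dev/zero of=/tmp/testfile bs=1M count=1024 conv=fdatasync
> >>>>>
> >>>>> so the numbers above are for plain sequential writes.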
> >>>>
> >>>> Hi Oliver,
> >>>>
> >>>> If you have access to the perf command on this system, could you try
> >>>> running:
> >>>>
> >>>> "sudo perf top"
> >>>>
> >>>> And if that doesn't give you much,
> >>>>
> >>>> "sudo perf record -g"
> >>>>
> >>>> then:
> >>>>
> >>>> "sudo perf report | less"
> >>>>
> >>>> during the period of high CPU usage?  This will give you a call graph.
> >>>> There may be symbols missing, but it might help track down what the OSDs
> >>>> are doing.
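> >>>>
> >>>> If the system-wide output is too noisy, attaching to a single OSD
> >>>> for a fixed interval may be easier to read; roughly (substitute the
> >>>> pid of one of your osds):
> >>>>
> >>>> "sudo perf record -g -p <osd-pid> -- sleep 30"
> >>>>
> >>>> followed by the same "sudo perf report | less".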
> >>>
> >>> Thanks for your help!  I did a couple of runs on my test-cluster,
> >>> loading it with writes from 3 VMs concurrently and measuring the results
> >>> at the first node with all 0.67 Dumpling-components and with the osds
> >>> replaced by 0.61.7 Cuttlefish.  I let `perf top' run and settle for a
> >>> while, then copied anything that showed in red and green into this
> >>> post.  Here are the results:
> >>>
> >>> First, with 0.61.7 osds:
> >>>
> >>>   19.91%  [kernel]                    [k] intel_idle
> >>>   10.18%  [kernel]                    [k] _raw_spin_lock_irqsave
> >>>    6.79%  ceph-osd                    [.] ceph_crc32c_le
> >>>    4.93%  [kernel]                    [k] default_send_IPI_mask_sequence_phys
> >>>    2.71%  [kernel]                    [k] copy_user_generic_string
> >>>    1.42%  libc-2.11.3.so              [.] memcpy
> >>>    1.23%  [kernel]                    [k] find_busiest_group
> >>>    1.13%  librados.so.2.0.0           [.] ceph_crc32c_le_intel
> >>>    1.11%  [kernel]                    [k] _raw_spin_lock
> >>>    0.99%  kvm                         [.] 0x1931f8
> >>>    0.92%  [igb]                       [k] igb_poll
> >>>    0.87%  [kernel]                    [k] native_write_cr0
> >>>    0.80%  [kernel]                    [k] csum_partial
> >>>    0.78%  [kernel]                    [k] __do_softirq
> >>>    0.63%  [kernel]                    [k] hpet_legacy_next_event
> >>>    0.53%  [ip_tables]                 [k] ipt_do_table
> >>>    0.50%  libc-2.11.3.so              [.] 0x74433
> >>>
> >>> Second test, with 0.67 osds:
> >>>
> >>>   18.32%  [kernel]                      [k] intel_idle
> >>>    7.58%  [kernel]                      [k] _raw_spin_lock_irqsave
> >>>    7.04%  ceph-osd                      [.] PGLog::undirty()
> >>>    4.39%  ceph-osd                      [.] ceph_crc32c_le_intel
> >>>    3.92%  [kernel]                      [k] default_send_IPI_mask_sequence_phys
> >>>    2.25%  [kernel]                      [k] copy_user_generic_string
> >>>    1.76%  libc-2.11.3.so                [.] memcpy
> >>>    1.56%  librados.so.2.0.0             [.] ceph_crc32c_le_intel
> >>>    1.40%  libc-2.11.3.so                [.] vfprintf
> >>>    1.12%  libc-2.11.3.so                [.] 0x7217b
> >>>    1.05%  [kernel]                      [k] _raw_spin_lock
> >>>    1.01%  [kernel]                      [k] find_busiest_group
> >>>    0.83%  kvm                           [.] 0x193ab8
> >>>    0.80%  [kernel]                      [k] native_write_cr0
> >>>    0.76%  [kernel]                      [k] __do_softirq
> >>>    0.73%  libc-2.11.3.so                [.] _IO_default_xsputn
> >>>    0.70%  [kernel]                      [k] csum_partial
> >>>    0.68%  [igb]                         [k] igb_poll
> >>>    0.58%  [kernel]                      [k] hpet_legacy_next_event
> >>>    0.54%  [kernel]                      [k] __schedule
> >>>
> >>>
> >>> What jumps right out is "PGLog::undirty()", which doesn't show up
> >>> under 0.61.7 at all, but sits right near the top of the CPU-usage
> >>> under 0.67.  Note that I didn't manage to fully load the
> >>> test-cluster CPU-wise because of network-constraints, and I don't
> >>> want to take any extra risks by testing this on the
> >>> production-cluster, but it seems we've found a possible culprit.
> >>>
> >>> Any ideas?  Thanks again!
> >>>
> >>>
> >>>
> >>>     Regards,
> >>>
> >>>        Oliver
> >>>
> >>
> >
> >
> >
> 
> 


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



