Whole system analysis with frame pointers

I was asked about the topic in the subject line, and I think it's not
very well known.  The news is that since Fedora 38, whole system
performance analysis is easy to do.  You can use it to identify hot
spots in a single application, or to see what the whole computer is
really doing during lengthy operations.
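
This works because Fedora 38 builds its packages with frame pointers
enabled.  If you want your own programs to appear with full stacks
too, the same GCC flags (as I understand the Fedora change;
-mno-omit-leaf-frame-pointer is architecture-specific, so treat this
as a sketch) would be:

  gcc -O2 -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer \
      -o prog prog.c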

You can visualise the results in various ways - my favourite is
Brendan Gregg's Flame Graph tools, but perf has many alternative ways
to capture and display the data:

  https://www.brendangregg.com/linuxperf.html
  https://www.brendangregg.com/flamegraphs.html
  https://perf.wiki.kernel.org/index.php/Tutorial
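
For reference, a minimal whole system capture looks something like
this (a sketch: it assumes the FlameGraph scripts from Brendan
Gregg's repository are in the current directory; adjust the frequency
and duration to taste):

  perf record -a -g -F 99 -- sleep 60
  perf script | ./stackcollapse-perf.pl > out.folded
  ./flamegraph.pl out.folded > out.svg

Here -a samples all CPUs, -g records call graphs (which is where
frame pointers come in), and -F 99 samples at 99 Hz to avoid sampling
in lockstep with the timer interrupt.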

I did a 15 minute talk on this topic, originally for an internal Red
Hat audience, but I guess it's fine to open it up to everyone:

  http://oirase.annexia.org/tmp/2023-03-08-flamegraphs.mp4 [57M, 15m41s]


To show the kind of thing which is possible, I captured three whole
system flame graphs.  The first comes from running "make -j32" in the
qemu build tree:

  http://oirase.annexia.org/tmp/2023-gcc-with-lto.svg

8% of the time is spent running the assembler.  I seem to recall that
Clang takes a different approach, integrating the assembler into the
compiler, which I guess avoids most of that overhead.
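
As an aside, you can get numbers like that 8% from the collapsed
stacks instead of eyeballing the SVG.  A rough sketch (assuming the
assembler shows up as the "as" process name at the start of each
folded line):

  perf script | ./stackcollapse-perf.pl > out.folded
  awk '{ total += $NF }
       $1 ~ /^as;/ || $1 == "as" { as += $NF }
       END { printf "as: %.1f%%\n", 100 * as / total }' out.folded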

The second is an rpmbuild of the Fedora Rawhide kernel package:

  http://oirase.annexia.org/tmp/2023-kernel-build.svg

I think it's interesting that 6% of the time is spent compressing the
RPMs, and another 6% running pahole (debuginfo generation?).  But the
most surprising thing is that 42% of the time appears to be spent
just parsing C code [if I'm reading that right; I actually can't
believe parsing takes so much time].  If true, there must be
opportunities to optimize things here.
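
One way to cross-check per-process figures like the 6% for
compression or pahole, without the SVG, is perf's summary view (a
sketch; perf report has many other sort keys for drilling further
down):

  perf report --sort comm --stdio | head -20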

Captures work across userspace and kernel code, as shown in the third
example, which is a KVM (i.e. hardware-assisted) virtual machine
doing some highly parallel work inside:

  http://oirase.annexia.org/tmp/2023-kvm-build.svg

You can clearly see the 8 virtual (guest) CPUs on the left side,
using KVM.  More interesting is that this guest uses a qcow2 file for
its disk, and there's a heck of an overhead writing to that file.
There's nothing to fix here -- qcow2 files shouldn't be used in this
situation; for best performance it would be better to back the guest
with a local block device.
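
For example (a sketch only -- the device path is made up, and the
exact options depend on your setup), a raw local block device on the
qemu command line would look something like:

  qemu-system-x86_64 -enable-kvm -m 8G -smp 8 \
      -drive file=/dev/vg0/guest,format=raw,if=virtio,cache=none,aio=native \
      ...

cache=none bypasses the host page cache, which is usually what you
want when the backing store is a dedicated block device.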


The overhead of frame pointers in my measurements is about 1%, so this
enhanced visibility into the system seems well worthwhile.  I use this
all the time.  This year I've used it to suggest optimizations in
qemu, nbdkit and the kernel.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top