[Bug 2246670] Review Request: rocgdb - ROCm source-level debugger


 



https://bugzilla.redhat.com/show_bug.cgi?id=2246670



--- Comment #3 from Ben Woodard <woodard@xxxxxxxxxx> ---
(In reply to Tom Rix from comment #2)
> I am not in favor of a gdb fork

I work closely with the AMD people working on rocGDB. They are committed to
upstreaming their work but the practical necessity of having a debugger for
their programming environment requires a temporary fork. There are two broad
areas that need to be coordinated with upstream before this fork can be healed.

DWARF - debug information
GPUs require debug information that cannot currently be expressed in the
current DWARF5 standard. AMD has engineered an almost completely
backward-compatible set of vendor extensions which allow them to express the
constructs used by GPUs:
https://llvm.org/docs/AMDGPUDwarfExtensionsForHeterogeneousDebugging.html
AMD has been working with other GPU vendors and other tool developers to get
equivalent extensions accepted into the upcoming upstream DWARF standard,
DWARF6. This has been a very long and complicated process because many of the
ultimate decision makers within the DWARF standards body did not have
experience with GPU architecture. AMD basically conducted a master class in
GPU architecture, exposing the weaknesses in current DWARF5, for DWARF
committee members, other GPU developers (most notably Intel), and Red Hat.
Several of the items are already on the docket for consideration by the DWARF
committee for DWARF6, and many others are currently in editorial review to be
put in the queue for DWARF6: https://github.com/ccoutant/dwarf-locations
It should be noted that Cary Coutant is the chair of the DWARF Standards
committee.

One of the things that came up in this discussion was that the intent of DWARF
Vendor extensions was that they were to be used as a private agreement between
a producer and a consumer. AMD implemented their extensions in line with this
intention of the DWARF standards body. So their producer, in this case an LLVM
port for their GPUs, produces DWARF that other tools cannot interpret. However,
because users expect to be able to use a wide variety of tools when developing
software and have compatibility between arbitrary producers and consumers,
vendor extensions have come to be considered not a private agreement between a
particular producer and a particular consumer but rather a universal registry
of extensions that all tools need to understand. Consequently, the upstream
developer communities of tools like GDB are reluctant to embrace such a large
set of vendor extensions until the extensions are at least agreed to by the
DWARF standards body. This approach to DWARF vendor extensions is not in
agreement with the original intentions of the DWARF standards body, but they
now understand the community's reason for such a position. For DWARF6, I have
been tasked with writing a proposal to change the description of DWARF vendor
extensions, as currently written in the DWARF5 standard, from a private
agreement to something focusing on universal compatibility between tools: a
registry, if you will.
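
As a rough illustration of the private-agreement problem described above, here
is a minimal sketch of a consumer walking the opcodes of a location
expression. The range 0xe0 to 0xff is the DW_OP_lo_user..DW_OP_hi_user range
that DWARF5 reserves for vendor operations. Everything else is an assumption
made for illustration: the function names, the vendor table, and the opcode
name in it are hypothetical and are not taken from AMD's extensions, and the
sketch ignores operands entirely.

```python
# Range reserved by DWARF5 for vendor-defined location operations.
DW_OP_LO_USER = 0xE0
DW_OP_HI_USER = 0xFF


class UnknownVendorOp(Exception):
    """Raised when a consumer meets a vendor opcode it has no agreement for."""


def is_vendor_op(opcode):
    """Return True if the opcode falls in the vendor-extension range."""
    return DW_OP_LO_USER <= opcode <= DW_OP_HI_USER


def describe_ops(opcodes, vendor_table):
    """Name each opcode, failing on vendor opcodes missing from vendor_table.

    Simplification: a real expression interleaves opcodes with operands;
    here the expression is treated as a bare list of opcodes.
    """
    names = []
    for op in opcodes:
        if not is_vendor_op(op):
            names.append(f"standard op 0x{op:02x}")
        elif op in vendor_table:
            names.append(vendor_table[op])
        else:
            raise UnknownVendorOp(f"vendor op 0x{op:02x} not understood")
    return names


# Hypothetical vendor table: the name and value are invented for this sketch.
EXAMPLE_VENDOR_OPS = {0xE9: "DW_OP_example_vendor_op"}
```

A consumer that shares the producer's table can name every opcode, while a
consumer with an empty table raises UnknownVendorOp on the same input. A
registry of extensions, as opposed to a private agreement, amounts to every
consumer being able to find the same table.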

GPU device communication 
The kernel currently provides a set of tool interfaces to communicate with
tasks (processes and threads) and memory that it manages. This includes
interfaces such as ptrace(2), /proc/<n>/mem as well as other low level
interfaces used by programming tools. These are specifically not standardized
by POSIX, so there are no standards that apply to them. However, tools like
debuggers need the same capabilities for GPUs. Right now, AMD provides a
library called rocdbgapi, which interfaces with tools and the GPU driver to
provide the needed tooling APIs. The problem with this approach is that it
requires GDB to link with this library to gain access to the needed APIs.
Upstream GDB has yet to be fully convinced of this approach and would like a
solution which could be applied to all accelerators, in other words a new
standard. Linking with librocdbgapi the way that rocGDB does also creates
challenges for packaging, because the header files for the APIs provided by
librocdbgapi and by every other accelerator's library would need to be
included in GDB to enable that functionality. Until there is a standardized
tool interface that vendors of accelerators can implement, upstream GDB is not
willing to incorporate vendor-specific tooling interfaces.
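
To make the CPU-side interfaces mentioned above concrete, here is a minimal
sketch of reading a task's memory through /proc/<n>/mem on Linux. It says
nothing about how rocdbgapi works. The helper names are hypothetical, and the
demo reads the calling process's own memory via /proc/self/mem so that no
ptrace attach is needed.

```python
import ctypes


def read_process_memory(pid, address, length):
    """Read `length` bytes at virtual address `address` via /proc/<pid>/mem."""
    # Unbuffered binary access so the seek offset maps directly to an address.
    with open(f"/proc/{pid}/mem", "rb", buffering=0) as mem:
        mem.seek(address)
        return mem.read(length)


def demo():
    """Read back a buffer that lives in this process's own address space."""
    payload = b"visible to tools"
    buf = ctypes.create_string_buffer(payload)
    return read_process_memory("self", ctypes.addressof(buf), len(payload))
```

Reading another process this way generally requires ptrace-level permission
on that process. The kernel offers this for CPU tasks; for GPU state a tool
currently has to go through the vendor's library instead.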

Early discussions about standardizing tooling interfaces for accelerators have
begun, but most of the time and attention has been directed toward solving the
DWARF problem rather than the tooling interface problem. The ultimate goal is
to standardize these interfaces, but until there is an upstream standard, the
practical expedient is for rocGDB to link to the particular accelerator's
tooling library, in this case librocdbgapi.

So while Fedora in general rightly takes a position against forks of particular
packages, in a case such as this where the upstream standards are still being
developed, and where the developers of the fork are keeping it up to date with
upstream gdb and participating in the standardization efforts, I think a
forked package should be allowed.

It also should be noted that this does have precedent: Fedora has for a very
long time included the "crash" package, which is a fork of gdb with specialized
extensions needed for kernel debugging.

Name        : crash
License     : GPLv3
URL         : https://crash-utility.github.io
Bug URL     : https://bugz.fedoraproject.org/crash
Summary     : Kernel analysis utility for live systems, netdump, diskdump,
kdump, LKCD or mcore dumpfiles
Description :
The core analysis suite is a self-contained tool that can be used to
investigate either live systems, kernel core dumps created from the
netdump, diskdump and kdump packages from Red Hat Linux, the mcore kernel patch
offered by Mission Critical Linux, or the LKCD kernel patch.





