Original Cover Letter:
This patch set implements a Heterogeneous System Architecture (HSA) driver
for radeon-family GPUs.

HSA allows different processor types (CPUs, DSPs, GPUs, etc.) to share
system resources more effectively via HW features including shared pageable
memory, userspace-accessible work queues, and platform-level atomics. In
addition to the memory protection mechanisms in GPUVM and IOMMUv2, the Sea
Islands family of GPUs also performs HW-level validation of commands passed
in through the queues (aka rings).

The code in this patch set is intended to serve both as a sample driver for
other HSA-compatible hardware devices and as a production driver for
radeon-family processors. The code is architected to support multiple CPUs,
each with connected GPUs, although the current implementation focuses on a
single Kaveri/Berlin APU and works alongside the existing radeon kernel
graphics driver (kgd).

AMD GPUs designed for use with HSA (Sea Islands and up) share some hardware
functionality between HSA compute and regular gfx/compute (memory,
interrupts, registers), while other functionality has been added
specifically for HSA compute (hw scheduler for virtualized compute rings).
All shared hardware is owned by the radeon graphics driver, and an interface
between kfd and kgd allows the kfd to make use of those shared resources,
while HSA-specific functionality is managed directly by kfd by submitting
packets into an HSA-specific command queue (the "HIQ").

During kfd module initialization a char device node (/dev/kfd) is created
(surviving until module exit), with ioctls for queue creation & management,
and data structures are initialized for managing HSA device topology.
The rest of the initialization is driven by calls from the radeon kgd at
the following points:
- radeon_init (kfd_init)
- radeon_exit (kfd_fini)
- radeon_driver_load_kms (kfd_device_probe, kfd_device_init)
- radeon_driver_unload_kms (kfd_device_fini)
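
To make the ordering concrete, here is a minimal userspace C sketch of those
call points. The kfd_* and radeon_* functions below are hypothetical,
argument-free stand-ins reduced to just the kfd calls; they only record the
order in which the hooks run, whereas the real functions take device and
driver arguments and do far more work.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Records the order in which the hooks fire (sketch only). */
#define MAX_EVENTS 8
static const char *event_log[MAX_EVENTS];
static int event_count;

static void log_event(const char *name)
{
	assert(event_count < MAX_EVENTS);
	event_log[event_count++] = name;
}

/* Hypothetical, argument-free stand-ins for the kfd entry points. */
static int  kfd_init(void)         { log_event("kfd_init"); return 0; }
static void kfd_fini(void)         { log_event("kfd_fini"); }
static bool kfd_device_probe(void) { log_event("kfd_device_probe"); return true; }
static bool kfd_device_init(void)  { log_event("kfd_device_init"); return true; }
static void kfd_device_fini(void)  { log_event("kfd_device_fini"); }

/* radeon (kgd) side: each hook forwards to kfd at the points listed above. */
static int radeon_init(void)  { return kfd_init(); }
static void radeon_exit(void) { kfd_fini(); }

static int radeon_driver_load_kms(void)
{
	/* Only bring up kfd on this device if kfd recognizes it. */
	if (kfd_device_probe() && !kfd_device_init())
		return -1;
	return 0;
}

static void radeon_driver_unload_kms(void) { kfd_device_fini(); }
```

A full load/unload cycle therefore runs kfd_init, kfd_device_probe,
kfd_device_init, kfd_device_fini and kfd_fini, in that order.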

During the probe and init processing, per-device data structures are
established which connect to the associated graphics kernel driver. This
information is exposed to userspace via sysfs, along with a version number
allowing userspace to determine if a topology change has occurred while it
was reading from sysfs.
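
A client can use that version number as a consistency check: read it, read
the topology, then read it again and retry if it changed. The C sketch below
assumes a hypothetical callback interface (topology_source) in place of
actual sysfs file reads, since the exact sysfs layout is not described here;
fake_ctx is an illustrative test double that changes the topology once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Callbacks standing in for sysfs reads (hypothetical interface). */
struct topology_source {
	uint64_t (*read_generation)(void *ctx);
	bool (*read_nodes)(void *ctx, int *num_nodes);
	void *ctx;
};

/*
 * Read the version number, read the topology, then re-read the version.
 * If it changed, the topology was modified mid-read, so try again.
 */
static bool read_topology_snapshot(const struct topology_source *src,
				   int *num_nodes, int max_retries)
{
	assert(src != NULL && num_nodes != NULL);
	for (int attempt = 0; attempt <= max_retries; attempt++) {
		uint64_t before = src->read_generation(src->ctx);

		if (!src->read_nodes(src->ctx, num_nodes))
			return false;              /* read error */
		if (src->read_generation(src->ctx) == before)
			return true;               /* consistent snapshot */
	}
	return false;                          /* topology kept changing */
}

/* Fake source: the topology changes once during the first read,
 * which forces exactly one retry. */
struct fake_ctx { uint64_t generation; int reads; };

static uint64_t fake_generation(void *ctx)
{
	return ((struct fake_ctx *)ctx)->generation;
}

static bool fake_nodes(void *ctx, int *num_nodes)
{
	struct fake_ctx *f = ctx;

	if (f->reads++ == 0)
		f->generation++;   /* simulate a topology change mid-read */
	*num_nodes = 2;
	return true;
}
```

Bounding the retries keeps a client from spinning forever if the topology
changes continuously.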

The interface between kfd and kgd also allows the kfd to request buffer
management services from kgd, and allows kgd to route interrupt requests to
kfd code, since the interrupt block is shared between the regular
graphics/compute and HSA compute subsystems in the GPU.
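
One plausible shape for that routing, sketched in C under stated
assumptions: kgd drains the shared interrupt ring, handles the entries it
owns, and forwards the rest to a callback that kfd registered. The ih_entry
layout, the kfd_isr_fn callback type and the "src_id < 100" ownership rule
are all hypothetical placeholders, not the actual radeon/kfd interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical interrupt-ring entry: only the field this sketch needs. */
struct ih_entry {
	uint32_t src_id;
};

/* Hypothetical callback type kfd would register with kgd. */
typedef void (*kfd_isr_fn)(const struct ih_entry *entry, void *kfd_dev);

struct kgd_irq_state {
	kfd_isr_fn kfd_isr;     /* NULL until kfd has attached */
	void *kfd_dev;          /* opaque kfd device handle */
	unsigned gfx_handled;   /* entries kgd handled itself */
};

/* Placeholder ownership rule; the real decision is hardware-specific. */
static int kgd_owns_source(uint32_t src_id)
{
	return src_id < 100;
}

/* kgd drains the shared ring and forwards entries it does not own. */
static void kgd_process_ih(struct kgd_irq_state *st,
			   const struct ih_entry *ring, size_t n)
{
	assert(st != NULL && (ring != NULL || n == 0));
	for (size_t i = 0; i < n; i++) {
		if (kgd_owns_source(ring[i].src_id))
			st->gfx_handled++;
		else if (st->kfd_isr != NULL)
			st->kfd_isr(&ring[i], st->kfd_dev);
	}
}

/* Example kfd-side callback that just counts forwarded interrupts. */
static void count_kfd_irq(const struct ih_entry *entry, void *kfd_dev)
{
	(void)entry;
	(*(unsigned *)kfd_dev)++;
}
```

Keeping a single owner (kgd) for the interrupt ring avoids two drivers
racing to consume the same entries.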

The kfd code works with an open source usermode library ("libhsakmt") which
is in the final stages of IP review and should be published in a separate
repo over the next few days.

The code operates in one of three modes, selectable via the sched_policy
module parameter:
- sched_policy=0 uses a hardware scheduler running in the MEC block within
  CP, and allows oversubscription (more queues than HW slots)
- sched_policy=1 also uses HW scheduling but does not allow
  oversubscription, so create_queue requests fail when we run out of HW
  slots
- sched_policy=2 does not use HW scheduling, so the driver manually assigns
  queues to HW slots by programming registers

The "no HW scheduling" option is for debug & new hardware bringup only, and
so has less test coverage than the other options. The default in the
current code is "HW scheduling without oversubscription", since that is
where we have the most test coverage, but we expect to change the default to
"HW scheduling with oversubscription" after further testing. Oversubscription
effectively removes the HW limit on the number of work queues available to
applications.
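
The admission behaviour of the three modes can be summarized in a small C
sketch. The enum names and can_create_queue() are illustrative only (the
values 0, 1 and 2 match the module parameter described above), and treating
sched_policy=2 as slot-limited is an assumption based on the driver
assigning queues to HW slots directly.

```c
#include <assert.h>
#include <stdbool.h>

/* Values of the sched_policy module parameter; names are illustrative. */
enum sched_policy {
	SCHED_POLICY_HWS = 0,                      /* HW scheduler, oversubscription allowed */
	SCHED_POLICY_HWS_NO_OVERSUBSCRIPTION = 1,  /* HW scheduler, one queue per HW slot */
	SCHED_POLICY_NO_HWS = 2,                   /* driver programs HW slots itself */
};

/*
 * Sketch of the create_queue admission check implied by each mode:
 * only policy 0 lets the number of queues exceed the number of HW slots.
 */
static bool can_create_queue(enum sched_policy policy,
			     unsigned active_queues, unsigned hw_slots)
{
	switch (policy) {
	case SCHED_POLICY_HWS:
		return true;
	case SCHED_POLICY_HWS_NO_OVERSUBSCRIPTION:
	case SCHED_POLICY_NO_HWS:
		return active_queues < hw_slots;
	}
	return false;   /* unknown policy value */
}
```

With the current default (sched_policy=1), can_create_queue() starts
returning false as soon as every HW slot is occupied.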

Programs running on the GPU are associated with an address space through
the VMID field, which is translated to a unique PASID at access time via a
set of 16 VMID-to-PASID mapping registers. The available VMIDs (currently
16) are partitioned (under control of the radeon kgd) between current
gfx/compute and HSA compute, with each getting 8 in the current code. The
VMID-to-PASID mapping registers are updated by the HW scheduler when it is
used, and by driver code if HW scheduling is not being used.
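
A software model of the partitioning and the VMID-to-PASID lookup might look
like the C sketch below. It assumes, for illustration only, that HSA compute
gets the upper eight VMIDs (8-15) and that a PASID of 0 marks an unmapped
entry; map_pasid() approximates what driver code would do when HW scheduling
is not in use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_VMIDS       16
#define FIRST_KFD_VMID  8   /* assumption: upper half goes to HSA compute */

/* Software model of the 16 VMID-to-PASID mapping registers. */
struct vmid_pasid_map {
	uint16_t pasid[NUM_VMIDS];   /* 0 means "unmapped" in this sketch */
};

/* VMIDs below FIRST_KFD_VMID stay with gfx/compute under radeon. */
static bool vmid_is_kfd(unsigned vmid)
{
	return vmid >= FIRST_KFD_VMID && vmid < NUM_VMIDS;
}

/*
 * Bind a free HSA VMID to a process PASID.
 * Returns the VMID, or -1 if all eight HSA VMIDs are in use.
 */
static int map_pasid(struct vmid_pasid_map *m, uint16_t pasid)
{
	assert(pasid != 0);
	for (unsigned v = FIRST_KFD_VMID; v < NUM_VMIDS; v++) {
		if (m->pasid[v] == 0) {
			m->pasid[v] = pasid;
			return (int)v;
		}
	}
	return -1;
}

/* Translation performed at access time: VMID -> PASID. */
static uint16_t vmid_to_pasid(const struct vmid_pasid_map *m, unsigned vmid)
{
	return vmid < NUM_VMIDS ? m->pasid[vmid] : 0;
}
```

Because only eight VMIDs are available to HSA compute, at most eight
processes can be mapped at once in this model; a ninth request fails until a
VMID is released.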

The Sea Islands compute queues use a new "doorbell" mechanism instead of
the earlier kernel-managed write pointer registers. Doorbells use a separate
BAR dedicated for this purpose, and pages within the doorbell aperture are
mapped to userspace (each page mapped to only one user address space).
Writes to the doorbell aperture are intercepted by GPU hardware, allowing
userspace code to safely manage work queues (rings) without requiring a
kernel call for every ring update.
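
From userspace, a ring update then reduces to writing packets into ordinary
memory and storing the new write pointer to the doorbell. The C sketch below
models the doorbell and read pointer as plain C11 atomics rather than an
actual mmap'd doorbell page; RING_DWORDS, struct user_queue and
submit_packet() are hypothetical names, and the packet contents are opaque
dwords rather than real PM4 packets.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_DWORDS 64u   /* illustrative ring size; must be a power of two */

/* A user-mode queue: the ring itself lives in ordinary process memory. */
struct user_queue {
	uint32_t ring[RING_DWORDS];
	uint32_t wptr;               /* producer (userspace) write pointer, in dwords */
	_Atomic uint32_t *rptr;      /* consumer read pointer, advanced by the GPU */
	_Atomic uint32_t *doorbell;  /* stands in for a slot in the doorbell page */
};

/*
 * Copy one packet into the ring and ring the doorbell, with no system call.
 * wptr and rptr are free-running counters, so (wptr - rptr) is the number
 * of dwords still pending even after 32-bit wraparound.
 * Returns 0 on success, or -1 if the ring lacks space for the packet.
 */
static int submit_packet(struct user_queue *q, const uint32_t *pkt, uint32_t ndw)
{
	assert(ndw > 0);
	uint32_t rptr = atomic_load_explicit(q->rptr, memory_order_acquire);
	uint32_t free_dw = RING_DWORDS - (q->wptr - rptr);

	if (ndw > free_dw)
		return -1;
	for (uint32_t i = 0; i < ndw; i++)
		q->ring[(q->wptr + i) % RING_DWORDS] = pkt[i];
	q->wptr += ndw;
	/* Release: ring contents become visible before the new write pointer. */
	atomic_store_explicit(q->doorbell, q->wptr, memory_order_release);
	return 0;
}
```

In this model the release store orders the ring writes before the
write-pointer update; real code writing to a device mapping would rely on
whatever barrier the platform requires for that mapping.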

The first step for an application process is to open the kfd device. Calls
to open create a kfd "process" structure only for the first thread of the
process. Subsequent open calls are checked to see if they are from processes
using the same mm_struct and, if so, don't do anything. The kfd per-process
data lives as long as the mm_struct exists. Each mm_struct is associated
with a unique PASID, allowing the IOMMUv2 to make userspace process memory
accessible to the GPU.
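
The open() bookkeeping can be sketched as a lookup keyed by the address
space. In this hypothetical C model a small fixed table maps an opaque mm
pointer to per-process data and a PASID; the patch set itself instead adds a
kfd_process pointer to mm_struct (see "mm: Add kfd_process pointer to
mm_struct" below), and the counter-based PASID allocator is purely
illustrative, so treat the table as a simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PROCESSES 8

/* Minimal stand-in for kfd's per-process data (hypothetical layout). */
struct kfd_process_sketch {
	const void *mm;     /* identity of the address space (the mm_struct) */
	uint32_t pasid;     /* unique per address space, used by IOMMUv2 */
	int in_use;
};

static struct kfd_process_sketch processes[MAX_PROCESSES];
static uint32_t next_pasid = 1;   /* naive allocator, illustration only */

/*
 * open() path: return the existing entry for this mm, or create one on
 * the first open. Returns NULL if the table is full.
 */
static struct kfd_process_sketch *kfd_open_process(const void *mm)
{
	struct kfd_process_sketch *free_slot = NULL;

	assert(mm != NULL);
	for (size_t i = 0; i < MAX_PROCESSES; i++) {
		if (processes[i].in_use && processes[i].mm == mm)
			return &processes[i];          /* later open: nothing to do */
		if (!processes[i].in_use && free_slot == NULL)
			free_slot = &processes[i];
	}
	if (free_slot != NULL) {
		free_slot->mm = mm;
		free_slot->pasid = next_pasid++;
		free_slot->in_use = 1;
	}
	return free_slot;
}
```

Two threads of one process share an mm_struct, so both opens resolve to the
same entry and the same PASID.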

The next step is for the application to collect topology information via
sysfs. This gives userspace enough information to be able to identify
specific nodes (processors) in subsequent queue management calls.
Application processes can create queues on multiple processors, and
processors support queues from multiple processes.

At this point the application can create work queues in userspace memory
and pass them through the usermode library to kfd to have them mapped onto
HW queue slots, so that commands written to the queues can be executed by
the GPU. Queue operations specify a processor node, and so the bulk of this
code is device-specific.

Written by John Bridgman <John.Bridgman@xxxxxxx>

Alexey Skidanov (1):
      amdkfd: Implement the Get Process Aperture IOCTL

Andrew Lewycky (3):
      amdkfd: Add basic modules to amdkfd
      amdkfd: Add interrupt handling module
      amdkfd: Implement the Set Memory Policy IOCTL

Ben Goz (8):
      amdkfd: Add queue module
      amdkfd: Add mqd_manager module
      amdkfd: Add kernel queue module
      amdkfd: Add module parameter of scheduling policy
      amdkfd: Add packet manager module
      amdkfd: Add process queue manager module
      amdkfd: Add device queue manager module
      amdkfd: Implement the create/destroy/update queue IOCTLs

Evgeny Pinchuk (3):
      amdkfd: Add topology module to amdkfd
      amdkfd: Implement the Get Clock Counters IOCTL
      amdkfd: Implement the PMC Acquire/Release IOCTLs

Oded Gabbay (10):
      mm: Add kfd_process pointer to mm_struct
      drm/radeon: reduce number of free VMIDs and pipes in KV
      drm/radeon/cik: Don't touch int of pipes 1-7
      drm/radeon: Report doorbell configuration to amdkfd
      drm/radeon: adding synchronization for GRBM GFX
      drm/radeon: Add radeon <--> amdkfd interface
      Update MAINTAINERS and CREDITS files with amdkfd info
      amdkfd: Add IOCTL set definitions of amdkfd
      amdkfd: Add amdkfd skeleton driver
      amdkfd: Add binding/unbinding calls to amd_iommu driver

CREDITS | 7 +
MAINTAINERS | 10 +
drivers/gpu/drm/radeon/Kconfig | 2 +
drivers/gpu/drm/radeon/Makefile | 3 +
drivers/gpu/drm/radeon/amdkfd/Kconfig | 10 +
drivers/gpu/drm/radeon/amdkfd/Makefile | 14 +
drivers/gpu/drm/radeon/amdkfd/cik_mqds.h | 185 +++
drivers/gpu/drm/radeon/amdkfd/cik_regs.h | 220 ++++
drivers/gpu/drm/radeon/amdkfd/kfd_aperture.c | 123 ++
drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c | 518 +++++++++
drivers/gpu/drm/radeon/amdkfd/kfd_crat.h | 294 +++++
drivers/gpu/drm/radeon/amdkfd/kfd_device.c | 254 ++++
.../drm/radeon/amdkfd/kfd_device_queue_manager.c | 985 ++++++++++++++++
.../drm/radeon/amdkfd/kfd_device_queue_manager.h | 101 ++
drivers/gpu/drm/radeon/amdkfd/kfd_doorbell.c | 264 +++++
drivers/gpu/drm/radeon/amdkfd/kfd_interrupt.c | 161 +++
drivers/gpu/drm/radeon/amdkfd/kfd_kernel_queue.c | 305 +++++
drivers/gpu/drm/radeon/amdkfd/kfd_kernel_queue.h | 66 ++
drivers/gpu/drm/radeon/amdkfd/kfd_module.c | 131 +++
drivers/gpu/drm/radeon/amdkfd/kfd_mqd_manager.c | 291 +++++
drivers/gpu/drm/radeon/amdkfd/kfd_mqd_manager.h | 54 +
drivers/gpu/drm/radeon/amdkfd/kfd_packet_manager.c | 488 ++++++++
drivers/gpu/drm/radeon/amdkfd/kfd_pasid.c | 97 ++
drivers/gpu/drm/radeon/amdkfd/kfd_pm4_headers.h | 682 +++++++++++
drivers/gpu/drm/radeon/amdkfd/kfd_pm4_opcodes.h | 107 ++
drivers/gpu/drm/radeon/amdkfd/kfd_priv.h | 466 ++++++++
drivers/gpu/drm/radeon/amdkfd/kfd_process.c | 405 +++++++
.../drm/radeon/amdkfd/kfd_process_queue_manager.c | 343 ++++++
drivers/gpu/drm/radeon/amdkfd/kfd_queue.c | 109 ++
drivers/gpu/drm/radeon/amdkfd/kfd_topology.c | 1207 ++++++++++++++++++++
drivers/gpu/drm/radeon/amdkfd/kfd_topology.h | 168 +++
drivers/gpu/drm/radeon/amdkfd/kfd_vidmem.c | 96 ++
drivers/gpu/drm/radeon/cik.c | 154 +--
drivers/gpu/drm/radeon/cik_reg.h | 65 ++
drivers/gpu/drm/radeon/cikd.h | 51 +-
drivers/gpu/drm/radeon/radeon.h | 9 +
drivers/gpu/drm/radeon/radeon_device.c | 32 +
drivers/gpu/drm/radeon/radeon_drv.c | 5 +
drivers/gpu/drm/radeon/radeon_kfd.c | 566 +++++++++
drivers/gpu/drm/radeon/radeon_kfd.h | 119 ++
drivers/gpu/drm/radeon/radeon_kms.c | 7 +
include/linux/mm_types.h | 14 +
include/uapi/linux/kfd_ioctl.h | 133 +++
43 files changed, 9226 insertions(+), 95 deletions(-)
create mode 100644 drivers/gpu/drm/radeon/amdkfd/Kconfig
create mode 100644 drivers/gpu/drm/radeon/amdkfd/Makefile
create mode 100644 drivers/gpu/drm/radeon/amdkfd/cik_mqds.h
create mode 100644 drivers/gpu/drm/radeon/amdkfd/cik_regs.h
create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_aperture.c
create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c
create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_crat.h
create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_device.c
create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_device_queue_manager.c
create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_device_queue_manager.h
create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_doorbell.c
create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_interrupt.c
create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_kernel_queue.c
create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_kernel_queue.h
create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_module.c
create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_mqd_manager.c
create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_mqd_manager.h
create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_packet_manager.c
create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_pasid.c
create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_pm4_headers.h
create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_pm4_opcodes.h
create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_process.c
create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_process_queue_manager.c
create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_queue.c
create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_topology.c
create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_topology.h
create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_vidmem.c
create mode 100644 drivers/gpu/drm/radeon/radeon_kfd.c
create mode 100644 drivers/gpu/drm/radeon/radeon_kfd.h
create mode 100644 include/uapi/linux/kfd_ioctl.h
--
1.9.1