[AMD Official Use Only - General] > -----Original Message----- > From: Kuehling, Felix <Felix.Kuehling@xxxxxxx> > Sent: Monday, March 20, 2023 8:31 PM > To: Kim, Jonathan <Jonathan.Kim@xxxxxxx>; amd- > gfx@xxxxxxxxxxxxxxxxxxxxx; dri-devel@xxxxxxxxxxxxxxxxxxxxx > Subject: Re: [PATCH 19/32] drm/amdkfd: add runtime enable operation > > > On 2023-01-25 14:53, Jonathan Kim wrote: > > The debugger can attach to a process prior to HSA enablement (i.e. > > inferior is spawned by the debugger and attached to immediately before > > target process has been enabled for HSA dispatches) or it > > can attach to a running target that is already HSA enabled. Either > > way, the debugger needs to know the enablement status to know when > > it can inspect queues. > > > > For the scenario where the debugger spawns the target process, > > it will have to wait for ROCr's runtime enable request from the target. > > The runtime enable request will be able to see that its process has been > > debug attached. ROCr raises an EC_PROCESS_RUNTIME signal to the > > debugger then blocks the target process while waiting the debugger's > > response. Once the debugger has received the runtime signal, it will > > unblock the target process. > > > > For the scenario where the debugger attaches to a running target > > process, ROCr will set the target process' runtime status as enabled so > > that on an attach request, the debugger will be able to see this > > status and will continue with debug enablement as normal. > > > > A secondary requirement is to conditionally enable the trap tempories > only > > if the user requests it (env var HSA_ENABLE_DEBUG=1) or if the debugger > > attaches with HSA runtime enabled. This is because setting up the trap > > temporaries incurs a performance overhead that is unacceptable for > > microbench performance in normal mode for certain customers. > > > > In the scenario where the debugger spawns the target process, when ROCr > > detects that the debugger has attached during the runtime enable > > request, it will enable the trap temporaries before it blocks the target > > process while waiting for the debugger to respond. > > > > In the scenario where the debugger attaches to a running target process, > > it will enable to trap temporaries itself. > > > > Finally, there is an additional restriction that is required to be > > enforced with runtime enable and HW debug mode setting. The debugger > must > > first ensure that HW debug mode has been enabled before permitting HW > debug > > mode operations. > > > > With single process debug devices, allowing the debugger to set debug > > HW modes prior to trap activation means that debug HW mode setting can > > occur before the KFD has reserved the debug VMID (0xf) from the hardware > > scheduler's VMID allocation resource pool. This can result in the > > hardware scheduler assigning VMID 0xf to a non-debugged process and > > having that process inherit debug HW mode settings intended for the > > debugged target process instead, which is both incorrect and potentially > > fatal for normal mode operation. > > > > With multi process debug devices, allowing the debugger to set debug > > HW modes prior to trap activation means that non-debugged processes > > migrating to a new VMID could inherit unintended debug settings. > > > > All debug operations that touch HW settings must require trap activation > > where trap activation is triggered by both debug attach and runtime > > enablement (target has KFD opened and is ready to dispatch work). > > > > v2: fix up hierarchy of semantics in description. > > > > Signed-off-by: Jonathan Kim <jonathan.kim@xxxxxxx> > > --- > > drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 150 > ++++++++++++++++++++++- > > drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 6 +- > > drivers/gpu/drm/amd/amdkfd/kfd_debug.h | 4 + > > drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 1 + > > 4 files changed, 157 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c > b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c > > index 09fe8576dc8c..46f9d453dc5e 100644 > > --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c > > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c > > @@ -2654,11 +2654,147 @@ static int kfd_ioctl_criu(struct file *filep, > struct kfd_process *p, void *data) > > return ret; > > } > > > > -static int kfd_ioctl_runtime_enable(struct file *filep, struct kfd_process *p, > void *data) > > +static int runtime_enable(struct kfd_process *p, uint64_t r_debug, > > + bool enable_ttmp_setup) > > { > > + int i = 0, ret = 0; > > + > > + if (p->is_runtime_retry) > > + goto retry; > > + > > + if (p->runtime_info.runtime_state != > DEBUG_RUNTIME_STATE_DISABLED) > > + return -EBUSY; > > + > > + for (i = 0; i < p->n_pdds; i++) { > > + struct kfd_process_device *pdd = p->pdds[i]; > > + > > + if (pdd->qpd.queue_count) > > + return -EEXIST; > > + } > > + > > + p->runtime_info.runtime_state = > DEBUG_RUNTIME_STATE_ENABLED; > > + p->runtime_info.r_debug = r_debug; > > + p->runtime_info.ttmp_setup = enable_ttmp_setup; > > + > > + if (p->runtime_info.ttmp_setup) { > > + for (i = 0; i < p->n_pdds; i++) { > > + struct kfd_process_device *pdd = p->pdds[i]; > > + > > + if (!kfd_dbg_is_rlc_restore_supported(pdd->dev)) { > > + amdgpu_gfx_off_ctrl(pdd->dev->adev, false); > > + pdd->dev->kfd2kgd->enable_debug_trap( > > + pdd->dev->adev, > > + true, > > + pdd->dev- > >vm_info.last_vmid_kfd); > > + } > > + > > + if (kfd_dbg_is_per_vmid_supported(pdd->dev)) { > > Should this be else-if? It seems weird that enable_debug_trap could be > called twice in a row. If RLC restore is only applicable on > single-process debug devices, then maybe put the per-VMID case first. > > > > + pdd->spi_dbg_override = pdd->dev->kfd2kgd- > >enable_debug_trap( > > + pdd->dev->adev, > > + false, > > + pdd->dev- > >vm_info.last_vmid_kfd); > > + > > + if (!pdd->dev- > >shared_resources.enable_mes) > > + debug_refresh_runlist(pdd->dev- > >dqm); > > + else > > + kfd_dbg_set_mes_debug_mode(pdd); > > Do we really need to update the runlist here? When the runtime gets > enabled, there are no queues yet for the process. So there should be no > change to the runlist until the process creates its first queue. Acked. Yeah this seems unnecessary and can be dropped. > > > > + } > > + } > > + } > > + > > +retry: > > + if (p->debug_trap_enabled) { > > + if (!p->is_runtime_retry) { > > + kfd_dbg_trap_activate(p); > > + > kfd_dbg_ev_raise(KFD_EC_MASK(EC_PROCESS_RUNTIME), > > + p, NULL, 0, false, NULL, 0); > > + } > > + > > + mutex_unlock(&p->mutex); > > + ret = down_interruptible(&p->runtime_enable_sema); > > + mutex_lock(&p->mutex); > > + > > + p->is_runtime_retry = !!ret; > > + } > > + > > + return ret; > > +} > > + > > +static int runtime_disable(struct kfd_process *p) > > +{ > > + int i = 0, ret; > > + bool was_enabled = p->runtime_info.runtime_state == > DEBUG_RUNTIME_STATE_ENABLED; > > + > > + p->runtime_info.runtime_state = > DEBUG_RUNTIME_STATE_DISABLED; > > + p->runtime_info.r_debug = 0; > > + > > + if (p->debug_trap_enabled) { > > + if (was_enabled) > > + kfd_dbg_trap_deactivate(p, false, 0); > > Does this call kfd_dbg_trap_deactivate multiple times on retry? Is that > a problem? I don't think dbg_trap_deactivate gets called again on retry. Prior to the deactivate call, the per-process runtime state gets saved as DEBUG_RUNTIME_STATE_DISABLED and early returns on a down_interruptible error. If the caller retries on the error, this means was_enabled is false so we never deactivate again. Thanks, Jon > > Regards, > Felix > > > > + > > + if (!p->is_runtime_retry) > > + > kfd_dbg_ev_raise(KFD_EC_MASK(EC_PROCESS_RUNTIME), > > + p, NULL, 0, false, NULL, 0); > > + > > + mutex_unlock(&p->mutex); > > + ret = down_interruptible(&p->runtime_enable_sema); > > + mutex_lock(&p->mutex); > > + > > + p->is_runtime_retry = !!ret; > > + if (ret) > > + return ret; > > + } > > + > > + if (was_enabled && p->runtime_info.ttmp_setup) { > > + for (i = 0; i < p->n_pdds; i++) { > > + struct kfd_process_device *pdd = p->pdds[i]; > > + > > + if (!kfd_dbg_is_rlc_restore_supported(pdd->dev)) > > + amdgpu_gfx_off_ctrl(pdd->dev->adev, true); > > + } > > + } > > + > > + p->runtime_info.ttmp_setup = false; > > + > > + /* disable DISPATCH_PTR save */ > > + for (i = 0; i < p->n_pdds; i++) { > > + struct kfd_process_device *pdd = p->pdds[i]; > > + > > + if (kfd_dbg_is_per_vmid_supported(pdd->dev)) { > > + pdd->spi_dbg_override = > > + pdd->dev->kfd2kgd- > >disable_debug_trap( > > + pdd->dev->adev, > > + false, > > + pdd->dev->vm_info.last_vmid_kfd); > > + > > + if (!pdd->dev->shared_resources.enable_mes) > > + debug_refresh_runlist(pdd->dev->dqm); > > + else > > + kfd_dbg_set_mes_debug_mode(pdd); > > + } > > + } > > + > > return 0; > > } > > > > +static int kfd_ioctl_runtime_enable(struct file *filep, struct kfd_process > *p, void *data) > > +{ > > + struct kfd_ioctl_runtime_enable_args *args = data; > > + int r; > > + > > + mutex_lock(&p->mutex); > > + > > + if (args->mode_mask & > KFD_RUNTIME_ENABLE_MODE_ENABLE_MASK) > > + r = runtime_enable(p, args->r_debug, > > + !!(args->mode_mask & > KFD_RUNTIME_ENABLE_MODE_TTMP_SAVE_MASK)); > > + else > > + r = runtime_disable(p); > > + > > + mutex_unlock(&p->mutex); > > + > > + return r; > > +} > > + > > static int kfd_ioctl_set_debug_trap(struct file *filep, struct kfd_process > *p, void *data) > > { > > struct kfd_ioctl_dbg_trap_args *args = data; > > @@ -2720,6 +2856,18 @@ static int kfd_ioctl_set_debug_trap(struct file > *filep, struct kfd_process *p, v > > goto unlock_out; > > } > > > > + if (target->runtime_info.runtime_state != > DEBUG_RUNTIME_STATE_ENABLED && > > + (args->op == > KFD_IOC_DBG_TRAP_SET_WAVE_LAUNCH_OVERRIDE || > > + args->op == > KFD_IOC_DBG_TRAP_SET_WAVE_LAUNCH_MODE || > > + args->op == KFD_IOC_DBG_TRAP_SUSPEND_QUEUES > || > > + args->op == KFD_IOC_DBG_TRAP_RESUME_QUEUES > || > > + args->op == > KFD_IOC_DBG_TRAP_SET_NODE_ADDRESS_WATCH || > > + args->op == > KFD_IOC_DBG_TRAP_CLEAR_NODE_ADDRESS_WATCH || > > + args->op == KFD_IOC_DBG_TRAP_SET_FLAGS)) { > > + r = -EPERM; > > + goto unlock_out; > > + } > > + > > switch (args->op) { > > case KFD_IOC_DBG_TRAP_ENABLE: > > if (target != p) > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c > b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c > > index 4174b479ea6f..47f8425a0db3 100644 > > --- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c > > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c > > @@ -220,7 +220,7 @@ static int kfd_dbg_set_workaround(struct > kfd_process *target, bool enable) > > return r; > > } > > > > -static int kfd_dbg_set_mes_debug_mode(struct kfd_process_device *pdd) > > +int kfd_dbg_set_mes_debug_mode(struct kfd_process_device *pdd) > > { > > uint32_t spi_dbg_cntl = pdd->spi_dbg_override | pdd- > >spi_dbg_launch_mode; > > uint32_t flags = pdd->process->dbg_flags; > > @@ -240,7 +240,7 @@ static int kfd_dbg_set_mes_debug_mode(struct > kfd_process_device *pdd) > > * to unwind > > * else: ignored > > */ > > -static void kfd_dbg_trap_deactivate(struct kfd_process *target, bool > unwind, int unwind_count) > > +void kfd_dbg_trap_deactivate(struct kfd_process *target, bool unwind, > int unwind_count) > > { > > int i, count = 0; > > > > @@ -311,7 +311,7 @@ int kfd_dbg_trap_disable(struct kfd_process > *target) > > return 0; > > } > > > > -static int kfd_dbg_trap_activate(struct kfd_process *target) > > +int kfd_dbg_trap_activate(struct kfd_process *target) > > { > > int i, r = 0, unwind_count = 0; > > > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.h > b/drivers/gpu/drm/amd/amdkfd/kfd_debug.h > > index fefb9dc5cf69..22707f7a2368 100644 > > --- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.h > > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.h > > @@ -28,6 +28,8 @@ > > void kgd_gfx_v9_set_wave_launch_stall(struct amdgpu_device *adev, > > uint32_t vmid, > > bool stall); > > +void kfd_dbg_trap_deactivate(struct kfd_process *target, bool unwind, > int unwind_count); > > +int kfd_dbg_trap_activate(struct kfd_process *target); > > bool kfd_dbg_ev_raise(uint64_t event_mask, > > struct kfd_process *process, struct kfd_dev *dev, > > unsigned int source_id, bool use_worker, > > @@ -80,4 +82,6 @@ static inline bool kfd_dbg_has_gws_support(struct > kfd_dev *dev) > > /* Assume debugging and cooperative launch supported otherwise. > */ > > return true; > > } > > + > > +int kfd_dbg_set_mes_debug_mode(struct kfd_process_device *pdd); > > #endif > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h > b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h > > index 4cb433a21e3d..63c59ad2a4ca 100644 > > --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h > > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h > > @@ -946,6 +946,7 @@ struct kfd_process { > > > > /* Tracks runtime enable status */ > > struct semaphore runtime_enable_sema; > > + bool is_runtime_retry; > > struct kfd_runtime_info runtime_info; > > > > };