[AMD Official Use Only - General]

>> Also what is the check of s_resubmit state good for here?
> The resubmission happens on the low priority ring sched task. Considering the situation where two high prio ib requests from the high priority ring call amdgpu_mcbp_scan, we don't want to trigger preemption twice when the low priority ring has not resubmitted its packages.

> Mhm, I'm not sure if we should complicate things like this and rather just use a fixed low and high priority ring.

> So the flag basically means that some low priority work is still preempted and waits for re-submission?

Yes, the re-submission happens the next time there is a low priority ib to emit, or in the timeout fallback if it was the last low priority ib.

Thanks,
Jiadong

-----Original Message-----
From: Koenig, Christian <Christian.Koenig@xxxxxxx>
Sent: Friday, September 23, 2022 7:23 PM
To: Zhu, Jiadong <Jiadong.Zhu@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Tuikov, Luben <Luben.Tuikov@xxxxxxx>; Grodzovsky, Andrey <Andrey.Grodzovsky@xxxxxxx>
Subject: Re: [PATCH 4/5] drm/amdgpu: Implement OS triggered MCBP (v5)

Am 23.09.22 um 12:51 schrieb Zhu, Jiadong:
> [AMD Official Use Only - General]
>
>>> +	return need_preempt && !mux->s_resubmit;
>>> Well what exactly are you trying to do here? Finding if a lower priority ring has unsignaled fences?
>> Yes, we are peeking the fence_drv data at the time the high priority ibs are going to emit. The result is not necessarily accurate because we would check the fence again after preemption completes.
>> Please use amdgpu_fence_count_emitted() for this instead.
> amdgpu_fence_count_emitted calls amdgpu_fence_process in it. We are in the high priority ring's schedule task while calling amdgpu_mcbp_scan; amdgpu_fence_process on both rings might lower the performance.
> Maybe we could add a function in amdgpu_fence.c to count emitted fences without calling amdgpu_fence_process?

Good point.
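For reference, the helper being discussed could look roughly like the sketch below. This is an editor's user-space model, not amdgpu code: the names (`fence_driver`, `count_emitted`) are illustrative, and the only assumption is that `sync_seq` counts emitted fences while `last_seq` tracks the last one seen signaled, so their difference is the number of emitted-but-unsignaled fences, obtained with pure reads and no fence processing.

```c
/*
 * Hypothetical user-space model of counting emitted-but-unsignaled
 * fences without kicking fence processing.  The field names mirror
 * amdgpu's fence_drv, but this is an editor's sketch, not driver code.
 */
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

struct fence_driver {
	atomic_uint last_seq;	/* last sequence number seen signaled */
	uint32_t sync_seq;	/* last sequence number emitted */
};

/* Count emitted fences; only reads, no amdgpu_fence_process() analogue. */
static inline uint32_t count_emitted(const struct fence_driver *drv)
{
	return drv->sync_seq - atomic_load(&drv->last_seq);
}
```

The point of the design choice: because the helper never advances `last_seq` itself, signaling stays entirely with the interrupt handler and there is no race between the scheduler task and the irq path.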
This also not only lowers the performance, but is problematic for correctness since dma_fence won't signal from the interrupt handler any more.

Yeah, feel free to remove the call to amdgpu_fence_process() from amdgpu_fence_count_emitted() in a separate patch.

>
>> Also what is the check of s_resubmit state good for here?
> The resubmission happens on the low priority ring sched task. Considering the situation where two high prio ib requests from the high priority ring call amdgpu_mcbp_scan, we don't want to trigger preemption twice when the low priority ring has not resubmitted its packages.

Mhm, I'm not sure if we should complicate things like this and rather just use a fixed low and high priority ring.

So the flag basically means that some low priority work is still preempted and waits for re-submission?

Regards,
Christian.

>
> Thanks,
> Jiadong
>
> -----Original Message-----
> From: Koenig, Christian <Christian.Koenig@xxxxxxx>
> Sent: Friday, September 23, 2022 6:13 PM
> To: Zhu, Jiadong <Jiadong.Zhu@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> Cc: Tuikov, Luben <Luben.Tuikov@xxxxxxx>; Grodzovsky, Andrey
> <Andrey.Grodzovsky@xxxxxxx>
> Subject: Re: [PATCH 4/5] drm/amdgpu: Implement OS triggered MCBP (v5)
>
> Am 23.09.22 um 11:24 schrieb Zhu, Jiadong:
>> [AMD Official Use Only - General]
>>
>> Inlined.
>>
>> Thanks,
>> Jiadong
>>
>> -----Original Message-----
>> From: Koenig, Christian <Christian.Koenig@xxxxxxx>
>> Sent: Wednesday, September 21, 2022 9:01 PM
>> To: Zhu, Jiadong <Jiadong.Zhu@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
>> Cc: Tuikov, Luben <Luben.Tuikov@xxxxxxx>; Grodzovsky, Andrey
>> <Andrey.Grodzovsky@xxxxxxx>
>> Subject: Re: [PATCH 4/5] drm/amdgpu: Implement OS triggered MCBP (v5)
>>
>> Am 21.09.22 um 11:41 schrieb jiadong.zhu@xxxxxxx:
>>> From: "Jiadong.Zhu" <Jiadong.Zhu@xxxxxxx>
>>>
>>> Trigger Mid-Command Buffer Preemption according to the priority of
>>> the software rings and the hw fence signalling condition.
>>>
>>> The muxer saves the locations of the indirect buffer frames from the
>>> software ring together with the fence sequence number in its fifo
>>> queue, and pops out those records when the fences are signalled. The
>>> locations are used to resubmit packages in preemption scenarios by
>>> copying the chunks from the software ring.

>> Maybe change the subject a bit. The MCBP is not really triggered by the core Linux kernel.
>>
>> Maybe write instead "MCBP based on DRM scheduler".
>>
>>> v2: Update comment style.
>>> v3: Fix conflict caused by previous modifications.
>>> v4: Remove unnecessary prints.
>>> v5: Fix corner cases for resubmission cases.
>>>
>>> Cc: Christian Koenig <Christian.Koenig@xxxxxxx>
>>> Cc: Luben Tuikov <Luben.Tuikov@xxxxxxx>
>>> Cc: Andrey Grodzovsky <Andrey.Grodzovsky@xxxxxxx>
>>> Acked-by: Luben Tuikov <luben.tuikov@xxxxxxx>
>>> Signed-off-by: Jiadong.Zhu <Jiadong.Zhu@xxxxxxx>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/Makefile          |   2 +-
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c       |   2 +
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_mcbp.c     |  91 +++++++++
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_mcbp.h     |  29 +++
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c     |  12 ++
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h     |   3 +
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c | 186 ++++++++++++++++++-
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h |  24 +++
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.c  |  27 +++
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c       |   2 +
>>>   10 files changed, 372 insertions(+), 6 deletions(-)
>>>   create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_mcbp.c
>>>   create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_mcbp.h
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
>>> index 85224bc81ce5..24c5aa19bbf2 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
>>> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
>>> @@ -59,7 +59,7 @@ amdgpu-y += amdgpu_device.o amdgpu_kms.o \
>>>  	amdgpu_umc.o smu_v11_0_i2c.o amdgpu_fru_eeprom.o amdgpu_rap.o \
>>>  	amdgpu_fw_attestation.o amdgpu_securedisplay.o \
>>>  	amdgpu_eeprom.o amdgpu_mca.o amdgpu_psp_ta.o amdgpu_lsdma.o \
>>> -	amdgpu_sw_ring.o amdgpu_ring_mux.o
>>> +	amdgpu_sw_ring.o amdgpu_ring_mux.o amdgpu_mcbp.o

>> This functionality is spread over too many files. Probably better to move this into the amdgpu_ring_mux.c as well.

>>>   amdgpu-$(CONFIG_PROC_FS) += amdgpu_fdinfo.o
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>>> index 258cffe3c06a..af86d87e2f3b 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>>> @@ -211,6 +211,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
>>>  		}
>>>  	}
>>>
>>> +	amdgpu_ring_ib_begin(ring);
>>>  	if (job && ring->funcs->init_cond_exec)
>>>  		patch_offset = amdgpu_ring_init_cond_exec(ring);
>>>
>>> @@ -285,6 +286,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
>>>  	    ring->hw_prio == AMDGPU_GFX_PIPE_PRIO_HIGH)
>>>  		ring->funcs->emit_wave_limit(ring, false);
>>>
>>> +	amdgpu_ring_ib_end(ring);
>>>  	amdgpu_ring_commit(ring);
>>>  	return 0;
>>>  }
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mcbp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mcbp.c
>>> new file mode 100644
>>> index 000000000000..121b1a4e0f04
>>> --- /dev/null
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mcbp.c
>>> @@ -0,0 +1,91 @@
>>> +/*
>>> + * Copyright 2022 Advanced Micro Devices, Inc.
>>> + *
>>> + * Permission is hereby granted, free of charge, to any person obtaining a
>>> + * copy of this software and associated documentation files (the "Software"),
>>> + * to deal in the Software without restriction, including without limitation
>>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>>> + * and/or sell copies of the Software, and to permit persons to whom the
>>> + * Software is furnished to do so, subject to the following conditions:
>>> + *
>>> + * The above copyright notice and this permission notice shall be included in
>>> + * all copies or substantial portions of the Software.
>>> + *
>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>>> + * OTHER DEALINGS IN THE SOFTWARE.
>>> + *
>>> + */
>>> +
>>> +#include <linux/delay.h>
>>> +#include <linux/kernel.h>
>>> +#include <linux/firmware.h>
>>> +#include <linux/module.h>
>>> +#include <linux/pci.h>
>>> +#include <drm/gpu_scheduler.h>
>>> +
>>> +#include "amdgpu.h"
>>> +#include "amdgpu_mcbp.h"
>>> +#include "amdgpu_ring.h"
>>> +
>>> +/* Trigger Mid-Command Buffer Preemption (MCBP) and find if we need to resubmit. */
>>> +int amdgpu_mcbp_trigger_preempt(struct amdgpu_ring_mux *mux)
>>> +{
>>> +	struct amdgpu_mux_entry *e;
>>> +	struct amdgpu_ring *ring = NULL;
>>> +	int i;
>>> +
>>> +	spin_lock(&mux->lock);
>>> +
>>> +	amdgpu_ring_preempt_ib(mux->real_ring);
>>> +
>>> +	for (i = 0; i < mux->num_ring_entries; i++) {
>>> +		e = &mux->ring_entry[i];
>>> +		if (e->ring->hw_prio <= AMDGPU_RING_PRIO_DEFAULT) {
>>> +			ring = e->ring;
>>> +			break;
>>> +		}
>>> +	}
>>> +
>>> +	if (!ring) {
>>> +		DRM_ERROR("cannot find low priority ring\n");
>>> +		spin_unlock(&mux->lock);
>>> +		return -ENOENT;
>>> +	}
>>> +
>>> +	amdgpu_fence_process(ring);

>>> That's usually an extremely bad idea since fence processing should only be kicked off from the interrupt handler.
>>> Otherwise you have the interrupt handler and this thread here racing to signal fences.

>> We have to block low priority ibs copied to the real ring and check the seq no signaled here.
>> I will refactor to use the trailing fence irq to handle this.

>>> +
>>> +	if (atomic_read(&ring->fence_drv.last_seq) != ring->fence_drv.sync_seq) {
>>> +		mux->s_resubmit = true;
>>> +		mux->seq_no_resubmit = ring->fence_drv.sync_seq;

>>> Don't touch any fence handling internals here. If you need to know
>>> which fences are signaled and which aren't look into amdgpu_fence.c

>> I would use amdgpu_fence_count_emitted in the irq handler to meet this.

>>> +		amdgpu_ring_mux_schedule_resubmit(mux);
>>> +	}
>>> +
>>> +	spin_unlock(&mux->lock);
>>> +	return 0;
>>> +}
>>> +
>>> +/*scan on low prio rings to have unsignaled fence and high ring has no fence.*/

>> What exactly should that comment mean?
>>
>>> +int amdgpu_mcbp_scan(struct amdgpu_ring_mux *mux)
>>> +{
>>> +	struct amdgpu_ring *ring;
>>> +	uint32_t seq, last_seq;
>>> +	int i, need_preempt;
>>> +
>>> +	need_preempt = 0;
>>> +	for (i = 0; i < mux->num_ring_entries; i++) {
>>> +		ring = mux->ring_entry[i].ring;
>>> +		last_seq = atomic_read(&ring->fence_drv.last_seq);
>>> +		seq = READ_ONCE(ring->fence_drv.sync_seq);
>>> +		if (ring->hw_prio > AMDGPU_RING_PRIO_DEFAULT && last_seq < seq)
>>> +			return 0;
>>> +		if (ring->hw_prio <= AMDGPU_RING_PRIO_DEFAULT && last_seq < seq)
>>> +			need_preempt = 1;
>>> +	}
>>> +	return need_preempt && !mux->s_resubmit;

>>> Well what exactly are you trying to do here? Finding if a lower priority ring has unsignaled fences?

>> Yes, we are peeking the fence_drv data at the time the high priority ibs are going to emit. The result is not necessarily accurate because we would check the fence again after preemption completes.

> Please use amdgpu_fence_count_emitted() for this instead.
>
> Also what is the check of s_resubmit state good for here?
>
> Regards,
> Christian.

>> Regards,
>> Christian.
>>
>>> +}
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mcbp.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_mcbp.h
>>> new file mode 100644
>>> index 000000000000..0033bcba8d03
>>> --- /dev/null
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mcbp.h
>>> @@ -0,0 +1,29 @@
>>> +/*
>>> + * Copyright 2022 Advanced Micro Devices, Inc.
>>> + *
>>> + * Permission is hereby granted, free of charge, to any person obtaining a
>>> + * copy of this software and associated documentation files (the "Software"),
>>> + * to deal in the Software without restriction, including without limitation
>>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>>> + * and/or sell copies of the Software, and to permit persons to whom the
>>> + * Software is furnished to do so, subject to the following conditions:
>>> + *
>>> + * The above copyright notice and this permission notice shall be included in
>>> + * all copies or substantial portions of the Software.
>>> + *
>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>>> + * OTHER DEALINGS IN THE SOFTWARE.
>>> + *
>>> + */
>>> +
>>> +#ifndef __AMDGPU_MCBP_H__
>>> +#define __AMDGPU_MCBP_H__
>>> +
>>> +int amdgpu_mcbp_trigger_preempt(struct amdgpu_ring_mux *mux);
>>> +int amdgpu_mcbp_scan(struct amdgpu_ring_mux *mux);
>>> +#endif
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
>>> index 4eaf3bd332f7..94362c39b73e 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
>>> @@ -574,3 +574,15 @@ int amdgpu_ring_init_mqd(struct amdgpu_ring *ring)
>>>
>>>  	return mqd_mgr->init_mqd(adev, ring->mqd_ptr, &prop);
>>>  }
>>> +
>>> +void amdgpu_ring_ib_begin(struct amdgpu_ring *ring)
>>> +{
>>> +	if (ring->is_sw_ring)
>>> +		amdgpu_sw_ring_ib_begin(ring);
>>> +}
>>> +
>>> +void amdgpu_ring_ib_end(struct amdgpu_ring *ring)
>>> +{
>>> +	if (ring->is_sw_ring)
>>> +		amdgpu_sw_ring_ib_end(ring);
>>> +}
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>> index aeb48cc3666c..36726c28a806 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>> @@ -312,6 +312,9 @@ struct amdgpu_ring {
>>>  #define amdgpu_ring_preempt_ib(r) (r)->funcs->preempt_ib(r)
>>>
>>>  int amdgpu_ring_alloc(struct amdgpu_ring *ring, unsigned ndw);
>>> +void amdgpu_ring_ib_begin(struct amdgpu_ring *ring);
>>> +void amdgpu_ring_ib_end(struct amdgpu_ring *ring);
>>> +
>>>  void amdgpu_ring_insert_nop(struct amdgpu_ring *ring, uint32_t count);
>>>  void amdgpu_ring_generic_pad_ib(struct amdgpu_ring *ring, struct amdgpu_ib *ib);
>>>  void amdgpu_ring_commit(struct amdgpu_ring *ring);
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
>>> index d6b30db27104..70dd725432d4 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
>>> @@ -24,30 +24,59 @@
>>>   #include <drm/drm_print.h>
>>>
>>>   #include "amdgpu_ring_mux.h"
>>> +#include "amdgpu_mcbp.h"
>>>   #include "amdgpu_ring.h"
>>>
>>>   #define AMDGPU_MUX_RESUBMIT_JIFFIES_TIMEOUT (HZ / 2)
>>>
>>> +static struct kmem_cache *amdgpu_mux_chunk_slab;
>>> +
>>>   static void copy_pkt_from_sw_ring(struct amdgpu_ring_mux *mux, struct amdgpu_ring *ring,
>>>  				  u64 s_start, u64 s_end);
>>> +static void amdgpu_mux_resubmit_chunks(struct amdgpu_ring_mux *mux, bool is_fallback);
>>> +static void amdgpu_mux_resubmit_fallback(struct timer_list *t);
>>>
>>>   int amdgpu_ring_mux_init(struct amdgpu_ring_mux *mux, struct amdgpu_ring *ring,
>>>  			 unsigned int entry_size)
>>>   {
>>>  	mux->real_ring = ring;
>>>  	mux->num_ring_entries = 0;
>>> +
>>>  	mux->ring_entry = kcalloc(entry_size, sizeof(struct amdgpu_mux_entry), GFP_KERNEL);
>>>  	if (!mux->ring_entry)
>>>  		return -ENOMEM;
>>>
>>>  	mux->ring_entry_size = entry_size;
>>> +	mux->s_resubmit = false;
>>> +
>>> +	amdgpu_mux_chunk_slab = kmem_cache_create("amdgpu_mux_chunk",
>>> +						  sizeof(struct amdgpu_mux_chunk), 0,
>>> +						  SLAB_HWCACHE_ALIGN, NULL);
>>> +	if (!amdgpu_mux_chunk_slab) {
>>> +		DRM_ERROR("create amdgpu_mux_chunk cache failed\n");
>>> +		return -ENOMEM;
>>> +	}
>>> +
>>>  	spin_lock_init(&mux->lock);
>>> +	timer_setup(&mux->resubmit_timer, amdgpu_mux_resubmit_fallback, 0);
>>>
>>>  	return 0;
>>>   }
>>>
>>>   void amdgpu_ring_mux_fini(struct amdgpu_ring_mux *mux)
>>>   {
>>> +	struct amdgpu_mux_entry *e;
>>> +	struct amdgpu_mux_chunk *chunk, *chunk2;
>>> +	int i;
>>> +
>>> +	for (i = 0; i < mux->num_ring_entries; i++) {
>>> +		e = &mux->ring_entry[i];
>>> +		list_for_each_entry_safe(chunk, chunk2, &e->list, entry) {
>>> +			list_del(&chunk->entry);
>>> +			kmem_cache_free(amdgpu_mux_chunk_slab, chunk);
>>> +		}
>>> +	}
>>> +	kmem_cache_destroy(amdgpu_mux_chunk_slab);
>>>  	kfree(mux->ring_entry);
>>>  	mux->ring_entry = NULL;
>>>  	mux->num_ring_entries = 0;
>>> @@ -67,6 +96,7 @@ int amdgpu_ring_mux_add_sw_ring(struct amdgpu_ring_mux *mux, struct amdgpu_ring
>>>  	ring->entry_index = mux->num_ring_entries;
>>>  	e->ring = ring;
>>>
>>> +	INIT_LIST_HEAD(&e->list);
>>>  	mux->num_ring_entries += 1;
>>>  	return 0;
>>>   }
>>> @@ -82,6 +112,9 @@ void amdgpu_ring_set_wptr_to_mux(struct amdgpu_ring_mux *mux, struct amdgpu_ring
>>>   {
>>>  	struct amdgpu_mux_entry *e;
>>>
>>> +	if (ring->hw_prio <= AMDGPU_RING_PRIO_DEFAULT)
>>> +		amdgpu_mux_resubmit_chunks(mux, false);
>>> +
>>>  	e = amdgpu_get_sw_entry(mux, ring);
>>>  	if (!e) {
>>>  		DRM_ERROR("cannot find entry for sw ring\n");
>>> @@ -90,13 +123,19 @@ void amdgpu_ring_set_wptr_to_mux(struct amdgpu_ring_mux *mux, struct amdgpu_ring
>>>
>>>  	spin_lock(&mux->lock);
>>>  	e->sw_cptr = e->sw_wptr;
>>> +	if (ring->hw_prio <= AMDGPU_RING_PRIO_DEFAULT && e->sw_cptr < mux->wptr_resubmit)
>>> +		e->sw_cptr = mux->wptr_resubmit;
>>>  	e->sw_wptr = wptr;
>>>  	e->start_ptr_in_hw_ring = mux->real_ring->wptr;
>>>
>>> -	copy_pkt_from_sw_ring(mux, ring, e->sw_cptr, wptr);
>>> -	e->end_ptr_in_hw_ring = mux->real_ring->wptr;
>>> -	amdgpu_ring_commit(mux->real_ring);
>>> -
>>> +	/* do not copy the ibs which have been resubmitted */
>>> +	if (ring->hw_prio > AMDGPU_RING_PRIO_DEFAULT || mux->wptr_resubmit < wptr) {
>>> +		copy_pkt_from_sw_ring(mux, ring, e->sw_cptr, wptr);
>>> +		e->end_ptr_in_hw_ring = mux->real_ring->wptr;
>>> +		amdgpu_ring_commit(mux->real_ring);
>>> +	} else {
>>> +		e->end_ptr_in_hw_ring = mux->real_ring->wptr;
>>> +	}
>>>  	spin_unlock(&mux->lock);
>>>   }
>>>
>>> @@ -159,7 +198,7 @@ u64 amdgpu_ring_get_rptr_from_mux(struct amdgpu_ring_mux *mux, struct amdgpu_rin
>>>  	return e->sw_rptr;
>>>   }
>>>
>>> -/* copy packages on sw ring range[begin, end) */
>>> +/* copy packages on sw ring range[start, end) */
>>>   static void copy_pkt_from_sw_ring(struct amdgpu_ring_mux *mux, struct amdgpu_ring *ring,
>>>  				  u64 s_start, u64 s_end)
>>>   {
>>> @@ -183,3 +222,140 @@ static void copy_pkt_from_sw_ring(struct amdgpu_ring_mux *mux, struct amdgpu_rin
>>>  		amdgpu_ring_write_multiple(real_ring, (void *)&ring->ring[start], end - start);
>>>  	}
>>> +}
>>> +
>>> +void amdgpu_ring_mux_schedule_resubmit(struct amdgpu_ring_mux *mux)
>>> +{
>>> +	mod_timer(&mux->resubmit_timer, jiffies + AMDGPU_MUX_RESUBMIT_JIFFIES_TIMEOUT);
>>> +}
>>> +
>>> +void amdgpu_ring_mux_start_ib(struct amdgpu_ring_mux *mux, struct amdgpu_ring *ring)
>>> +{
>>> +	struct amdgpu_mux_entry *e;
>>> +	struct amdgpu_mux_chunk *chunk;
>>> +
>>> +	amdgpu_mux_resubmit_chunks(mux, false);
>>> +
>>> +	e = amdgpu_get_sw_entry(mux, ring);
>>> +	if (!e) {
>>> +		DRM_ERROR("cannot find entry!\n");
>>> +		return;
>>> +	}
>>> +
>>> +	chunk = kmem_cache_alloc(amdgpu_mux_chunk_slab, GFP_KERNEL);
>>> +	if (!chunk) {
>>> +		DRM_ERROR("alloc amdgpu_mux_chunk_slab failed\n");
>>> +		return;
>>> +	}
>>> +
>>> +	chunk->start = ring->wptr;
>>> +	list_add_tail(&chunk->entry, &e->list);
>>> +}
>>> +
>>> +static void scan_and_remove_signaled_chunk(struct amdgpu_ring_mux *mux, struct amdgpu_ring *ring)
>>> +{
>>> +	uint32_t last_seq, size = 0;
>>> +	struct amdgpu_mux_entry *e;
>>> +	struct amdgpu_mux_chunk *chunk, *tmp;
>>> +
>>> +	e = amdgpu_get_sw_entry(mux, ring);
>>> +	if (!e) {
>>> +		DRM_ERROR("cannot find entry!\n");
>>> +		return;
>>> +	}
>>> +
>>> +	last_seq = atomic_read(&ring->fence_drv.last_seq);
>>> +
>>> +	list_for_each_entry_safe(chunk, tmp, &e->list, entry) {
>>> +		if (chunk->sync_seq <= last_seq) {
>>> +			list_del(&chunk->entry);
>>> +			kmem_cache_free(amdgpu_mux_chunk_slab, chunk);
>>> +		} else {
>>> +			size++;
>>> +		}
>>> +	}
>>> +}
>>> +
>>> +void amdgpu_ring_mux_end_ib(struct amdgpu_ring_mux *mux, struct amdgpu_ring *ring)
>>> +{
>>> +	struct amdgpu_mux_entry *e;
>>> +	struct amdgpu_mux_chunk *chunk;
>>> +
>>> +	e = amdgpu_get_sw_entry(mux, ring);
>>> +	if (!e) {
>>> +		DRM_ERROR("cannot find entry!\n");
>>> +		return;
>>> +	}
>>> +
>>> +	chunk = list_last_entry(&e->list, struct amdgpu_mux_chunk, entry);
>>> +	if (!chunk) {
>>> +		DRM_ERROR("cannot find chunk!\n");
>>> +		return;
>>> +	}
>>> +
>>> +	chunk->end = ring->wptr;
>>> +	chunk->sync_seq = READ_ONCE(ring->fence_drv.sync_seq);
>>> +
>>> +	scan_and_remove_signaled_chunk(mux, ring);
>>> +}
>>> +
>>> +static void amdgpu_mux_resubmit_chunks(struct amdgpu_ring_mux *mux, bool is_fallback)
>>> +{
>>> +	struct amdgpu_mux_entry *e = NULL;
>>> +	struct amdgpu_mux_chunk *chunk;
>>> +	uint32_t seq, last_seq;
>>> +	int i;
>>> +
>>> +	if (is_fallback) {
>>> +		if (!spin_trylock(&mux->lock)) {
>>> +			amdgpu_ring_mux_schedule_resubmit(mux);
>>> +			DRM_ERROR("reschedule resubmit\n");
>>> +			return;
>>> +		}
>>> +	} else {
>>> +		spin_lock(&mux->lock);
>>> +	}
>>> +
>>> +	/* find low priority entries: */
>>> +	if (!mux->s_resubmit) {
>>> +		spin_unlock(&mux->lock);
>>> +		return;
>>> +	}
>>> +
>>> +	for (i = 0; i < mux->num_ring_entries; i++) {
>>> +		if (mux->ring_entry[i].ring->hw_prio <= AMDGPU_RING_PRIO_DEFAULT) {
>>> +			e = &mux->ring_entry[i];
>>> +			break;
>>> +		}
>>> +	}
>>> +
>>> +	if (!e) {
>>> +		DRM_ERROR("%s no low priority ring found\n", __func__);
>>> +		spin_unlock(&mux->lock);
>>> +		return;
>>> +	}
>>> +
>>> +	last_seq = atomic_read(&e->ring->fence_drv.last_seq);
>>> +	seq = mux->seq_no_resubmit;
>>> +	if (last_seq < seq) {
>>> +		/* resubmit all the fences between (last_seq, seq] */
>>> +		list_for_each_entry(chunk, &e->list, entry) {
>>> +			if (chunk->sync_seq > last_seq && chunk->sync_seq <= seq) {
>>> +				copy_pkt_from_sw_ring(mux, e->ring, chunk->start, chunk->end);
>>> +				mux->wptr_resubmit = chunk->end;
>>> +				amdgpu_ring_commit(mux->real_ring);
>>> +			}
>>> +		}
>>> +	}
>>> +
>>> +	del_timer(&mux->resubmit_timer);
>>> +	mux->s_resubmit = false;
>>> +	spin_unlock(&mux->lock);
>>> +}
>>> +
>>> +static void amdgpu_mux_resubmit_fallback(struct timer_list *t)
>>> +{
>>> +	struct amdgpu_ring_mux *mux = from_timer(mux, t, resubmit_timer);
>>> +
>>> +	DRM_INFO("calling %s\n", __func__);
>>> +	amdgpu_mux_resubmit_chunks(mux, true);
>>> +}
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h
>>> index e8ee34e6b9a5..f6fc0afa3cc7 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h
>>> @@ -35,6 +35,7 @@ struct amdgpu_ring;
>>>    * sw_cptr -- the position of the copy pointer in the sw ring
>>>    * sw_rptr -- the read pointer in software ring
>>>    * sw_wptr -- the write pointer in software ring
>>> + * list -- list head for amdgpu_mux_chunk
>>>    */
>>>   struct amdgpu_mux_entry {
>>>  	struct amdgpu_ring	*ring;
>>> @@ -43,6 +44,7 @@ struct amdgpu_mux_entry {
>>>  	u64			sw_cptr;
>>>  	u64			sw_rptr;
>>>  	u64			sw_wptr;
>>> +	struct list_head	list;
>>>   };
>>>
>>>   struct amdgpu_ring_mux {
>>> @@ -53,6 +55,24 @@ struct amdgpu_ring_mux {
>>>  	unsigned int		ring_entry_size;
>>>  	/*the lock for copy data from different software rings*/
>>>  	spinlock_t		lock;
>>> +	bool			s_resubmit;
>>> +	uint32_t		seq_no_resubmit;
>>> +	u64			wptr_resubmit;
>>> +	struct timer_list	resubmit_timer;
>>> +};
>>> +
>>> +/*
>>> + * amdgpu_mux_chunk -- save the location of an indirect buffer's package on the software ring
>>> + * entry -- the list entry.
>>> + * sync_seq -- the fence seqno related with the saved IB.
>>> + * start -- start location on the software ring.
>>> + * end -- end location on the software ring.
>>> + */
>>> +struct amdgpu_mux_chunk {
>>> +	struct list_head	entry;
>>> +	uint32_t		sync_seq;
>>> +	u64			start;
>>> +	u64			end;
>>>   };
>>>
>>>   int amdgpu_ring_mux_init(struct amdgpu_ring_mux *mux, struct amdgpu_ring *ring,
>>> @@ -63,4 +83,8 @@ void amdgpu_ring_set_wptr_to_mux(struct amdgpu_ring_mux *mux, struct amdgpu_ring
>>>   u64 amdgpu_ring_get_wptr_from_mux(struct amdgpu_ring_mux *mux, struct amdgpu_ring *ring);
>>>   u64 amdgpu_ring_get_rptr_from_mux(struct amdgpu_ring_mux *mux, struct amdgpu_ring *ring);
>>>
>>> +void amdgpu_ring_mux_start_ib(struct amdgpu_ring_mux *mux, struct amdgpu_ring *ring);
>>> +void amdgpu_ring_mux_end_ib(struct amdgpu_ring_mux *mux, struct amdgpu_ring *ring);
>>> +void amdgpu_ring_mux_schedule_resubmit(struct amdgpu_ring_mux *mux);
>>> +
>>>   #endif
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.c
>>> index ec50793aa54d..4809ecf76180 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.c
>>> @@ -26,6 +26,7 @@
>>>
>>>   #include "amdgpu_sw_ring.h"
>>>   #include "amdgpu_ring_mux.h"
>>> +#include "amdgpu_mcbp.h"
>>>
>>>   u64 amdgpu_sw_ring_get_rptr_gfx(struct amdgpu_ring *ring)
>>>   {
>>> @@ -58,3 +59,29 @@ void amdgpu_sw_ring_commit(struct amdgpu_ring *ring)
>>>  	WARN_ON(!ring->is_sw_ring);
>>>  	amdgpu_ring_set_wptr_to_mux(mux, ring, ring->wptr);
>>>   }
>>> +
>>> +void amdgpu_sw_ring_ib_begin(struct amdgpu_ring *ring)
>>> +{
>>> +	struct amdgpu_device *adev = ring->adev;
>>> +	struct amdgpu_ring_mux *mux = &adev->gfx.muxer;
>>> +
>>> +	WARN_ON(!ring->is_sw_ring);
>>> +	if (ring->hw_prio > AMDGPU_RING_PRIO_DEFAULT) {
>>> +		if (amdgpu_mcbp_scan(mux) > 0)
>>> +			amdgpu_mcbp_trigger_preempt(mux);
>>> +		return;
>>> +	}
>>> +
>>> +	amdgpu_ring_mux_start_ib(mux, ring);
>>> +}
>>> +
>>> +void amdgpu_sw_ring_ib_end(struct amdgpu_ring *ring)
>>> +{
>>> +	struct amdgpu_device *adev = ring->adev;
>>> +	struct amdgpu_ring_mux *mux = &adev->gfx.muxer;
>>> +
>>> +	WARN_ON(!ring->is_sw_ring);
>>> +	if (ring->hw_prio > AMDGPU_RING_PRIO_DEFAULT)
>>> +		return;
>>> +	amdgpu_ring_mux_end_ib(mux, ring);
>>> +}
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> index 9596c22fded6..b7e94553f4fb 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> @@ -601,6 +601,7 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job,
>>>  	if (!vm_flush_needed && !gds_switch_needed && !need_pipe_sync)
>>>  		return 0;
>>>
>>> +	amdgpu_ring_ib_begin(ring);
>>>  	if (ring->funcs->init_cond_exec)
>>>  		patch_offset = amdgpu_ring_init_cond_exec(ring);
>>>
>>> @@ -661,6 +662,7 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job,
>>>  		amdgpu_ring_emit_switch_buffer(ring);
>>>  		amdgpu_ring_emit_switch_buffer(ring);
>>>  	}
>>> +	amdgpu_ring_ib_end(ring);
>>>  	return 0;
>>>   }
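As a footnote to the resubmission discussion above: the (last_seq, seq] window that amdgpu_mux_resubmit_chunks() walks can be modeled in a few lines. This is an editor's stand-alone sketch with illustrative names (`chunk`, `resubmit_count`), not the driver code; the real implementation iterates a linked list of amdgpu_mux_chunk entries and copies each matching window back with copy_pkt_from_sw_ring().

```c
/*
 * Stand-alone model of the resubmission window: after a preemption,
 * every saved chunk whose fence seqno lies in (last_seq, seq_no_resubmit]
 * is copied from the software ring to the real ring again.
 * Editor's sketch with hypothetical names, not amdgpu code.
 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct chunk {
	uint32_t sync_seq;	/* fence seqno emitted at the end of the IB */
	uint64_t start, end;	/* copy window on the software ring */
};

/* Return how many chunks would be copied back to the real ring. */
static size_t resubmit_count(const struct chunk *chunks, size_t n,
			     uint32_t last_seq, uint32_t seq_no_resubmit)
{
	size_t copied = 0;

	for (size_t i = 0; i < n; i++) {
		/* only fences in (last_seq, seq_no_resubmit] are replayed */
		if (chunks[i].sync_seq > last_seq &&
		    chunks[i].sync_seq <= seq_no_resubmit)
			copied++;
	}
	return copied;
}
```

The half-open lower bound matters: a chunk whose fence already signaled (sync_seq <= last_seq) must never be replayed, which is the same invariant scan_and_remove_signaled_chunk() enforces by freeing those entries.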