Hi Christian,

Thanks, I got it. I will send another patch for the KIQ overrun problem.

Best Regards
Yintian Tao

-----Original Message-----
From: Koenig, Christian <Christian.Koenig@xxxxxxx>
Sent: April 22, 2020 20:33
To: Tao, Yintian <Yintian.Tao@xxxxxxx>; Liu, Monk <Monk.Liu@xxxxxxx>; Kuehling, Felix <Felix.Kuehling@xxxxxxx>
Cc: amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Subject: Re: [PATCH] drm/amdgpu: refine kiq access register

On 22.04.20 at 14:20, Tao, Yintian wrote:
> Hi Christian
>
>
> Please see inline comments.
> -----Original Message-----
> From: Koenig, Christian <Christian.Koenig@xxxxxxx>
> Sent: April 22, 2020 19:57
> To: Tao, Yintian <Yintian.Tao@xxxxxxx>; Liu, Monk <Monk.Liu@xxxxxxx>;
> Kuehling, Felix <Felix.Kuehling@xxxxxxx>
> Cc: amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> Subject: Re: [PATCH] drm/amdgpu: refine kiq access register
>
> On 22.04.20 at 13:49, Tao, Yintian wrote:
>> Hi Christian
>>
>>
>> Can you help answer the questions below? Thanks in advance.
>> -----Original Message-----
>> From: Koenig, Christian <Christian.Koenig@xxxxxxx>
>> Sent: April 22, 2020 19:03
>> To: Tao, Yintian <Yintian.Tao@xxxxxxx>; Liu, Monk <Monk.Liu@xxxxxxx>;
>> Kuehling, Felix <Felix.Kuehling@xxxxxxx>
>> Cc: amd-gfx@xxxxxxxxxxxxxxxxxxxxx
>> Subject: Re: [PATCH] drm/amdgpu: refine kiq access register
>>
>> On 22.04.20 at 11:29, Yintian Tao wrote:
>>> With the current KIQ register access method there is a race
>>> condition when multiple clients want to read registers through KIQ
>>> at the same time, for example:
>>> 1. client-A starts to read REG-0 through KIQ
>>> 2. client-A polls seqno-0
>>> 3. client-B starts to read REG-1 through KIQ
>>> 4. client-B polls seqno-1
>>> 5. the KIQ completes both read operations
>>> 6. client-A reads the register value from the wb buffer and
>>>    gets the REG-1 value
>>>
>>> In addition, if multiple clients frequently write registers through
>>> KIQ, the KIQ ring buffer may be overwritten.
>>>
>>> Therefore, allocate a fixed number of wb slots for rreg use and
>>> limit the number of submissions based on the kiq ring_size in order
>>> to prevent the overwrite problem.
>>>
>>> v2: directly use amdgpu_device_wb_get() for each read instead
>>>     of reserving a fixed number of slots.
>>>     if there is not enough kiq ring buffer or rreg slot space, then
>>>     directly print an error log and return instead of busy waiting
>> I would split that into three patches. One for each problem we have here:
>>
>> 1. Fix kgd_hiq_mqd_load() and maybe other occasions to use spin_lock_irqsave().
>> [yttao]: Do you mean that we need to use spin_lock_irqsave for functions just like kgd_hiq_mqd_load()?
> Yes, I strongly think so.
>
> See, when you have one spin lock you either always need to lock it with irqs disabled or never.
>
> In other words we always need to use either spin_lock() or spin_lock_irqsave(), but never mix them with the same lock.
>
> The only exception to this rule is when you take multiple locks, e.g.
> you can do:
>
> spin_lock_irqsave(&a, flags);
> spin_lock(&b);
> spin_lock(&c);
> ....
> spin_unlock(&c);
> spin_unlock(&b);
> spin_unlock_irqrestore(&a, flags);
>
> Here you don't need to use spin_lock_irqsave for b and c. But we rarely have that case in the code.
> [yttao]: thanks, I got it. I will submit another patch for it.
>
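As an illustration of the rule Christian states here, a consistent user of kiq->ring_lock might look roughly like the sketch below. This is only a sketch: the function name is hypothetical (a stand-in for kgd_hiq_mqd_load()-style callers) and the packet emission is elided down to the locking pattern.

/*
 * Sketch only: every user of kiq->ring_lock disables irqs, because
 * other paths take this lock with spin_lock_irqsave().  Mixing a
 * plain spin_lock() in would risk a deadlock against an interrupt
 * arriving on the same CPU while the lock is held.
 */
static int hiq_mqd_load_example(struct amdgpu_device *adev)
{
	struct amdgpu_kiq *kiq = &adev->gfx.kiq;
	struct amdgpu_ring *ring = &kiq->ring;
	unsigned long flags;
	int r;

	spin_lock_irqsave(&kiq->ring_lock, flags);
	r = amdgpu_ring_alloc(ring, 7);
	if (!r) {
		/* ... emit the MAP_QUEUES packet for the HIQ here ... */
		amdgpu_ring_commit(ring);
	}
	spin_unlock_irqrestore(&kiq->ring_lock, flags);

	return r;
}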
>> 2. Prevent the overrun of the KIQ. Please drop the approach with the
>> atomic here. Instead just add an amdgpu_fence_wait_polling() into
>> amdgpu_fence_emit_polling() as I discussed with Monk.
>> [yttao]: Sorry, I can't get your original idea for the amdgpu_fence_wait_polling(). Can you give more details about it? Thanks in advance.
>>
>> "That is actually only a problem because the KIQ uses polling waits.
>>
>> See, amdgpu_fence_emit() waits for the oldest possible fence to be signaled before emitting a new one.
>>
>> I suggest that we do the same in amdgpu_fence_emit_polling(). A one-liner like the following should be enough:
>>
>> amdgpu_fence_wait_polling(ring, seq - ring->fence_drv.num_fences_mask, timeout);"
>> [yttao]: num_fences_mask is not used in the kiq fence polling path; it is only effective in the dma_fence architecture.
>> If I understand correctly, do you want the prototype code below? If the prototype code is wrong, can you help give one sample? Thanks in advance.
>>
>> int amdgpu_fence_emit_polling(struct amdgpu_ring *ring, uint32_t *s)
>> {
>> 	uint32_t seq;
>>
>> 	if (!s)
>> 		return -EINVAL;
>> +	amdgpu_fence_wait_polling(ring, seq, timeout);
>> 	seq = ++ring->fence_drv.sync_seq;
> Your understanding sounds more or less correct. The code should look something like this:
>
> seq = ++ring->fence_drv.sync_seq;
> amdgpu_fence_wait_polling(ring, seq -
> 			  number_of_allowed_submissions_to_the_kiq, timeout);
> [yttao]: do we need to wait directly at the beginning, just like below? Otherwise, amdgpu_ring_emit_wreg may overwrite the KIQ ring buffer.

There should always be room for at least one more submission. As long as we always submit a fence which checks for free room, we should be fine.

Regards,
Christian.

> +	amdgpu_fence_wait_polling(ring, seq -
> +		number_of_allowed_submissions_to_the_kiq, timeout);
> 	spin_lock_irqsave(&kiq->ring_lock, flags);
> 	amdgpu_ring_alloc(ring, 32);
> 	amdgpu_ring_emit_wreg(ring, reg, v);
> 	amdgpu_fence_emit_polling(ring, &seq); /* wait */
> 	amdgpu_ring_commit(ring);
> 	spin_unlock_irqrestore(&kiq->ring_lock, flags);
>
> I just used num_fences_mask as number_of_allowed_submissions_to_the_kiq because it is probably a good value to start with.
>
> But you could give that as a parameter as well if you think that makes more sense.
>
>> 	amdgpu_ring_emit_fence(ring, ring->fence_drv.gpu_addr,
>> 			       seq, 0);
>>
>> 	*s = seq;
>>
>> 	return 0;
>> }
>>
>> 3. Use amdgpu_device_wb_get() each time we need to submit a read.
>> [yttao]: yes, I will do it.
> Thanks,
> Christian.
>
>> Regards,
>> Christian.
_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
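To tie point 2 together, Christian's suggestion folded into the function could look roughly like the sketch below. Note the assumptions: the extra timeout parameter, the -ETIMEDOUT error path, and the use of num_fences_mask as the stand-in for number_of_allowed_submissions_to_the_kiq are not settled in the thread.

/*
 * Sketch only: amdgpu_fence_emit_polling() with the overrun guard
 * discussed above.
 */
int amdgpu_fence_emit_polling(struct amdgpu_ring *ring, uint32_t *s,
			      uint32_t timeout)
{
	uint32_t seq;
	signed long r;

	if (!s)
		return -EINVAL;

	/* Reserve the next sequence number first ... */
	seq = ++ring->fence_drv.sync_seq;

	/* ... then poll until the KIQ has retired enough older
	 * submissions that this one cannot overrun the ring.
	 */
	r = amdgpu_fence_wait_polling(ring,
				      seq - ring->fence_drv.num_fences_mask,
				      timeout);
	if (r < 1)
		return -ETIMEDOUT;

	amdgpu_ring_emit_fence(ring, ring->fence_drv.gpu_addr,
			       seq, 0);

	*s = seq;
	return 0;
}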
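And for point 3, a rough sketch of a read path that grabs one writeback slot per call instead of sharing a fixed slot, which removes the client-A/client-B race from the commit message. Assumptions here: emit_rreg() is extended to take the per-read slot offset, MAX_KIQ_REG_WAIT is an existing timeout constant, and the error handling is illustrative.

uint32_t amdgpu_kiq_rreg(struct amdgpu_device *adev, uint32_t reg)
{
	struct amdgpu_kiq *kiq = &adev->gfx.kiq;
	struct amdgpu_ring *ring = &kiq->ring;
	uint32_t seq, reg_val_offs = 0, value = 0;
	unsigned long flags;
	signed long r;

	spin_lock_irqsave(&kiq->ring_lock, flags);

	/* One wb slot per read, so concurrent readers can no longer
	 * see each other's results.
	 */
	if (amdgpu_device_wb_get(adev, &reg_val_offs)) {
		pr_err("critical bug! too many kiq readers\n");
		goto failed_unlock;
	}

	amdgpu_ring_alloc(ring, 32);
	/* emit_rreg() is assumed to take the per-read slot offset. */
	amdgpu_ring_emit_rreg(ring, reg, reg_val_offs);
	r = amdgpu_fence_emit_polling(ring, &seq, MAX_KIQ_REG_WAIT);
	if (r)
		goto failed_undo;

	amdgpu_ring_commit(ring);
	spin_unlock_irqrestore(&kiq->ring_lock, flags);

	r = amdgpu_fence_wait_polling(ring, seq, MAX_KIQ_REG_WAIT);
	if (r < 1)
		pr_err("wait for kiq fence error: %ld\n", r);
	else
		value = adev->wb.wb[reg_val_offs];

	amdgpu_device_wb_free(adev, reg_val_offs);
	return value;

failed_undo:
	amdgpu_ring_undo(ring);
	amdgpu_device_wb_free(adev, reg_val_offs);
failed_unlock:
	spin_unlock_irqrestore(&kiq->ring_lock, flags);
	return ~0;
}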