Am 07.12.21 um 12:35 schrieb Bas Nieuwenhuizen:
On Tue, Dec 7, 2021 at 12:28 PM Lionel Landwerlin
<lionel.g.landwerlin@xxxxxxxxx> wrote:
On 07/12/2021 13:00, Christian König wrote:
Am 07.12.21 um 11:40 schrieb Bas Nieuwenhuizen:
On Tue, Dec 7, 2021 at 8:21 AM Christian König
<christian.koenig@xxxxxxx> wrote:
Am 07.12.21 um 08:10 schrieb Lionel Landwerlin:
On 07/12/2021 03:32, Bas Nieuwenhuizen wrote:
See the comments in the code. Basically if the seqno is already
signalled then we get a NULL fence. If we then put the NULL fence
in a binary syncobj it counts as unsignalled, making that syncobj
pretty much useless for all expected uses.
Not 100% sure about the transfer to a timeline syncobj but I
believe it is needed there too, as AFAICT the add_point function
assumes the fence isn't NULL.
Fixes: ea569910cbab ("drm/syncobj: add transition iotcls between
binary and timeline v2")
Cc: stable@xxxxxxxxxxxxxxx
Signed-off-by: Bas Nieuwenhuizen <bas@xxxxxxxxxxxxxxxxxxx>
---
drivers/gpu/drm/drm_syncobj.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
diff --git a/drivers/gpu/drm/drm_syncobj.c
b/drivers/gpu/drm/drm_syncobj.c
index fdd2ec87cdd1..eb28a40400d2 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -861,6 +861,19 @@ static int
drm_syncobj_transfer_to_timeline(struct drm_file *file_private,
&fence);
if (ret)
goto err;
+
+ /* If the requested seqno is already signaled
drm_syncobj_find_fence may
+ * return a NULL fence. To make sure the recipient gets
signalled, use
+ * a new fence instead.
+ */
+ if (!fence) {
+ fence = dma_fence_allocate_private_stub();
+ if (!fence) {
+ ret = -ENOMEM;
+ goto err;
+ }
+ }
+
Shouldn't we fix drm_syncobj_find_fence() instead?
Mhm, now that you mention it. Bas, why do you think that
dma_fence_chain_find_seqno() may return NULL when the fence is already
signaled?
Double checking the code that should never ever happen.
Well, I tested the patch with
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fmesa%2Fmesa%2F-%2Fmerge_requests%2F14097%2Fdiffs%3Fcommit_id%3Dd4c5c840f4e3839f9f5c1747a9034eb2b565f5c0&data=04%7C01%7Cchristian.koenig%40amd.com%7C342b0cbdff7d487630ae08d9b975abd9%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637744738372115823%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=5SiZYL1TgLq3ldGy1COOkSasklZWQN%2BxWGXJ1j%2BHSOQ%3D&reserved=0
so I'm pretty sure it happens, and this patch fixes it, though I may
have misidentified what the code should do.
My reading is that the dma_fence_chain_for_each in
dma_fence_chain_find_seqno will never visit a signalled fence (unless
the top one is signalled), as dma_fence_chain_walk will never return a
signalled fence (it only returns on NULL or !signalled).
Ah, yes that suddenly makes more sense.
Happy to move this to drm_syncobj_find_fence.
No, I think that your current patch is fine.
That drm_syncobj_find_fence() only returns NULL when it can't find
anything !signaled is correct behavior I think.
We should probably update the docs then :
* Returns 0 on success or a negative error value on failure. On
success @fence
* contains a reference to the fence, which must be released by calling
* dma_fence_put().
Looking at some of the kernel drivers, it looks like they don't all
protect themselves against NULL pointers :
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.bootlin.com%2Flinux%2Flatest%2Fsource%2Fdrivers%2Fgpu%2Fdrm%2Fvc4%2Fvc4_gem.c%23L1195&data=04%7C01%7Cchristian.koenig%40amd.com%7C342b0cbdff7d487630ae08d9b975abd9%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637744738372115823%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=LjyWQpIUqqAGgR7ak3CJOJTqf%2FG8QB9BZX542vL25RA%3D&reserved=0
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.bootlin.com%2Flinux%2Flatest%2Fsource%2Fdrivers%2Fgpu%2Fdrm%2Famd%2Famdgpu%2Famdgpu_cs.c%23L1020&data=04%7C01%7Cchristian.koenig%40amd.com%7C342b0cbdff7d487630ae08d9b975abd9%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637744738372115823%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=kk9k8sDNLiOFTIyul79FbhfQ4Y2MdwFA6rT0h46xM40%3D&reserved=0
amdgpu handles it here (amdgpu_sync_fence checks for a NULL fence).
But yeah I think it is a bit treacherous, especially as this only
occurs with timeline semaphores.
Mhm, that's a good point.
While I still think it makes more sense from the design perspective to
return NULL to distinct that there is nothing to wait for, I see as well
that this is it not the most defensive approach.
Ok in that case let's move it into drm_syncobj_find_fence() instead.
Just one more thing, the timestamp is now busted anyway (cause the
original fence is already garbage collected). So we can probably use
dma_fence_get_stub() instead of dma_fence_allocate_private_stub().
Regards,
Christian.
-Lionel
Going to push your original patch if nobody has any more objections.
But somebody might want to take care of the IGT as well.
Regards,
Christian.
Regards,
Christian.
By returning a stub fence for the timeline case if there isn't one.
Because the same NULL fence check appears missing in amdgpu (and
probably other drivers).
Also we should have tests for this in IGT.
AMD contributed some tests when this code was written but they never
got reviewed :(
-Lionel
chain = kzalloc(sizeof(struct dma_fence_chain), GFP_KERNEL);
if (!chain) {
ret = -ENOMEM;
@@ -890,6 +903,19 @@ drm_syncobj_transfer_to_binary(struct drm_file
*file_private,
args->src_point, args->flags, &fence);
if (ret)
goto err;
+
+ /* If the requested seqno is already signaled
drm_syncobj_find_fence may
+ * return a NULL fence. To make sure the recipient gets
signalled, use
+ * a new fence instead.
+ */
+ if (!fence) {
+ fence = dma_fence_allocate_private_stub();
+ if (!fence) {
+ ret = -ENOMEM;
+ goto err;
+ }
+ }
+
drm_syncobj_replace_fence(binary_syncobj, fence);
dma_fence_put(fence);
err: