On Tue, Dec 7, 2021 at 12:28 PM Lionel Landwerlin <lionel.g.landwerlin@xxxxxxxxx> wrote: > > On 07/12/2021 13:00, Christian König wrote: > > Am 07.12.21 um 11:40 schrieb Bas Nieuwenhuizen: > >> On Tue, Dec 7, 2021 at 8:21 AM Christian König > >> <christian.koenig@xxxxxxx> wrote: > >>> Am 07.12.21 um 08:10 schrieb Lionel Landwerlin: > >>>> On 07/12/2021 03:32, Bas Nieuwenhuizen wrote: > >>>>> See the comments in the code. Basically if the seqno is already > >>>>> signalled then we get a NULL fence. If we then put the NULL fence > >>>>> in a binary syncobj it counts as unsignalled, making that syncobj > >>>>> pretty much useless for all expected uses. > >>>>> > >>>>> Not 100% sure about the transfer to a timeline syncobj but I > >>>>> believe it is needed there too, as AFAICT the add_point function > >>>>> assumes the fence isn't NULL. > >>>>> > >>>>> Fixes: ea569910cbab ("drm/syncobj: add transition iotcls between > >>>>> binary and timeline v2") > >>>>> Cc: stable@xxxxxxxxxxxxxxx > >>>>> Signed-off-by: Bas Nieuwenhuizen <bas@xxxxxxxxxxxxxxxxxxx> > >>>>> --- > >>>>> drivers/gpu/drm/drm_syncobj.c | 26 ++++++++++++++++++++++++++ > >>>>> 1 file changed, 26 insertions(+) > >>>>> > >>>>> diff --git a/drivers/gpu/drm/drm_syncobj.c > >>>>> b/drivers/gpu/drm/drm_syncobj.c > >>>>> index fdd2ec87cdd1..eb28a40400d2 100644 > >>>>> --- a/drivers/gpu/drm/drm_syncobj.c > >>>>> +++ b/drivers/gpu/drm/drm_syncobj.c > >>>>> @@ -861,6 +861,19 @@ static int > >>>>> drm_syncobj_transfer_to_timeline(struct drm_file *file_private, > >>>>> &fence); > >>>>> if (ret) > >>>>> goto err; > >>>>> + > >>>>> + /* If the requested seqno is already signaled > >>>>> drm_syncobj_find_fence may > >>>>> + * return a NULL fence. To make sure the recipient gets > >>>>> signalled, use > >>>>> + * a new fence instead. > >>>>> + */ > >>>>> + if (!fence) { > >>>>> + fence = dma_fence_allocate_private_stub(); > >>>>> + if (!fence) { > >>>>> + ret = -ENOMEM; > >>>>> + goto err; > >>>>> + } > >>>>> + } > >>>>> + > >>>> > >>>> Shouldn't we fix drm_syncobj_find_fence() instead? > >>> Mhm, now that you mention it. Bas, why do you think that > >>> dma_fence_chain_find_seqno() may return NULL when the fence is already > >>> signaled? > >>> > >>> Double checking the code that should never ever happen. > >> Well, I tested the patch with > >> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fmesa%2Fmesa%2F-%2Fmerge_requests%2F14097%2Fdiffs%3Fcommit_id%3Dd4c5c840f4e3839f9f5c1747a9034eb2b565f5c0&data=04%7C01%7Cchristian.koenig%40amd.com%7Cc1ab29fc100842826f5d08d9b96e102a%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637744705383763833%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=sXkTJWm%2FWm2xwgLGdepVWAOlqj%2FeArnvmMvnJpQ9YEs%3D&reserved=0 > >> > >> so I'm pretty sure it happens, and this patch fixes it, though I may > >> have misidentified what the code should do. > >> > >> My reading is that the dma_fence_chain_for_each in > >> dma_fence_chain_find_seqno will never visit a signalled fence (unless > >> the top one is signalled), as dma_fence_chain_walk will never return a > >> signalled fence (it only returns on NULL or !signalled). > > > > Ah, yes that suddenly makes more sense. > > > >> Happy to move this to drm_syncobj_find_fence. > > > > No, I think that your current patch is fine. > > > > That drm_syncobj_find_fence() only returns NULL when it can't find > > anything !signaled is correct behavior I think. > > > We should probably update the docs then : > > > * Returns 0 on success or a negative error value on failure. On > success @fence > * contains a reference to the fence, which must be released by calling > * dma_fence_put(). > > > Looking at some of the kernel drivers, it looks like they don't all > protect themselves against NULL pointers : > > > https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/vc4/vc4_gem.c#L1195 > > https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c#L1020 amdgpu handles it here (amdgpu_sync_fence checks for a NULL fence). But yeah I think it is a bit treacherous, especially as this only occurs with timeline semaphores. > > > -Lionel > > > > > > Going to push your original patch if nobody has any more objections. > > > > But somebody might want to take care of the IGT as well. > > > > Regards, > > Christian. > > > >>> Regards, > >>> Christian. > >>> > >>>> By returning a stub fence for the timeline case if there isn't one. > >>>> > >>>> > >>>> Because the same NULL fence check appears missing in amdgpu (and > >>>> probably other drivers). > >>>> > >>>> > >>>> Also we should have tests for this in IGT. > >>>> > >>>> AMD contributed some tests when this code was written but they never > >>>> got reviewed :( > >>>> > >>>> > >>>> -Lionel > >>>> > >>>> > >>>>> chain = kzalloc(sizeof(struct dma_fence_chain), GFP_KERNEL); > >>>>> if (!chain) { > >>>>> ret = -ENOMEM; > >>>>> @@ -890,6 +903,19 @@ drm_syncobj_transfer_to_binary(struct drm_file > >>>>> *file_private, > >>>>> args->src_point, args->flags, &fence); > >>>>> if (ret) > >>>>> goto err; > >>>>> + > >>>>> + /* If the requested seqno is already signaled > >>>>> drm_syncobj_find_fence may > >>>>> + * return a NULL fence. To make sure the recipient gets > >>>>> signalled, use > >>>>> + * a new fence instead. > >>>>> + */ > >>>>> + if (!fence) { > >>>>> + fence = dma_fence_allocate_private_stub(); > >>>>> + if (!fence) { > >>>>> + ret = -ENOMEM; > >>>>> + goto err; > >>>>> + } > >>>>> + } > >>>>> + > >>>>> drm_syncobj_replace_fence(binary_syncobj, fence); > >>>>> dma_fence_put(fence); > >>>>> err: > >>>> > > >