On 2023-03-28 04:54, Lucas Stach wrote:
> Hi Danilo,
> 
> On Tuesday, 2023-03-28 at 02:57 +0200, Danilo Krummrich wrote:
>> Hi all,
>>
>> Commit df622729ddbf ("drm/scheduler: track GPU active time per entity")
>> tries to track the accumulated time that a job was active on the GPU by
>> writing it to the entity through which the job was originally deployed
>> to the scheduler. This is done within drm_sched_get_cleanup_job(),
>> which fetches a job from the scheduler's pending_list.
>>
>> Doing this can result in a race condition where the entity is already
>> freed, but the entity's newly added elapsed_ns field is still accessed
>> once the job is fetched from the pending_list.
>>
>> After drm_sched_entity_destroy() has been called, it should be safe to
>> free the structure that embeds the entity. However, a job originally
>> handed over to the scheduler by this entity might still reside in the
>> scheduler's pending_list for cleanup after drm_sched_entity_destroy()
>> has already been called and the entity has been freed. Hence, we can
>> run into a UAF.
>>
> Sorry about that, I clearly didn't properly consider this case.
> 
>> In my case, a job was picked from the scheduler's pending_list, as
>> explained above, just after the entity was freed due to the client
>> application exiting. Meanwhile, this freed-up memory had already been
>> reallocated for a subsequent client application's job structure.
>> Hence, the new job's memory got corrupted. Luckily, I was able to
>> reproduce the same corruption over and over again by just using
>> deqp-runner to run a specific set of VK test cases in parallel.
>>
>> Fixing this issue doesn't seem to be very straightforward though
>> (unless I'm missing something), which is why I'm writing this mail
>> instead of sending a fix directly.
>>
>> Spontaneously, I see three options to fix it:
>>
>> 1. Rather than embedding the entity into driver-specific structures
>> (e.g. tied to file_priv), we could allocate the entity separately and
>> reference-count it, such that it's only freed up once all jobs that
>> were deployed through this entity are fetched from the scheduler's
>> pending_list.
>>
> My vote is on this or something in a similar vein for the long term. I
> have some hope of being able to add a GPU scheduling algorithm with a
> bit more fairness than the current one sometime in the future, which
> requires execution time tracking on the entities.

Danilo,

Using kref is preferable, i.e. option 1 above.
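Roughly, tying the entity's lifetime to the jobs that were deployed
through it could look like the sketch below. This is illustrative only:
the refcount member and the get/put helpers don't exist in drm_sched
today, and the names are made up.

#include <drm/gpu_scheduler.h>
#include <linux/kref.h>
#include <linux/slab.h>

/* Assumes a new "struct kref refcount" member in struct
 * drm_sched_entity, initialized with kref_init() in
 * drm_sched_entity_init().
 */
static void drm_sched_entity_release(struct kref *kref)
{
	struct drm_sched_entity *entity =
		container_of(kref, struct drm_sched_entity, refcount);

	kfree(entity);	/* allocated separately, no longer embedded */
}

/* Taken for each job pushed through the entity. */
static struct drm_sched_entity *
drm_sched_entity_get(struct drm_sched_entity *entity)
{
	kref_get(&entity->refcount);
	return entity;
}

/* Dropped in drm_sched_get_cleanup_job() once elapsed_ns has been
 * updated, and once more by drm_sched_entity_destroy(). The last put
 * frees the entity, so the pending_list cleanup can never touch freed
 * memory.
 */
static void drm_sched_entity_put(struct drm_sched_entity *entity)
{
	kref_put(&entity->refcount, drm_sched_entity_release);
}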
Lucas, can you shed some light on:

1. in what way the current FIFO scheduling is unfair, and
2. some details on this "scheduling algorithm with a bit more fairness
   than the current one"?

Regards,
Luben

> 
>> 2. Somehow make sure drm_sched_entity_destroy() does block until all
>> jobs deployed through this entity were fetched from the scheduler's
>> pending_list. Though, I'm pretty sure that this is not really
>> desirable.
>>
>> 3. Just revert the change and let drivers implement tracking of GPU
>> active times themselves.
>>
> Given that we are already pretty late in the release cycle and etnaviv
> is the only driver so far making use of the scheduler's elapsed-time
> tracking, I think the right short-term solution is to either move the
> tracking into etnaviv or just revert the change for now. I'll have a
> look at this.
> 
> Regards,
> Lucas
> 
>> In the case of just reverting the change, I'd propose to also set a
>> job's entity pointer to NULL once the job is taken from the entity,
>> such that in case of a future issue we fail where the actual issue
>> resides, and to make it more obvious that the field shouldn't be used
>> anymore after the job has been taken from the entity.
>>
>> I'm happy to implement the solution we agree on. However, it might
>> also make sense to revert the change until we have a solution in
>> place. I'm also happy to send a revert with a proper description of
>> the problem. Please let me know what you think.
>>
>> - Danilo
>>
> 
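P.S.: For the short-term revert route, setting the job's entity pointer
to NULL as Danilo suggests could look roughly like the sketch below.
It's paraphrased around drm_sched_entity_pop_job() with the existing
dependency handling elided; not an actual patch.

struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
{
	struct drm_sched_job *sched_job;

	sched_job = to_drm_sched_job(spsc_queue_pop(&entity->job_queue));
	if (!sched_job)
		return NULL;

	/* ... dependency and run-queue handling elided ... */

	/* The job now belongs to the scheduler's pending_list. Sever the
	 * back-pointer so that any later (buggy) access to the possibly
	 * freed entity fails loudly at the actual culprit instead of
	 * silently corrupting reallocated memory.
	 */
	sched_job->entity = NULL;

	return sched_job;
}

That way a revert wouldn't just hide the lifetime problem: a future
misuse of job->entity would oops right away.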