On 05/16/2018 09:16 AM, Lucas Stach wrote: > Am Mittwoch, den 16.05.2018, 15:10 +0200 schrieb Christian König: >> Am 16.05.2018 um 15:00 schrieb Lucas Stach: >>> Am Mittwoch, den 16.05.2018, 14:32 +0200 schrieb Christian König: >>>> Am 16.05.2018 um 14:28 schrieb Lucas Stach: >>>>> Am Mittwoch, den 16.05.2018, 14:08 +0200 schrieb Christian König: >>>>>> Yes, exactly. >>>>>> >>>>>> For normal user space command submission we should have tons of >>>>>> locks >>>>>> guaranteeing that (e.g. just the VM lock should do). >>>>>> >>>>>> For kernel moves we have the mutex for the GTT windows which >>>>>> protects >>>>>> it. >>>>>> >>>>>> The could be problems with the UVD/VCE queues to cleanup the >>>>>> handles >>>>>> when an application crashes. >>>>> FWIW, etnaviv is currently completely unlocked in this path, but I >>>>> believe that this isn't an issue as the sched fence seq numbers are >>>>> per >>>>> entity. So to actually end up with reversed seqnos one context has >>>>> to >>>>> preempt itself to do another submit, while the current one hasn't >>>>> returned from kernel space, which I believe is a fairly theoretical >>>>> issue. Is my understanding correct? >>>> Yes. The problem is with the right timing this can be used to access >>>> freed up memory. >>>> >>>> If you then manage to place a page table in that freed up memory >>>> taking >>>> over the system is just a typing exercise. >>> Thanks. I believe we don't have this problem in etnaviv, as memory >>> referencing is tied to the job and will only be unreferenced on >>> free_job, but I'll re-check this when I've got some time. >> Well that depends on how you use the sequence numbers. >> >> If you use them for job completion check somewhere then you certainly >> have a problem if job A gets the 1 and B the 2, but B completes before A. > We don't do this anymore. All the etnaviv internals are driven by the > fence signal callbacks. > >> At bare minimum that's still a bug we need to fix because it confuses >> functions like dma_fence_is_later() and dma_fence_later(). > Agreed. Also amending the documentation to state that > drm_sched_job_init() and drm_sched_entity_push_job() need to be called > under a common lock seems like a good idea. I will respin the patch with added documentation. Andrey > Regards, > Lucas