On 18.05.2018 at 17:02, Andrey Grodzovsky wrote:
On 05/18/2018 10:50 AM, Christian König wrote:
On 18.05.2018 at 16:44, Michel Dänzer wrote:
On 2018-05-18 11:42 AM, Christian König wrote:
Anyway, the kernel can't rely on userspace using O_CLOEXEC. If the flush
callback being called from multiple processes is an issue, maybe the
flush callback isn't appropriate after all.
Userspace could also grab a reference just by opening /proc/$pid/fd/*.
The idea is just that when any process which used the fd is killed by a
signal, we drop the remaining jobs instead of submitting them to the
hardware.
This must only affect jobs submitted by the killed process, not those
submitted by other processes.
Yeah, that's exactly the plan here.
I don't see how that is going to happen:
.flush is called for any terminating process, regardless of whether it
submitted jobs or just happens (accidentally or not) to have the device
file FD in its private file table. So we are going to have a problem
with that requirement. If a process is being killed and .flush is
executed, I have no way to know which amdgpu_ctx to choose in order to
terminate its pending jobs.
The only info I have from the .flush caller is the process id.
As it is now, in amdgpu_ctx_mgr_entity_fini and
amdgpu_ctx_mgr_entity_cleanup we iterate over all the contexts in the
context manager list and terminate them all, which indeed sounds wrong
to me.
I can save the pid of the context creator in the context structure so
I can match it during the .flush call, but if someone creates the
context and then passes the context id to another process for the
actual job submission, this approach won't work either.
Am I missing something here?
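To make the limitation above concrete, here is a minimal userspace
sketch in C. It is only a model, not amdgpu code: struct ctx_model and
flush_by_creator() are hypothetical names invented for illustration, and
the array stands in for the context manager list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for a context; NOT the real struct amdgpu_ctx. */
struct ctx_model {
	pid_t creator;    /* pid recorded when the context was created */
	bool  terminated; /* set once the pending jobs were dropped */
};

/*
 * Model of a .flush handler that only knows the exiting pid: terminate
 * every context whose recorded creator matches it.
 * Returns the number of contexts terminated by this call.
 */
static size_t flush_by_creator(struct ctx_model *ctxs, size_t n, pid_t exiting)
{
	size_t hit = 0;

	assert(ctxs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!ctxs[i].terminated && ctxs[i].creator == exiting) {
			ctxs[i].terminated = true;
			hit++;
		}
	}
	return hit;
}
```

If a process with pid 300 submits jobs through a context that pid 100
created, flush_by_creator(ctxs, n, 300) matches nothing, so the jobs of
the killed submitter stay queued. That is the case where matching on the
creator's pid breaks down.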
Your analysis is correct, it's just that I think this case should
not happen.
What can happen is that the fd is passed accidentally to child processes
and those child processes are then killed, but passing the fd to child
processes is a bug in the first place.
When somebody opens the fd on purpose and then kills the process, it
breaks and he can keep the pieces. I mean, to open the fd you need to be
privileged anyway.
What we could do to completely fix the issue:
1. Note for each submitted job which process (pid) submitted it.
2. During flush, wait for or kill only the jobs of the current process.
But I think that this is overkill.
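The two steps above could be modelled roughly as follows. This is a
userspace sketch in C under stated assumptions, not the scheduler's
actual data structures: struct job_model and flush_jobs_of() are
hypothetical names, and a flat array stands in for the job queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for a queued job; NOT a real scheduler struct. */
struct job_model {
	pid_t submitter; /* step 1: pid noted at submission time */
	bool  dropped;   /* set when the job is removed before reaching hw */
};

/*
 * Step 2: on flush, drop only the jobs submitted by the exiting process.
 * Jobs from other processes sharing the same fd are left untouched.
 * Returns the number of jobs dropped by this call.
 */
static size_t flush_jobs_of(struct job_model *jobs, size_t n, pid_t exiting)
{
	size_t dropped = 0;

	assert(jobs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!jobs[i].dropped && jobs[i].submitter == exiting) {
			jobs[i].dropped = true;
			dropped++;
		}
	}
	return dropped;
}
```

Compared with matching on the context creator, this keys the decision on
who actually submitted each job, at the cost of storing a pid per job.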
Christian.
Andrey
For additional security we could save the pid of the job submitter,
but since this should basically not happen in normal operation I
would rather avoid that.
Christian.
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel