Re: [PATCH 09/21] drm/i915/gem: Disallow creating contexts with too many engines

On 29/04/2021 20:16, Jason Ekstrand wrote:
On Thu, Apr 29, 2021 at 3:01 AM Tvrtko Ursulin
<tvrtko.ursulin@xxxxxxxxxxxxxxx> wrote:
On 28/04/2021 18:09, Jason Ekstrand wrote:
On Wed, Apr 28, 2021 at 9:26 AM Tvrtko Ursulin
<tvrtko.ursulin@xxxxxxxxxxxxxxx> wrote:
On 28/04/2021 15:02, Daniel Vetter wrote:
On Wed, Apr 28, 2021 at 11:42:31AM +0100, Tvrtko Ursulin wrote:

On 28/04/2021 11:16, Daniel Vetter wrote:
On Fri, Apr 23, 2021 at 05:31:19PM -0500, Jason Ekstrand wrote:
There's no sense in allowing userspace to create more engines than it
can possibly access via execbuf.

Signed-off-by: Jason Ekstrand <jason@xxxxxxxxxxxxxx>
---
     drivers/gpu/drm/i915/gem/i915_gem_context.c | 7 +++----
     1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 5f8d0faf783aa..ecb3bf5369857 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1640,11 +1640,10 @@ set_engines(struct i915_gem_context *ctx,
                     return -EINVAL;
             }
-  /*
-   * Note that I915_EXEC_RING_MASK limits execbuf to only using the
-   * first 64 engines defined here.
-   */
             num_engines = (args->size - sizeof(*user)) / sizeof(*user->engines);

Maybe add a comment like /* RING_MASK has no shift, so it can be used
directly here */ since I had to check that :-)

Same story about IGT test cases being needed, just to be sure.

Reviewed-by: Daniel Vetter <daniel.vetter@xxxxxxxx>
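Daniel's remark that RING_MASK "has no shift" is what makes "I915_EXEC_RING_MASK + 1" the natural engine-count limit: masking the execbuf2 flags directly yields the engine-map index, so the largest reachable index is the mask value itself. The sketch below illustrates that arithmetic. The mask value matches the i915 uapi header (and the removed comment's "first 64 engines"); the two helper functions are hypothetical and not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Same value as in include/uapi/drm/i915_drm.h: the engine selector
 * occupies the low six bits of execbuffer2.flags, with no shift. */
#define I915_EXEC_RING_MASK (0x3f)

/* Hypothetical helper: the engine-map slot an execbuf2 flags word selects. */
static unsigned int engine_index_from_flags(uint64_t flags)
{
	/* No shift needed: masking alone yields the index. */
	unsigned int idx = (unsigned int)(flags & I915_EXEC_RING_MASK);

	/* Masking can never produce an index above RING_MASK. */
	assert(idx <= I915_EXEC_RING_MASK);
	return idx;
}

/* Hypothetical helper: the largest engine map execbuf2 can fully address,
 * i.e. the bound a "num_engines > I915_EXEC_RING_MASK + 1" check enforces. */
static unsigned int max_addressable_engines(void)
{
	return I915_EXEC_RING_MASK + 1; /* 64 */
}
```

In this model, bits above the mask are simply ignored by the lookup, so no flags value can name slot 64 or beyond; that is the sense in which entries past the 64th are unreachable through execbuf2.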

I am not sure about the churn vs benefit ratio here. There are also patches
which extend the engine selection field in execbuf2 over the unused
constants bits (with an explicit flag). So: churn upstream, and churn in
the internal tree (if that is of interest), for not much benefit.
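The patches mentioned above are not shown in this thread, so the following is only a guess at the general idea: the engine selector (bits 0-5) sits directly below the legacy constants-mode field (bits 6-7), so an explicit opt-in flag could let both fields together act as one 8-bit selector, i.e. up to 256 engines. The two mask values are from the i915 uapi header; the flag name and its bit position are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Values as in include/uapi/drm/i915_drm.h. */
#define I915_EXEC_RING_MASK      (0x3f)   /* bits 0-5: engine selector */
#define I915_EXEC_CONSTANTS_MASK (3 << 6) /* bits 6-7: legacy constants mode */

/* HYPOTHETICAL opt-in flag: the name and bit position are invented for
 * illustration and are not part of the real uapi. */
#define HYPOTHETICAL_EXEC_WIDE_ENGINE_SEL (1ULL << 40)

/* The two fields are adjacent, so together they form one 8-bit selector. */
static_assert((I915_EXEC_RING_MASK & I915_EXEC_CONSTANTS_MASK) == 0,
	      "fields must not overlap");
static_assert((I915_EXEC_RING_MASK | I915_EXEC_CONSTANTS_MASK) == 0xff,
	      "fields must be contiguous");

/* Toy lookup: which engine-map slot a flags word selects. */
static unsigned int engine_index(uint64_t flags)
{
	uint64_t mask = I915_EXEC_RING_MASK;

	/* With the explicit flag, the constants bits extend the selector. */
	if (flags & HYPOTHETICAL_EXEC_WIDE_ENGINE_SEL)
		mask |= I915_EXEC_CONSTANTS_MASK;

	return (unsigned int)(flags & mask);
}
```

Requiring an explicit flag keeps old callers' interpretation of bits 6-7 unchanged; only callers that opt in get the wider selector.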

This isn't churn.

This is "lock down uapi properly".

Pretty much.

Still haven't heard what concrete problems it solves.

IMO it is a "meh" patch. It doesn't fix any problems, and it will create
work for other people: man-hours that no one will ever properly account
for.

The number of engines in the context engine map should not really be
tied to execbuf2, as demonstrated by the incoming work to address more
than 63 engines, either as an extension to execbuf2 or in a future
execbuf3.

Which userspace driver has requested more than 64 engines in a single context?

There is no need to artificially limit hardware capabilities in the uapi
by implementing a policy in the kernel, one which will need to be
removed or changed shortly anyway. This particular patch is work, and it
creates more work for no benefit: the people who get to fix the fallout
will spend man-hours figuring out what broke and why. Or you have yet to
explain what the benefit is in concrete terms.

You keep complaining about how much work it takes and yet I've spent
more time replying to your e-mails on this patch than I spent writing
the patch and the IGT test.  Also, if it takes so much time to add a
restriction, then why are we spending time figuring out how to modify
the uAPI to allow you to execbuf on a context with more than 64
engines?  If we're worried about engineering man-hours, then limiting
to 64 IS the pragmatic solution.

a)

The question of what problem the patch fixes is still unanswered.

b)

You miss the point. I'll continue in the next paragraph..


Why don't you limit it to the number of physical engines then? Why don't
you filter out duplicates? Why not limit the number of buffer objects,
per client or globally, based on available RAM + swap relative to the
minimum object size? Reductio ad absurdum, yes, but it illustrates that
in this case there is a thin line between "locking down uapi" and adding
too much policy where it is not appropriate.

All this patch does is say that you're not allowed to create a
context with more engines than the execbuf API will let you use.  We
already have an artificial limit.  All this does is push the error
handling further up the stack.  If someone comes up with a mechanism
to execbuf on engine 65 (they'd better have an open-source user if it
involves changing API), I'm very happy for them to bump this limit at
the same time.  It'll take them 5 minutes and it'll be something they
find while writing the IGT test.
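Jason's point that the patch only "pushes the error handling further up the stack" can be made concrete with a toy model: without the check, a 65-engine map is accepted at context creation but only 64 of its entries are selectable at execbuf time; with the check, the same request fails immediately with -EINVAL. Everything below (struct toy_ctx, toy_set_engines, toy_reachable_engines, the enforce_limit switch) is hypothetical and heavily simplified; only the mask value comes from the i915 uapi header.

```c
#include <assert.h>
#include <errno.h>

#define I915_EXEC_RING_MASK (0x3f)
#define TOY_MAX_ENGINES (I915_EXEC_RING_MASK + 1u) /* 64 */

static_assert(TOY_MAX_ENGINES == 64, "execbuf2 can address 64 engines");

/* Hypothetical, heavily simplified context: only the engine count matters. */
struct toy_ctx {
	unsigned int num_engines;
};

/* Context-creation step. With enforce_limit set, an oversized engine map is
 * rejected up front, in the spirit of the patch; without it, it is accepted. */
static int toy_set_engines(struct toy_ctx *ctx, unsigned int num_engines,
			   int enforce_limit)
{
	if (enforce_limit && num_engines > TOY_MAX_ENGINES)
		return -EINVAL;

	ctx->num_engines = num_engines;
	return 0;
}

/* Execbuf-side view: how many of the context's engines can be selected. */
static unsigned int toy_reachable_engines(const struct toy_ctx *ctx)
{
	return ctx->num_engines < TOY_MAX_ENGINES ? ctx->num_engines
						  : TOY_MAX_ENGINES;
}
```

Either way the 65th engine is never usable through execbuf2; the difference the thread is arguing over is whether userspace learns that at context creation or not at all.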

.. no, it won't take five minutes.

If I need to spell everything out: you will put this patch in, which fixes nothing, and it will propagate to the internal kernel at some point. Then a bunch of tests will start failing in strange ways. People will have to triage them, assign them, reserve machines, set them up, run the repro, dig into the code, and eventually figure out what happened.

It will take hours, not five minutes. And there will likely be multiple bug reports, which most likely won't be linked, so multiple people will each spend hours debugging. All for nothing. So the size of the change is beside the point. The interesting part is how much pointless effort it will create across the organisation.

Of course, you may not care that much about that side of things, or you may just not be familiar with how it works in practice, since you haven't been involved in the past years. I don't really know, but I have to point out that it makes no sense to do this. The cost vs. benefit is simply not there.

Also, for execbuf3, I'd like to get rid of contexts entirely and have
engines be their own userspace-visible object.  If we go this
direction, you can have UINT32_MAX of them.  Problem solved.

Not the problem I am pointing at though.

You listed two ways that accessing engine 65 can happen: Extending
execbuf2 and adding a new execbuf3.  When/if execbuf3 happens, as I
pointed out above, it'll hopefully be a non-issue.  If someone extends
execbuf2 to support more than 64 engines and does not have a userspace
customer that wants said new API change, I will NAK the patch.  If
you've got a 3rd way that someone can get at engine 65 such that this
is a problem, I'd love to hear about it.

It's ever so easy to take a black-and-white stance, but the world is more like shades of grey. I too am totally perplexed as to why we have to spend time arguing about an inconsequential patch.

Context create is not called "create execbuf2 context", so I have no idea why you are so wedded to adding execbuf2 restrictions to it. If you were fixing a vulnerability or something, I'd understand, but all I've heard so far is along the lines of "This is proper locking down of uapi - end of". And an endless, time-wasting discussion follows. We don't have to agree on everything anyway, and I have raised my concern enough times now. It is up to you guys to work out the cost/benefit on your own, then.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


