On Fri, 2016-07-01 at 19:34 +0100, Chris Wilson wrote: > On Fri, Jul 01, 2016 at 05:58:18PM +0100, Dave Gordon wrote: > > On 30/06/16 13:49, Tvrtko Ursulin wrote: > > > > > > On 30/06/16 11:22, Chris Wilson wrote: > > > > On Thu, Jun 30, 2016 at 09:50:20AM +0100, Tvrtko Ursulin wrote: > > > > > > > > > > On 30/06/16 02:35, Hong Liu wrote: > > > > > > Return the allocated request pointer directly to remove > > > > > > the double pointer parameter. > > > > > > > > > > > > Signed-off-by: Hong Liu <hong.liu@xxxxxxxxx> > > > > > > --- > > > > > > drivers/gpu/drm/i915/i915_gem.c | 25 +++++++------------ > > > > > > ------ > > > > > > 1 file changed, 7 insertions(+), 18 deletions(-) > > > > > > > > > > > > diff --git a/drivers/gpu/drm/i915/i915_gem.c > > > > > > b/drivers/gpu/drm/i915/i915_gem.c > > > > > > index 1d98782..9881455 100644 > > > > > > --- a/drivers/gpu/drm/i915/i915_gem.c > > > > > > +++ b/drivers/gpu/drm/i915/i915_gem.c > > > > > > @@ -2988,32 +2988,26 @@ void i915_gem_request_free(struct > > > > > > kref > > > > > > *req_ref) > > > > > > kmem_cache_free(req->i915->requests, req); > > > > > > } > > > > > > > > > > > > -static inline int > > > > > > +static inline struct drm_i915_gem_request * > > > > > > __i915_gem_request_alloc(struct intel_engine_cs *engine, > > > > > > - struct i915_gem_context *ctx, > > > > > > - struct drm_i915_gem_request **req_out) > > > > > > + struct i915_gem_context *ctx) > > > > > > { > > > > > > struct drm_i915_private *dev_priv = engine->i915; > > > > > > unsigned reset_counter = > > > > > > i915_reset_counter(&dev_priv->gpu_error); > > > > > > struct drm_i915_gem_request *req; > > > > > > int ret; > > > > > > > > > > > > - if (!req_out) > > > > > > - return -EINVAL; > > > > > > - > > > > > > - *req_out = NULL; > > > > > > - > > > > > > /* ABI: Before userspace accesses the GPU (e.g. > > > > > > execbuffer), > > > > > > report > > > > > > * EIO if the GPU is already wedged, or EAGAIN to drop > > > > > > the > > > > > > struct_mutex > > > > > > * and restart. > > > > > > */ > > > > > > ret = i915_gem_check_wedge(reset_counter, > > > > > > dev_priv->mm.interruptible); > > > > > > if (ret) > > > > > > - return ret; > > > > > > + return ERR_PTR(ret); > > > > > > > > > > > > req = kmem_cache_zalloc(dev_priv->requests, > > > > > > GFP_KERNEL); > > > > > > if (req == NULL) > > > > > > - return -ENOMEM; > > > > > > + return ERR_PTR(-ENOMEM); > > > > > > > > > > > > ret = i915_gem_get_seqno(engine->i915, &req->seqno); > > > > > > if (ret) > > > > > > @@ -3041,14 +3035,13 @@ __i915_gem_request_alloc(struct > > > > > > intel_engine_cs *engine, > > > > > > if (ret) > > > > > > goto err_ctx; > > > > > > > > > > > > - *req_out = req; > > > > > > - return 0; > > > > > > + return req; > > > > > > > > > > > > err_ctx: > > > > > > i915_gem_context_unreference(ctx); > > > > > > err: > > > > > > kmem_cache_free(dev_priv->requests, req); > > > > > > - return ret; > > > > > > + return ERR_PTR(ret); > > > > > > } > > > > > > > > > > > > /** > > > > > > @@ -3067,13 +3060,9 @@ struct drm_i915_gem_request * > > > > > > i915_gem_request_alloc(struct intel_engine_cs *engine, > > > > > > struct i915_gem_context *ctx) > > > > > > { > > > > > > - struct drm_i915_gem_request *req; > > > > > > - int err; > > > > > > - > > > > > > if (ctx == NULL) > > > > > > ctx = engine->i915->kernel_context; > > > > > > - err = __i915_gem_request_alloc(engine, ctx, &req); > > > > > > - return err ? ERR_PTR(err) : req; > > > > > > + return __i915_gem_request_alloc(engine, ctx); > > > > > > } > > > > > > > > > > > > struct drm_i915_gem_request * > > > > > > > > > > > > > > > > Looks good to me. And have this feeling I've seen this > > > > > somewhere before. > > > > > > > > Several times. This is not the full tidy, nor does it realise > > > > the > > > > ramifactions of request alloc through the stack. > > > > > > Hm I can't spot that it is doing anything wrong or making > > > anything > > > worse. You don't want to let the small cleanup in? > > > > > > Regards, > > > Tvrtko > > > > It ought to make almost no difference, because the *only* place the > > inner function is called is from the outer one, which passes a > > pointer to a local for the returned object; and the inner one is > > then inlined, so the compiler doesn't actually put it on the stack > > and call to the inner allocator anyway. > > > > Strangely, however, with this change the code becomes ~400 bytes > > bigger! > > > > Disassembly reveals that while the code for the externally-callable > > outer function is indeed almost identical, a second copy of it has > > also been inlined at the one callsite in this file: > > > > __i915_gem_object_sync() ... > > req = i915_gem_request_alloc(to, NULL); > > > > I don't think that's a critical path and would rather have 400 > > bytes > > smaller codespace. We can get that back by adding /noinline/ to the > > outer function i915_gem_request_alloc() (not, of course, to the > > inner one, that definitely *should* be inline). > > __i915_gem_object_sync() should not be calling > i915_gem_request_alloc(). > > That's the issue with this patch, your patch and John's patch. So we wrote the i915_gem_request_alloc() this way is to avoid being inlined into callers like __i915_gem_object_sync()? I checked the file with GCC 4.8.5 on my centos environment, it is like what Dave found. With the patch, i915_gem_object_sync() is 368 bytes bigger. But when I checked it with GCC 6.1.1 on Fedora 24, it seems it inlines the i915_gem_request_alloc() even with the current implementation. With the patch, the i915_gem_object_sync() is 80 bytes smaller. > -Chris > _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx