On Tue, 2015-07-07 at 09:46 +0100, Chris Wilson wrote: > On Tue, Jul 07, 2015 at 01:12:11PM +0530, Ankitprasad Sharma wrote: > > On Thu, 2015-07-02 at 10:50 +0100, Chris Wilson wrote: > > > On Thu, Jul 02, 2015 at 10:30:43AM +0100, Tvrtko Ursulin wrote: > > > > Well.. I the meantime why duplicate it when > > > > i915_gem_validate_context does i915_gem_context_get and deferred > > > > create if needed. I don't see the downside. Snippet from above > > > > becomes: > > > > > > > > ring = &dev_priv->ring[HAS_BLT(dev) ? BCS : RCS]; > > > > ctx = i915_gem_validate_context(dev, file, ring, > > > > DFAULT_CONTEXT_HANDLE); > > > > ... handle error... > > > > /* Allocate a request for this operation nice and early. */ > > > > ret = i915_gem_request_alloc(ring, ctx, &req); > > > > > > > > Why would this code have to know about deferred create. > > > > > > This one is irrelevant. Since the default_context is always allocated > > > and available via the ring, we don't need to pretend we are inside > > > userspace and do the idr lookup and validation, we can just use it > > > directly. > > > > > > > >>Why is this needed? > > > > > > > > > >Because it's a requirement of the op being used on those gen. Later gen > > > > >can keep the fence. > > > > > > > > > >>Could it be called unconditionally and still work? > > > > >> > > > > >>At least I would recommend a comment explaining it. > > > > > > > > It is ugly to sprinkle platform knowledge to the callers - I think I > > > > saw a callsites which call i915_gem_object_put_fence unconditionally > > > > so why would that not work here? > > > > > > That's access via the GTT though. This is access via the GPU using GPU > > > instructions, which sometimes use fences and sometimes don't. That > > > knowledge is already baked into the choice of command. > > > > > > What I would actually support would be to just use CPU GTT clearing. > > But, we have verified earlier here that for large objects GPU clearing > > is faster (here hoping that stolen region will be used mainly for > > framebuffers). Is it ok to do this conditionally based on the size of > > the objects? and GPU clearing is asynchronous. > > Hmm, this will be the GTT overhead which we can't avoid since we are > banned from accessing stolen directly (on byt). > > Honestly this is the ugliest patch in the series. If we can do without > it it would make accepting it much easier. And then you have a nice > performance argument for doing via the blitter. Though I would be > surprised if the userspace cache was doing such a bad job that frequent > reallocations from stolen were required. > > > > That will run at memory speeds, only stall for a small fraction of the > > > second, and later if the workloads demand it, we can do GPU clearing. > > Also how do you suggest to bring the workload in as a deciding factor? > > may be checking the current running frequency or based on the number of > > pending requests? > > Or are you suggesting to use CPU GTT clearing completely? > > > > > > It's much simpler, and I would say for any real workload the stolen > > > objects will be cached to avoid reallocations. > > Yes. A quick patch to do an unconditional memset() of a pinned GTT > stolen object should be about 20 lines. For the benefit of getting > create2 upsteam with little fuss, I think that is worth it. So this ugly patch itself will go away :( and I will add a new function in i915_gem_stolen.c to do CPU (GTT) based clearing of the object in stolen. > _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/intel-gfx