Hi,
On 11/1/21 15:50, Tvrtko Ursulin wrote:
On 01/11/2021 13:51, Thomas Hellström wrote:
Hi, Tvrtko
On Mon, 2021-11-01 at 13:14 +0000, Tvrtko Ursulin wrote:
On 01/11/2021 12:24, Thomas Hellström wrote:
As we start to introduce asynchronous failsafe object migration,
where we update the object state and then submit asynchronous
commands, we need to record what memory resources are actually used
by various parts of the command stream. Initially for three
purposes:
1) Error capture.
2) Asynchronous migration error recovery.
3) Asynchronous vma bind.
FWIW something like this may be interesting to me as well, although I
haven't looked much into details yet, for the purpose of allowing
delayed "put pages" via decoupling from the GEM bo.
Two questions after glancing over:
1)
I do wonder if abstracting "sgt" away from the name would make sense?
Like perhaps obj->mm.pages being the location of the new abstraction,
so naming it along the lines of i915_obj_pages or something.
Well it's not yet clear how this will end up. Really this should
develop into something along the lines of "struct i915_async_obj", on
Will the whole gigantic object struct be needed for async free, or for
something more than that?
I guess it depends on how an async free is supposed to work. For the
async migration, the plan is that when you migrate, for example between
LMEM and sys, we first unbind async and get a fence that signals when
unbinding is complete. The pages sg list will then be updated
immediately to point to sys, then the old memory in the form of a struct
ttm_resource will be freed when fences expire. It's on that ttm resource
we ideally would want the sg-table to sit, but we avoid that ATM due to
the awkward way those ttm resources were designed. But it's not a
super-huge object.
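
The ordering described above can be sketched as a small userspace model. This is only an illustration under stated assumptions: struct fence, struct resource, migrate_async() and retire() are hypothetical stand-ins, not the i915, TTM or dma_fence APIs, and a real implementation would track more than one pending resource.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

enum placement { PLACEMENT_LMEM, PLACEMENT_SYS };

/* Hypothetical stand-in for a fence that signals when unbinding is done. */
struct fence { bool signaled; };

/* Hypothetical stand-in for a struct ttm_resource. */
struct resource { enum placement where; };

/* Old backing store kept alive until its fence has signaled. */
struct deferred_free {
	struct fence *fence;
	struct resource *res;
};

/* Hypothetical, heavily reduced object. */
struct obj {
	struct resource *res;         /* current backing store */
	struct deferred_free pending; /* at most one in this toy model */
};

/*
 * Switch the object to the new placement immediately, but defer freeing
 * the old resource until unbind_fence has signaled.
 */
static int migrate_async(struct obj *o, enum placement dst,
			 struct fence *unbind_fence)
{
	struct resource *new_res;

	/* Toy limitation: only one migration may be in flight. */
	assert(o->pending.res == NULL);

	new_res = malloc(sizeof(*new_res));
	if (!new_res)
		return -1;
	new_res->where = dst;

	o->pending.fence = unbind_fence;
	o->pending.res = o->res;
	o->res = new_res;	/* object state is updated right away */
	return 0;
}

/* Free the old resource once the fence has signaled; true if freed. */
static bool retire(struct obj *o)
{
	if (!o->pending.res || !o->pending.fence->signaled)
		return false;

	free(o->pending.res);
	o->pending.res = NULL;
	o->pending.fence = NULL;
	return true;
}
```

The point of the sketch is only the ordering: the object points at the new memory before the old memory is released, so anything still using the old memory has to hold on to it by some means other than the object itself.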
which the sg-list is a member only. Depending on how this turns out and
if it remains an sg-list I think your suggestion makes sense, but is it
something we can postpone for now?
...
2)
And how come obj->mm.pages remains? Does it go away later in follow-up
work?
For the non-ttm backends, it's not yet implemented, so once they are
either moved to TTM or updated, we can completely replace
obj->mm.pages.
... sure, it's your project. I assume there is some time pressure then.
Yes, initially.
I was just asking since it looked a bit outside of the usual patterns
at a glance.
Oh one more question, how will it work for objects which migrate
between system and local memory? Depending on current placement either
obj->mm.pages or obj->mm.rsgt will be valid?
The contract currently is that obj->mm.pages is *always* valid.
Sometimes it points to the sg_table embedded in obj->mm.rsgt.
For anything that requires awareness of async migration, like upcoming
vma resources and error capture, they also need to be aware of
obj->mm.rsgt and handle refcounting accordingly. If it's NULL they can
safely assume async migration is not happening.
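
A minimal sketch of that contract, as a userspace model. The struct layouts and the helpers rsgt_get(), rsgt_put() and async_user_pin() are assumptions made for illustration and are not taken from the patch; the kernel would use a struct kref rather than a plain int.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the kernel's struct sg_table; contents do not matter here. */
struct sg_table { int nents; };

/* Hypothetical refcounted wrapper, loosely modelled on obj->mm.rsgt. */
struct refct_sgt {
	int refcount;		/* the kernel would use a struct kref */
	struct sg_table table;	/* embedded sg_table */
};

/* Hypothetical, heavily reduced object. */
struct obj {
	struct sg_table *pages;	/* always valid; may point at rsgt->table */
	struct refct_sgt *rsgt;	/* NULL: no async migration to worry about */
};

static struct refct_sgt *rsgt_get(struct refct_sgt *rsgt)
{
	rsgt->refcount++;
	return rsgt;
}

/* Returns 1 if this call dropped the last reference and freed the table. */
static int rsgt_put(struct refct_sgt *rsgt)
{
	assert(rsgt->refcount > 0);
	if (--rsgt->refcount)
		return 0;
	free(rsgt);
	return 1;
}

/*
 * An async-aware user (a vma resource or error capture, say) takes its
 * own reference, so the table outlives a later migration of the object.
 * Returns NULL when the object has no refcounted table, in which case
 * the caller can use o->pages directly.
 */
static struct refct_sgt *async_user_pin(struct obj *o)
{
	return o->rsgt ? rsgt_get(o->rsgt) : NULL;
}
```

A user that pinned the table this way keeps a valid sg_table even after the object has dropped its own reference and repointed pages elsewhere, and releases it with rsgt_put() when done.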
/Thomas
Regards,
Tvrtko