On 10/13/21 16:41, Daniel Vetter wrote:
On Fri, Oct 08, 2021 at 03:35:26PM +0200, Thomas Hellström wrote:
As we start to introduce asynchronous failsafe object migration,
where we update the object state and then submit asynchronous
commands we need to record what memory resources are actually used
by various part of the command stream. Initially for three purposes:
1) Error capture.
2) Asynchronous migration error recovery.
3) Asynchronous vma bind.
At the time where these happens, the object state may have been updated
to be several migrations ahead and object sg-tables discarded.
In order to make it possible to keep sg-tables with memory resource
information for these operations, introduce refcounted sg-tables that
aren't freed until the last user is done with them.
The alternative would be to reference information sitting on the
corresponding ttm_resources which typically have the same lifetime as
these refcountes sg_tables, but that leads to other awkward constructs:
Due to the design direction chosen for ttm resource managers that would
lead to diamond-style inheritance, the LMEM resources may sometimes be
prematurely freed, and finally the subclassed struct ttm_resource would
have to bleed into the asynchronous vma bind code.
On the diamon inheritence I was pondering some more whether we shouldn't
just do the classic C union horrors, i.e.
struct ttm_resource {
/* stuff */
};
struct ttm_drm_mm_resource {
struct ttm_resource base;
struct drm_mm_node;
};
struct ttm_buddy_resource {
struct ttm_resource base;
struct drm_buddy_node;
};
Whatever else we have, maybe also integer resources for guc_id.
And then the horrors:
struct i915_gem_resource {
union {
struct ttm_resource base;
struct ttm_drm_mm_resource drm_mm;
struct ttm_buffer_object buddy;
};
/* i915 stuff */
};
BUILD_BUG_ON(offsetof(struct i915_gem_resource, base) ==
offsetof(struct i915_gem_resource, drmm_mm.base))
BUILD_BUG_ON(offsetof(struct i915_gem_resource, base) ==
offsetof(struct i915_gem_resource, buddy.base))
This is horrible, but also in official C89 and later unions are the only
ways to do inheritance. The only reason we can do different in linux is
because we compile with strict aliasing turned off.
So I think we can shrug this off as officially sanctioned horrors. There's
a small downside with overhead maybe, but I don't think the amount in
difference between the various allocators is big enough that we should
care. Plus a pointer to driver stuff to resolve the diamond inheritance
through different means isn't free either.
Yes, this is exactly what was meant by "awkward constructs" in the
commit message,
My thoughts are still that all this could be avoided by a different
design for struct ttm_resource,
but I agree we can do with refcounted sg-lists for now, to see where
this ends up when all related resource-on-lru stuff lands in TTM.
/Thomas