Re: [PATCH 3/6] drm/ttm: call ttm_bo_cleanup_refs with reservation and lru lock held

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 11/28/2012 03:46 PM, Maarten Lankhorst wrote:
Op 28-11-12 15:23, Thomas Hellstrom schreef:
On 11/28/2012 02:55 PM, Maarten Lankhorst wrote:
Op 28-11-12 14:21, Thomas Hellstrom schreef:
On 11/28/2012 01:15 PM, Maarten Lankhorst wrote:
Op 28-11-12 12:54, Thomas Hellstrom schreef:
On 11/28/2012 12:25 PM, Maarten Lankhorst wrote:
By removing the unlocking of lru and retaking it immediately, a race is
removed where the bo is taken off the swap list or the lru list between
the unlock and relock. As such the cleanup_refs code can be simplified,
it will attempt to call ttm_bo_wait non-blockingly, and if it fails
it will drop the locks and perform a blocking wait, or return an error
if no_wait_gpu was set.

The need for looping is also eliminated, since swapout and evict_mem_first
will always follow the destruction path, so no new fence is allowed
to be attached. As far as I can see this may already have been the case,
but the unlocking / relocking required a complicated loop to deal with
re-reservation.

The downside is that ttm_bo_cleanup_memtype_use is no longer called with
reservation held, so drivers must be aware that move_notify with a null
parameter doesn't require a reservation.
Why can't we unreserve *after* ttm_bo_cleanup_memtype_use? That's not
immediately clear from this patch.
Because we would hold the reservation while waiting and with the object still
on swap and lru lists still, that would defeat the whole purpose of keeping
the object on multiple lists, plus break current code that assumes bo's on the
those lists can always be reserved.

the if (ret && !no_wait_gpu) path has to drop the reservation and lru lock, and
isn't guaranteed to be able to retake it. Maybe it could be guaranteed now, but
I'm sure that would not be the case if the reservations were shared across
devices.
The evict path removes the BO from the LRU lists, drops the LRU lock but hangs on to the reservation,
and in case the wait goes wrong, re-adds the bo to the LRU lists and returns an error.
If you really want to, we could hang on to the !no_wait_gpu path, wait shouldn't ever fail there, so I suppose
leaving it off the lru lists and not re-add on any list in case of wait fail is fine. It's still on the ddestroy list in that
case, so not adding it back to the other lists is harmless.

Well I'm a bit afraid that theoretically, other callers may have a bo reserved, while cleanup_refs_and_unlock
more or less runs the whole destroy path on that buffer. Sure, we have control over those other reservers,
but it may come back and bite us.
That's why initially I moved all the destruction to ttm_bo_release_list, to have all destruction in
only 1 place. But even now it's serialized with the lru lock, while the destruction may not happen
right away, it still happens before last list ref to the bo is dropped.

But it's your call, just choose the approach you want and I'll resubmit this. :-)

Also the wait might fail if a signal is hit, so it's definitely possible, and even likely in the case of the X server process.

Anyway, I prefer if we could try to keep the reservation across the ttm_cleanup_memtype_use  function, and as far
as I can tell, the only thing preventing that is the reservation release in the (!no_wait_gpu) path. So if we alter that to
do the same as the evict path I think without looking to deeply into the consequences that we should be safe.
I think returning success early without removing off ddestroy list if re-reserving fails
with lru lock held would be better.

We completed the wait and attempt to reserve the bo, which failed. Without the lru
lock atomicity, this can't happen since the only places that would do it call this with
the lru lock held.

With the atomicity removal, the only place that could do this is ttm_bo_delayed_delete
with remove_all set to true. And even if that happened the destruction code would run
*anyway* since we completed the waiting part already, it would just not necessarily be
run from this thread, but that guarantee didn't exist anyway.
Then we should be able to skip patch 2 as well.
If my tryreserve approach sounds sane, second patch should still be skippable. :-)

Sure, Lets go for that approach.


~Maarten
/Thomas

_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/dri-devel


[Index of Archives]     [Linux DRI Users]     [Linux Intel Graphics]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux