Re: bulk_move in ttm_resource manager

On 04.10.23 09:17, Thomas Hellström wrote:
On Wed, 2023-10-04 at 03:52 +0000, Zeng, Oak wrote:
Hi Christian,
As a follow-up to this thread:
https://www.spinics.net/lists/dri-devel/msg410740.html, I started the
work of moving the LRU out of ttm_resource_manager and making it a
common library for both TTM and SVM. While looking into the details of
the bulk_move in the TTM resource manager, I found a potential problem:
For simplicity, let's say we only have one memory type and one
priority, so the TTM resource manager maintains only one global LRU
list. Say this list has 10 nodes, node1 to node10.
But the lru_bulk_move is per VM. Say vm1 has a bulk_move covering the
node range [node4, node7] and vm2 has a bulk_move covering the node
range [node6, node9]. Notice that those two ranges overlap.
Since two VMs can add nodes to the LRU simultaneously, I think this
scenario can happen.

That can't happen. See what ttm_resource_move_to_lru_tail() does when the BO has a bulk move associated with it.

Now if we perform a bulk move for vm1, moving [node4, node7] to the
tail of the LRU list, the list after the move will be: node1, node2,
node3, node8, node9, node10, node4, node5, node6, node7. Notice that
for vm2's bulk_move, the first pointer (pointing to node6) now sits
after the last pointer (pointing to node9), which doesn't make sense.
Is this a real problem? As I understand it, this issue only messes up
the LRU list order; there won't be any functional problem. If it is a
real problem, should we make the bulk_move global instead of per-VM?
Thanks,
Oak
FWIW I have a patch set that converts the TTM bulk move code to using
sublists; a list item is either a resource or a sublist, and a bulk
move essentially just moves the sublist. Bumping a resource's LRU
position within a VM would touch only that sublist.

That sounds like my very first attempt at bulk moves, which we abandoned for various reasons.

That's easily >5 years ago, but the history of that should still be on the mailing list if I'm not completely mistaken.

Regards,
Christian.


Currently the functionality and TTM API are essentially the same, but
when experimenting with LRU traversal for exhaustive WW-locking
eviction this concept was easier to use. Also, hopefully this would
reduce fragility and improve understanding, since a scenario like the
above could really never happen...

Let me know if I should send it out as an RFC.

Code is here:
https://gitlab.freedesktop.org/drm/xe/kernel/-/merge_requests/351/commits

/Thomas