On Wed, 2023-10-04 at 14:44 +0200, Christian König wrote:
> On 04.10.23 09:17, Thomas Hellström wrote:
> > On Wed, 2023-10-04 at 03:52 +0000, Zeng, Oak wrote:
> > > Hi Christian,
> > >
> > > As a follow up to this thread:
> > > https://www.spinics.net/lists/dri-devel/msg410740.html, I started
> > > the work of moving the lru out of ttm_resource_manager and making
> > > it a common library for both ttm and svm. While looking into the
> > > details of the bulk_move in the ttm resource manager, I found a
> > > potential problem:
> > >
> > > For simplicity, let’s say we only have one memory type and one
> > > priority, so the ttm resource manager only maintains one global
> > > lru list. Let’s say this list has 10 nodes, node1 to node10.
> > >
> > > But the lru_bulk_move is per vm. Let’s say vm1 has a bulk_move
> > > covering node range [node4, node7] and vm2 has a bulk_move
> > > covering node range [node6, node9]. Notice that those two ranges
> > > overlap. Since two vms can simultaneously add nodes to the lru,
> > > I think this scenario can happen.
>
> That can't happen. See what ttm_resource_move_to_lru_tail() does when
> the BO has a bulk move associated with it.
>
> > >
> > > Now if we perform a bulk move for vm1, moving [node4, node7] to
> > > the tail of the lru list, the lru after this bulk move will be:
> > > node1, node2, node3, node8, node9, node10, node4, node5, node6,
> > > node7. Now notice that for vm2’s bulk_move, the first pointer
> > > (pointing to node6) is actually after the last pointer (pointing
> > > to node9), which doesn’t make sense.
> > >
> > > Is this a real problem? As I understand it, with this issue we
> > > only mess up the lru list order, but there won’t be any
> > > functional problem. If it is a real problem, should we make the
> > > bulk_move global instead of per-vm?
> > >
> > > Thanks,
> > > Oak
> > >
> > FWIW I have a patch set that converts the TTM bulk move code to
> > using sublists; a list item is either a resource or a sublist, and
> > when performing a bulk move essentially the sublist is moved.
> > Bumping a resource's LRU position within a VM would touch only the
> > sublist.
>
> That sounds like my very first attempt at bulk moves which we
> abandoned for various reasons.
>
> That's easily >5 years ago, but the history of that should still be
> on the mailing list if I'm not completely mistaken.

This here?
https://lists.freedesktop.org/archives/amd-gfx/2018-August/025016.html

No, in that case it's very different. Or is it an even earlier version?

/Thomas

>
> Regards,
> Christian.
>
> > Currently the functionality and TTM API are essentially the same,
> > but when experimenting with LRU traversal for exhaustive WW-locking
> > eviction this concept was easier to use. Also, hopefully this would
> > reduce fragility and improve understanding, since a scenario like
> > the above could really never happen...
> >
> > Let me know if I should send it out as an RFC.
> >
> > Code is here:
> > https://gitlab.freedesktop.org/drm/xe/kernel/-/merge_requests/351/commits
> >
> > /Thomas
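
As background to Christian's point, below is a minimal sketch, using made-up
names (struct lru_bulk_pos, lru_move_to_tail()), of the invariant that
ttm_resource_move_to_lru_tail() maintains when a BO has a bulk move associated
with it. This is not the actual TTM implementation (which also handles pinned
BOs, memory types and priorities); it only shows why the overlap scenario
described above cannot occur: a resource that belongs to a bulk move is
re-inserted at the tail of its own per-VM range, growing that range, instead
of at the global LRU tail, so two VMs' ranges stay contiguous and can never
interleave.

#include <linux/list.h>

/* Hypothetical, simplified stand-in for a per-VM bulk move range. */
struct lru_bulk_pos {
	struct list_head *first;	/* first node of this VM's range */
	struct list_head *last;		/* last node of this VM's range */
};

/*
 * Bump @node on the LRU @lru.  @pos is the VM's bulk move range, or
 * NULL if the resource has no bulk move.  @node is assumed to already
 * be linked somewhere on @lru.
 */
static void lru_move_to_tail(struct list_head *lru, struct list_head *node,
			     struct lru_bulk_pos *pos)
{
	if (!pos) {
		/* No bulk move: plain move to the global LRU tail. */
		list_move_tail(node, lru);
		return;
	}

	if (!pos->first) {
		/* Empty range: start it at the current global tail. */
		list_move_tail(node, lru);
		pos->first = node;
		pos->last = node;
		return;
	}

	if (pos->last == node)
		return;			/* already at the tail of the range */

	if (pos->first == node)
		pos->first = node->next; /* range now starts one node later */

	/* Re-insert right after the current end of this VM's range ... */
	list_move(node, pos->last);
	/* ... and grow the range to include it. */
	pos->last = node;
}

Because the insertion point is always the last node of the resource's own VM
range, a node covered by vm2's range can never end up outside
[vm2->first, vm2->last], and a bulk move of vm1 by construction moves a range
that contains only vm1's resources.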
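
The sublist approach Thomas mentions could be pictured roughly as below. The
types and helpers are assumptions for illustration only, not the code in the
linked merge request: the manager-wide LRU holds items that are either a
single resource or a whole per-VM sublist, so bumping a resource inside a VM
touches only that sublist, and a bulk move moves the single sublist item on
the shared list. No first/last pointer pairs into the shared list are needed,
which is why an inconsistent range like the one in the example above cannot
be constructed in the first place.

#include <linux/list.h>

/* Hypothetical item on the manager-wide LRU. */
enum lru_item_type {
	LRU_ITEM_RES,		/* a single resource */
	LRU_ITEM_SUBLIST,	/* a per-VM sublist of resources */
};

struct lru_item {
	struct list_head link;		/* link on the manager-wide LRU */
	enum lru_item_type type;
	struct list_head sublist;	/* resources, if type == LRU_ITEM_SUBLIST */
};

/* Bump one resource inside its VM: only the VM's sublist is touched. */
static void lru_sublist_bump(struct lru_item *vm_item,
			     struct list_head *res_link)
{
	list_move_tail(res_link, &vm_item->sublist);
}

/* Bulk move: the whole VM moves as a single item on the manager LRU. */
static void lru_bulk_move_tail(struct list_head *manager_lru,
			       struct lru_item *vm_item)
{
	list_move_tail(&vm_item->link, manager_lru);
}

LRU traversal for eviction would then walk the manager LRU and, on reaching an
LRU_ITEM_SUBLIST item, descend into its sublist, which may be part of what
made the exhaustive WW-locking eviction experiment easier to express.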