On Wed, 2023-10-04 at 03:52 +0000, Zeng, Oak wrote: > Hi Christian, > > As a follow up to this thread: > https://www.spinics.net/lists/dri-devel/msg410740.html, I started the > work of moving the lru out of ttm_resource_manager and make it a > common library for both ttm and svm. While look into the details of > the bulk_move in ttm resource manager, I found a potential problem: > > For simplicity, let’s say we only have one memory type and one > priority, so ttm resource manager only maintains one global lru list. > Let’s say this list has 10 nodes, node1 to node10. > > But the lru_bulk_move is per vm. Let’s say vm1 has a bulk_move > covering node range [node4, node7] and vm2 has a bulk_move covering > node range [node6, node9]. Notice those two range has an overlap. > Since two vm can simultaneously add nodes to lru, I think this > scenario can happen. > > Now if we perform a bulk move for vm1, moving [node4, node7] to the > tail of the lru list. The lru after this bulk move will be: node1, > node2, node3,node8,node9, node10, node4, node5, node6, node7. Now > notice that for vm2’s bulk_move, the first pointer (pointing to > node6) is actually after the last pointer (pointing to node9), which > doesn’t make sense. > > Is this a real problem? As I understand it, with this issue, we only > mess up the lru list order, but there won’t be any functional > problem. If it is a real problem, should we make the bulk_move global > instead of per vm based? > > Thanks, > Oak > FWIW I have a patch set that converts the TTM bulk move code to using sublists; a list item is either a resource or a sublist, and when performing a bulk move essentially the sublist is moved. Bumping resource LRU within a VM would touch only the sublist. Currently functionality and TTM API is essentially the same but when experimenting with LRU traversal for exhaustive WW-locking eviction this concept was easier to use. Also hopefully this would reduce fragility and improve understanding since a scenario like the above could really never happen... Let me know if I should send it out as an RFC. Code is here: https://gitlab.freedesktop.org/drm/xe/kernel/-/merge_requests/351/commits /Thomas