On Wed, Mar 23, 2022 at 02:44:17PM +0100, Christian König wrote: > Am 23.03.22 um 14:36 schrieb Daniel Vetter: > > On Wed, 23 Mar 2022 at 13:20, Christian König > > <ckoenig.leichtzumerken@xxxxxxxxx> wrote: > > > Am 23.03.22 um 12:59 schrieb Daniel Vetter: > > > > On Mon, Mar 21, 2022 at 02:25:56PM +0100, Christian König wrote: > > > > > This way we finally fix the problem that new resource are > > > > > not immediately evict-able after allocation. > > > > > > > > > > That has caused numerous problems including OOM on GDS handling > > > > > and not being able to use TTM as general resource manager. > > > > > > > > > > v2: stop assuming in ttm_resource_fini that res->bo is still valid. > > > > > v3: cleanup kerneldoc, add more lockdep annotation > > > > > v4: consistently use res->num_pages > > > > > > > > > > Signed-off-by: Christian König <christian.koenig@xxxxxxx> > > > > > Tested-by: Bas Nieuwenhuizen <bas@xxxxxxxxxxxxxxxxxxx> > > > > > +/** > > > > > + * struct ttm_lru_bulk_move > > > > > + * > > > > > + * @tt: first/last lru entry for resources in the TT domain > > > > > + * @vram: first/last lru entry for resources in the VRAM domain > > > > > + * > > > > > + * Helper structure for bulk moves on the LRU list. > > > > > + */ > > > > > +struct ttm_lru_bulk_move { > > > > > + struct ttm_lru_bulk_move_pos tt[TTM_MAX_BO_PRIORITY]; > > > > > + struct ttm_lru_bulk_move_pos vram[TTM_MAX_BO_PRIORITY]; > > > > Not really needed, just a thought: Should we track the associated dma_resv > > > > object here to make sure the locking is all done correctly (and also check > > > > that the bulk move bo have the same dma_resv)? It wouldn't really be any > > > > overhead for the !CONFIG_LOCKDEP case and we could sprinkle a lot more > > > > dma_resv_held all over the place. > > > You made a similar comment on the last revision and I already tried to > > > play around with that idea a bit. > > > > > > But I've completely abandoned that idea after realizing that the BOs in > > > the bulk move actually don't need to have the same dma_resv object, nor > > > do they all need to be locked. > > Uh how does that work? If you evict that bo while someone else is > > doing a bulk move, then at least the result might be rather funny, and > > I have no idea how it could work. > > The LRU is still protected by the common spinlock. > > So that will synchronize any modification to both the bulk move structure as > well as the individual list_heads making up the LRU. > > > > > Like even if you then make the rule that you have to lock all bos for > > the bulk move, the bo could still be moved independently, and that > > would again break the bulk lru properties. > > > > And if you do none of the above, there's no reason for that bo to have > > a distinct dma_resv. > > > > Like maybe the data structure wont fall apart, but semantically it > > just doesn't make any sense to me to allow this. What would you want > > to use this for? > > Yeah, that's a good point. > > It's not technically necessary as far as I can see, but I'm not sure if > there is a valid use case either. Yeah I think nothing obviously bad will happen to the list, but it will very much no longer be an LRU. Which kinda defeats the point - there's easier ways to not have an LRU like never updating anything :-) Aside, this is actually what i915-gem folks did without realizing, so that's also a bit why I think being strict here would be good. It's tricky and if you're too clever it's way too easy to end up with something which isn't anything useful anymore. > > > > It just happens that amdgpu is currently using it that way, but I can't > > > see any technical necessarily to restrict the bulk move like that. > > Yeah we can do that later on in a follow up patch, or I figure out why > > it's not a good idea :-) Just figured this might be good to lock down > > before other drivers start adopting this. > > I'm just wondering if it's really more defensive to restrict the handling > like that. > > On the other hand we can still lift the restriction when anybody really > comes along with a valid use case. Yeah if we do have a use-case we can lift this easily, it's just a few dma_resv_assert_held. I don't think we need this included for merging, but it'd be great as a follow-up patch while only amdgpu is using this. -Daniel > > Christian. > > > -Daniel > > > > > Regards, > > > Christian. > > > > > > > > > > -Daniel > > > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch