Op 11-10-12 21:26, Thomas Hellstrom schreef: > On 10/11/2012 08:42 PM, Maarten Lankhorst wrote: > >> >>> Anyway, if you plan to remove the fence lock and protect it with reserve, you must >>> make sure that a waiting reserve is never done in a destruction path. I think this >>> mostly concerns the nvidia driver. >> Well I don't think any lock should ever be held during destruction time, > > What I mean is, that *something* needs to protect the fence pointer. Currently it's the > fence lock, and I was assuming you'd protect it with reserve. And neither TTM nor > Nvidia should, when a resource is about to be freed, be forced to *block* waiting for > reserve just to access the fence pointer. When and if you have a solution that fulfills > those requirements, I'm ready to review it. It's not blocking, cleanup_refs_or_queue will toss it on the deferred list if reservation fails, behavior doesn't change just because I changed the order around. >> >>>> - no_wait_reserve is ignored if no_wait_gpu is false >>>> ttm_bo_reserve_locked can only return true if no_wait_reserve is true, but >>>> subsequently it will do a wait_unreserved if no_wait_gpu is false. >>>> I'm planning on removing this argument and act like it is always true, since >>>> nothing on the lru list should fail to reserve currently. >>> Yes, since all buffers that are reserved are removed from the LRU list, there >>> should never be a waiting reserve on them, so no_wait_reserve can be removed >>> from ttm_mem_evict_first, ttm_bo_evict and possibly other functions in the call chain. >> I suppose there will stay a small race though, > > Hmm, where? When you enter the ddestroy path, you drop the lock and hope the buffer doesn't reserved away from under you. >>>> - effectively unlimited callchain between some functions that all go through >>>> ttm_mem_evict_first: >>>> >>>> /------------------------\ >>>> ttm_mem_evict_first - ttm_bo_evict - -ttm_bo_mem_space - ttm_bo_mem_force_space - ttm_mem_evict_first >>>> \ ttm_bo_handle_move_mem / >>>> I'm not surprised that there was a deadlock before, it seems to me it would >>>> be pretty suicidal to ever do a blocking reserve on any of those lists, >>>> lockdep would be all over you for this. >>> Well, at first this may look worse than it actually is. The driver's eviction memory order determines the recursion depth >>> and typically it's 0 or 1, since subsequent ttm_mem_evict_first should never touch the same LRU lists as the first one. >>> What would typically happen is that a BO is evicted from VRAM to TT, and if there is no space in TT, another BO is evicted >>> to system memory, and the chain is terminated. However a driver could set up any eviction order but that would be >>> a BUG. >>> >>> But in essence, as you say, even with a small recursion depth, a waiting reserve could cause a deadlock. >>> But there should be no waiting reserves in the eviction path currently. >> Partially true, ttm_bo_cleanup_refs is currently capable of blocking reserve. >> Fixing this might mean that ttm_mem_evict_first may need to become more aggressive, >> since it seems all the callers of this function assume that ttm_mem_evict_first can only fail >> if there is really nothing more to free and blocking nested would really upset lockdep >> and leave you open to the same deadlocks. > I can't see how the waiting reserve in ttm_bo_cleanup_refs would cause a deadlock, > because the buffer about to be reserved is always *last* in a reservation sequence, and the > reservation is always released (or the buffer destroyed) before trying to reserve another buffer. > Technically the buffer is not looked up from a LRU list but from the delayed delete list. > Could you describe such a deadlock case? The only interesting case for this is ttm_mem_evict_first, and while it may not technically be a deadlock, lockdep will flag you for blocking on this anyway, since the only reason it would not be a deadlock is if you know the exact semantics of why. >> An unexpected bonus from adding this skipping would be that the atomic lru_list >> removal on unreserve guarantee becomes a lot easier to drop while keeping behavior >> exactly the same. Only the swapout code would need to be adjusted for to try the whole >> list. Maybe ttm_bo_delayed_delete with remove_all = false too would change behavior >> slightly too, but this seems to be less likely, as it could only ever happen on delayed >> destruction of imported dma-buf's. > > > May I kindly remind you that the atomic lru_list removal stays in TTM > until we've had a high level design discussion weighing it against an extended > reservation object API. I'm aware, but I still wanted to see if it was possible or not in a clean way for testing, so I don't ask for things that would be impossible to do or too involved to do in a safe manner. Will you make it to UDS-r? ~Maarten _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/dri-devel