On Thu, Jun 3, 2021 at 5:48 AM Matthew Brost <matthew.brost@xxxxxxxxx> wrote: > > On Wed, Jun 02, 2021 at 08:57:02PM +0200, Daniel Vetter wrote: > > On Wed, Jun 2, 2021 at 5:27 PM Tvrtko Ursulin > > <tvrtko.ursulin@xxxxxxxxxxxxxxx> wrote: > > > On 25/05/2021 17:45, Matthew Brost wrote: > > > > On Tue, May 25, 2021 at 11:32:26AM +0100, Tvrtko Ursulin wrote: > > > >> * Context pinning code with it's magical two adds, subtract and cmpxchg is > > > >> dodgy as well. > > > > > > > > Daniele tried to remove this and it proved quite difficult + created > > > > even more races in the backend code. This was prior to the pre-pin and > > > > post-unpin code which makes this even more difficult to fix as I believe > > > > these functions would need to be removed first. Not saying we can't > > > > revisit this someday but I personally really like it - it is a clever > > > > way to avoid reentering the pin / unpin code while asynchronous things > > > > are happening rather than some complex locking scheme. Lastly, this code > > > > has proved incredibly stable as I don't think we've had to fix a single > > > > thing in this area since we've been using this code internally. > > > > > > Pretty much same as above. The code like: > > > > > > static inline void __intel_context_unpin(struct intel_context *ce) > > > { > > > if (!ce->ops->sched_disable) { > > > __intel_context_do_unpin(ce, 1); > > > } else { > > > while (!atomic_add_unless(&ce->pin_count, -1, 1)) { > > > if (atomic_cmpxchg(&ce->pin_count, 1, 2) == 1) { > > > ce->ops->sched_disable(ce); > > > break; > > > } > > > } > > > } > > > } > > > > > > That's pretty much impenetrable for me and the only thing I can think of > > > here is **ALARM** must be broken! See what others think.. > > Yea, probably should add a comment: > > /* > * If the context has the sched_disable function, it isn't safe to unpin > * until this function completes. This function is allowed to complete > * asynchronously too. To avoid this function from being entered twice > * and move ownership of the unpin to this function's completion, adjust > * the pin count to 2 before it is entered. When this function completes > * the context can call intel_context_sched_unpin which decrements the > * pin count by 2 potentially resulting in an unpin. > * > * A while loop is needed to ensure the atomicity of the pin count. e.g. > * The below if / else statement has a race: > * > * if (atomic_cmpxchg(&ce->pin_count, 1, 2) == 1) > * ce->ops->sched_disable(ce); > * else > * atomic_dec(ce, 1); > * > * Two threads could simultaneously fail the if clause resulting in the > * pin_count going to 0 with scheduling enabled + the context pinned. > */ > > > > > pin_count is a hand-rolled mutex, except not actually a real one, and > > it's absolutely hiliarous in it's various incarnations (there's one > > each on i915_vm, vma, obj and probably a few more). > > > > Not the worst one I've seen by far in the code we've merged already. > > Minimally this needs a comment here and in the struct next to > > @pin_count to explain where all this is abused, which would already > > make it better than most of the in-tree ones. > > > > As part of the ttm conversion we have a plan to sunset the "pin_count > > as a lock" stuff, depending how bad that goes we might need to split > > up the task for each struct that has such a pin_count. > > > > Didn't know that with the TTM rework this value might go away. If that > is truely the direction I don't see the point in reworking this now. It > 100% works and with a comment I think it can be understood what it is > doing. Well not go away, but things will change. Currently the various ->pin_count sprinkled all over the place have essentially two uses - pinning stuff long term (scanout, ctxs, anything that stays pinned after the ioctl is done essentially) - short-term lock-like construct There's going to be two changes: - The short-term pins will be replaced by dma_resv_lock/unlock pairs - the locking rules for long-term pins will change, because we'll require that you must hold dma_resv_lock for unpinning. So no more atomic_t, also no more races for final unpins vs cleanup work Also now that you've explained the why for this dance, especially the async part: Since the new unpin will hold dma_resv_lock, we can create&attach dma_fence for tracking async completion which then the next operation can wait on. The awkward state we have right now is that there's a lot of places where we require the unpin to be done locklessly with these atomic tricks, so there's going to be quite some surgery involved all over the code. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch