Quoting Chris Wilson (2017-10-28 12:27:23)
> I should have admitted defeat long ago as there has been a rare but
> persistent error on Sandybridge where semaphore signaling did not
> propagate to the waiter, leading to a GPU hang.
>
> With the work on fence signaling for v4.9, the impact of using CPU driven
> signaling was greatly reduced wrt to the latency of GPU semaphores,
> though without logical rings support, the benefit of reordering work to
> avoid bubbles is not realised (i.e. as it stands fence signaling is just
> a slower, more costly version of HW semaphores; but works more
> consistently). As a rough indicator of the difference,
>
> with semaphores:
> Sequential (3 engines, 1 processes): average 5.470us per cycle [expected 4.988us]
>
> w/o semaphores:
> Sequential (3 engines, 1 processes): average 15.771us per cycle [expected 4.923us]

In comparison, v3.4:

with semaphores:
Sequential (3 engines, 1 processes): average 16.066us per cycle [expected 11.842us]

w/o semaphores:
Sequential (3 engines, 1 processes): average 23.460us per cycle [expected 11.839us]

Interesting in that this microbenchmark doesn't show as big an impact as
the one that drove adoption of semaphores (originally it gave ~3x better
performance for x11perf), and that now even without semaphores we are
faster than a few years ago. Further, since the engines were split in
Sandybridge, userspace has learnt not to jump between engines frequently.
-Chris
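
As a rough check on the figures quoted above, the cost of dropping semaphores can be worked out as a simple ratio of the per-cycle averages. This is only a sketch: the `slowdown()` and `report()` helpers, the dictionary layout, and the "current" label for the first pair of figures are made up for illustration and are not part of the benchmark; the timings themselves are copied from the mail.

```python
# Per-cycle averages in microseconds, copied from the figures in this mail.
# The "current" label is an assumption: it stands for the tree the first
# pair of measurements was taken on. The layout and helpers are illustrative.
TIMINGS_US = {
    "current": {"semaphores": 5.470, "no_semaphores": 15.771},
    "v3.4": {"semaphores": 16.066, "no_semaphores": 23.460},
}


def slowdown(with_sema: float, without_sema: float) -> float:
    """Return how many times slower a cycle is without semaphores."""
    return without_sema / with_sema


def report() -> list:
    """Build one summary line per kernel from TIMINGS_US."""
    lines = []
    for kernel, t in TIMINGS_US.items():
        ratio = slowdown(t["semaphores"], t["no_semaphores"])
        lines.append(f"{kernel}: {ratio:.2f}x slower without semaphores")
    return lines


if __name__ == "__main__":
    for line in report():
        print(line)
```

On these numbers the ratio is roughly 2.9x for the current tree and roughly 1.5x for v3.4, and the current figure without semaphores (15.771us) is still slightly below the v3.4 figure with semaphores (16.066us).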