On Sun, Aug 28, 2016 at 09:33:54PM +0100, Chris Wilson wrote:
> On Sun, Aug 28, 2016 at 05:37:47PM +0100, Chris Wilson wrote:
> > Currently we install a callback for performing poll on a dma-buf,
> > irrespective of the timeout. This involves taking a spinlock, as well as
> > unnecessary work, and greatly reduces scaling of poll(.timeout=0) across
> > multiple threads.
> >
> > We can query whether the poll will block prior to installing the
> > callback to make the busy-query fast.
> >
> > Single thread: 60% faster
> > 8 threads on 4 (+4 HT) cores: 600% faster
>
> Hmm, this only really applies to the idle case.
> reservation_object_test_signaled_rcu() is still a major bottleneck when
> busy, due to the dance inside reservation_object_test_signaled_single()

The fix is not difficult, just requires extending the seqlock to catch
the RCU race (i.e. earlier patches). I'll resend that series in the
morning.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
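
For readers following the thread, the idea of querying before installing the callback can be modelled in userspace roughly as below. This is only a sketch and not the patch itself: struct toy_fence, toy_install_callback(), toy_poll() and toy_demo() are hypothetical names invented for illustration, and the plain counter stands in for the spinlock and callback registration done by the real dma-buf poll path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a fence; not the kernel's fence type. */
struct toy_fence {
	bool signaled;
	int callbacks_installed;	/* counts slow-path registrations */
};

/* Slow path: models taking the lock and adding a wakeup callback. */
static void toy_install_callback(struct toy_fence *f)
{
	f->callbacks_installed++;
}

/*
 * Returns true if the buffer is ready, i.e. poll would not block.
 * will_block == false models poll(.timeout=0): the caller only wants
 * an answer and will never sleep, so no callback is needed.
 */
static bool toy_poll(struct toy_fence *f, bool will_block)
{
	/* Cheap query first; an idle buffer never reaches the slow path. */
	if (f->signaled)
		return true;

	/* Busy: only pay for the callback if the caller will sleep. */
	if (will_block)
		toy_install_callback(f);

	return false;
}

/* Exercises the three cases; returns total callbacks installed. */
static int toy_demo(void)
{
	struct toy_fence busy = { .signaled = false, .callbacks_installed = 0 };
	struct toy_fence idle = { .signaled = true, .callbacks_installed = 0 };

	assert(!toy_poll(&busy, false));	/* busy query, timeout=0 */
	assert(busy.callbacks_installed == 0);

	assert(!toy_poll(&busy, true));		/* busy, caller will sleep */
	assert(busy.callbacks_installed == 1);

	assert(toy_poll(&idle, true));		/* idle, returns at once */
	assert(idle.callbacks_installed == 0);

	return busy.callbacks_installed + idle.callbacks_installed;
}
```

In this model only the blocking poll on a busy buffer ever touches the slow path. It says nothing about the cost of the signaled test itself, which is the bottleneck the reply above refers to.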