On 04/04/16 20:07, Chris Wilson wrote:
On Mon, Apr 04, 2016 at 05:51:11PM +0100, Tvrtko Ursulin wrote:
From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
On platforms with multiple forcewake domains it seems more efficient
to request all desired ones and then to wait for acks to avoid
needlessly serializing on each domain.
Not convinced since we have more machines with one domain than two. What
I did was to compact the domains array so that we only iterated over the
known set - but that feels overkill when we only have two domains today.
For the same reason (only one machine with two domains), I didn't think
separate functions, one to iterate over a single domain and another to
iterate over all, were worth it.
What you can do though is remove an excess posting read from
fw_domains_put.
Compared to the cost of a register access (the spinlock irq mostly) the
iterator doesn't strike me as being that worthwhile an optimisation
target.
Correct, I thought we agreed that the majority of the CPU time
attributed to fw_domains_get is from the busy spinning while waiting on
the ack from the GPU.
This patch is not optimising the iterator; it requests all domains to
be woken up first and only then waits for the acks. That changes the time
spent busy spinning from Td1 + ... + Tdn to max(Td1, ..., Tdn).
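
To illustrate the sum vs. max argument, here is a toy model, not the
driver code. It assumes each domain acks a fixed time after its request
is issued and ignores the cost of the register writes. The names
(spin_serialized, spin_batched, MAX_DOMAINS, lat) are made up for the
example.

```c
#include <stddef.h>

#define MAX_DOMAINS 8 /* hypothetical upper bound, for this sketch only */

/*
 * lat[i] is the assumed time domain i needs to ack a wake request.
 * Request one domain, spin until it acks, then move to the next:
 * the waits add up to Td1 + ... + Tdn.
 */
static unsigned long spin_serialized(const unsigned long *lat, size_t n)
{
	unsigned long now = 0;
	size_t i;

	for (i = 0; i < n; i++)
		now += lat[i]; /* request at 'now', ack arrives lat[i] later */

	return now;
}

/*
 * Request all domains up front, then wait for the acks in a second loop.
 * All requests are in flight together, so the total wait is bounded by
 * the slowest domain: max(Td1, ..., Tdn). Assumes n <= MAX_DOMAINS.
 */
static unsigned long spin_batched(const unsigned long *lat, size_t n)
{
	unsigned long ack_at[MAX_DOMAINS];
	unsigned long now = 0;
	size_t i;

	/* First loop: issue every request at time 0. */
	for (i = 0; i < n; i++)
		ack_at[i] = lat[i];

	/* Second loop: spin only if this ack has not already arrived. */
	for (i = 0; i < n; i++)
		if (ack_at[i] > now)
			now = ack_at[i];

	return now;
}
```

With latencies of 30, 50 and 20 the serialized variant spins for 100 and
the batched one for 50. With a single domain both give the same result,
which matches the point that only multi-domain platforms benefit.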
Yes, it is only interesting for platforms with more than one fw domain.
But since we agreed the iterator is not significant, the fact that it adds
two loops* over the array should not be noticeable vs. the gain for
multi-fw domain machines (of which there will be more and more as time
goes by).
Regards,
Tvrtko
* Also, because 2/3 from this series has shrunk the iterator
considerably, fw_domains_get with two loops remains pretty much the
same size as it was with one loop before.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx