On 11/28/2010 05:11 PM, Francisco Jerez wrote:
Francisco Jerez <currojerez@xxxxxxxxxx> writes:
Thomas Hellstrom <thomas@xxxxxxxxxxxx> writes:
Ben,
I'm looking at a way to make TTM memory management asynchronous with
the CPU. The idea is that you should basically be able to DMA data to
and from memory regions without waiting for idle, as long as the GPU
has a means to provide operation ordering.
Sounds good. I guess you're mainly dealing with BO eviction
synchronization? The only problem I see on our side is that calls to our
move() hook aren't guaranteed to be carried out in order (because of the
multiple hardware channels). I'm thinking that move() could be extended
with an optional sync_obj argument; that way move() would be able to
make sure that evictions are strictly ordered with respect to the fence
specified.
The way evictions will work is that they appear to take place
"instantly", but are scheduled on a channel, and there will be a data
structure that keeps track of which fences need to be signaled before
a managed area can be reused.
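To make the bookkeeping described above concrete, here is a minimal sketch for the single-channel case. Everything in it is an assumption for illustration: the names (example_region, example_region_evict, example_region_reusable) are hypothetical and not part of TTM, and a real implementation would hold sync objects rather than bare sequence numbers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-region state: not a TTM structure. */
struct example_region {
	bool busy;            /* an eviction is still in flight */
	uint32_t release_seq; /* fence sequence that must signal before reuse */
};

/* Record that an eviction ending at fence sequence 'seq' was scheduled. */
static void example_region_evict(struct example_region *r, uint32_t seq)
{
	r->busy = true;
	r->release_seq = seq;
}

/*
 * The region may be reused once the channel has completed release_seq.
 * The signed difference tolerates 32-bit wraparound, assuming fewer than
 * 2^31 sequence numbers are outstanding at any time.
 */
static bool example_region_reusable(struct example_region *r,
				    uint32_t completed_seq)
{
	if (r->busy && (int32_t)(completed_seq - r->release_seq) >= 0)
		r->busy = false;
	return !r->busy;
}
```

The eviction returns immediately after example_region_evict(); only an allocation that wants to reuse the area has to check, or wait on, the recorded fence.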
The driver will need to provide a function that, given a list of fences,
returns a fence that, when it signals, guarantees that all other fences
in the list have signaled.
Single-channel hardware will just return the fence with the highest
sequence number. Multi-channel hardware may need to insert command stream
barriers, if available, and create a new sync object to return, or resort
to simply waiting to determine which fence signals last.
I guess Nouveau can do command stream barriers (waiting for other
channels to reach a certain command before progressing)?
Needless to say, drivers need not activate async operation if they don't
want to, but for single-channel hardware it will hopefully be very simple.
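For the single-channel case, the "fence that signals last" function could look roughly like the sketch below. The type and function names (example_fence, example_last_fence) are hypothetical, and the code assumes fences on one channel signal in sequence order; it does not cover the multi-channel case, which needs barriers or an actual wait.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fence: a sequence number on a single channel. */
struct example_fence {
	uint32_t seq;
};

/*
 * True if sequence a was emitted after b. The signed difference
 * tolerates 32-bit wraparound, assuming fewer than 2^31 sequence
 * numbers are outstanding at any time.
 */
static int example_seq_after(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) > 0;
}

/*
 * Return the fence whose signaling implies that every fence in the
 * list has signaled. On one in-order channel that is simply the fence
 * with the latest sequence number. Returns NULL for an empty list.
 */
static struct example_fence *
example_last_fence(struct example_fence *const *fences, size_t count)
{
	struct example_fence *last = NULL;
	size_t i;

	for (i = 0; i < count; i++) {
		if (!last || example_seq_after(fences[i]->seq, last->seq))
			last = fences[i];
	}
	return last;
}
```

No new sync object is created here; the function just picks an existing one, which is why the single-channel case should be cheap.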
While doing that I looked a bit at the Nouveau fencing. It appears
that waiting for fences is polling only (no irq to signal fences)? Is
that correct?
That's right, nvidia hardware has no nice way to schedule a fence-like
interrupt we could selectively turn on and off around the sync_obj_wait
hook. There's a bunch of (more or less) chipset-specific hacks that
could be used to get an equivalent effect, but polling has seemed good
enough so far (in the typical case we only take the "lazy" path so CPU
usage is still OK).
Indeed, I saw the same with Unichromes: lazy for throttling and non-lazy
for other waits, although I ended up with an hrtimer polling loop in the
non-lazy case, since software fallbacks tended to eat a lot of CPU while
waiting for buffer idle.
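As a rough userspace illustration of the lazy versus non-lazy distinction (not the kernel code from either driver), a polling wait might be sketched as follows. The names example_fence_wait and example_countdown_signaled are hypothetical, the 1 ms nap is an arbitrary choice, and a kernel implementation would use its own sleep or hrtimer facilities rather than nanosleep().

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical callback: returns true once the fence has signaled. */
typedef bool (*example_signaled_fn)(void *fence);

static int64_t example_now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Poll until the fence signals or the timeout expires.
 * lazy:     sleep about 1 ms between polls, keeping CPU usage low.
 * non-lazy: spin, trading CPU time for lower wakeup latency.
 * Returns 0 on success, -1 on timeout.
 */
static int example_fence_wait(void *fence, example_signaled_fn signaled,
			      bool lazy, int64_t timeout_ns)
{
	const struct timespec nap = { 0, 1000000 }; /* 1 ms */
	int64_t deadline = example_now_ns() + timeout_ns;

	while (!signaled(fence)) {
		if (example_now_ns() >= deadline)
			return -1;
		if (lazy)
			nanosleep(&nap, NULL);
	}
	return 0;
}

/* Stand-in fence for demonstration: signals after a number of polls. */
static bool example_countdown_signaled(void *fence)
{
	int *remaining = fence;

	if (*remaining > 0) {
		(*remaining)--;
		return false;
	}
	return true;
}
```

The non-lazy branch is the one that burns CPU while a buffer is busy; a timer-driven poll is one way to bound that cost without an interrupt.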
/Thomas