Re: [PATCH 1/2] drm/ttm: add a way to bo_wait for either the last read or last write

On 10/07/2011 03:24 PM, Alex Deucher wrote:
On Fri, Oct 7, 2011 at 4:00 AM, Thomas Hellstrom <thomas@xxxxxxxxxxxx> wrote:
On 10/07/2011 12:42 AM, Marek Olšák wrote:
On Wed, Oct 5, 2011 at 7:54 AM, Thomas Hellstrom <thomas@xxxxxxxxxxxx> wrote:

In any case, I'm not saying fences are the best way to flush, but since the
bo code assumes that signaling a sync object means "make the buffer contents
available for CPU read / write", it's usually a good way to do it; there's
even a sync_obj_flush() method that gets called when a potential flush needs
to happen.
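
For reference, the sync object hooks a driver plugs into TTM look roughly
like this (paraphrased from the ttm_bo_driver interface of this era; the
authoritative prototypes are in include/drm/ttm/ttm_bo_driver.h):

/* Driver-provided sync object hooks in struct ttm_bo_driver (approximate): */
bool (*sync_obj_signaled)(void *sync_obj, void *sync_obj_arg);
int  (*sync_obj_wait)(void *sync_obj, void *sync_obj_arg,
                      bool lazy, bool interruptible);
int  (*sync_obj_flush)(void *sync_obj, void *sync_obj_arg); /* the flush hook mentioned above */
void (*sync_obj_unref)(void **sync_obj);
void *(*sync_obj_ref)(void *sync_obj);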

I don't think we use it like that. To my knowledge, the purpose of the
sync obj (to Radeon Gallium drivers at least) is to be able to wait
for the last use of a buffer. Whether the contents can or cannot be
available to the CPU is totally irrelevant.

Currently (and it's a very important performance optimization),
buffers stay mapped and available for CPU read/write from their
first map_buffer call onward. Unmap_buffer is a no-op. The unmapping happens
on buffer destruction. We only call bo_wait when we want to wait until
the GPU is done with the buffer (we don't always want that;
sometimes we want to use the unsynchronized flag). Otherwise the
contents of buffers are available at *any time*.

We could probably implement bo_wait privately in the kernel driver and
not use ttm_bo_wait. I preferred code sharing though.

Textures (especially the tiled ones) are never mapped directly and a
temporary staging resource is used instead, so we don't actually
pollute the address space that much (in case you were going to raise that
point). We will use staging resources for buffers too, but it's really
the last resort to avoid waiting when direct access can't be used.
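
A rough sketch of that mapping strategy, with made-up names rather than the
actual radeon/Gallium entry points:

/* Illustration only: keep the first CPU mapping for the buffer's lifetime
 * and wait on the GPU only when unsynchronized access was not requested.
 * demo_os_mmap(), demo_fence_wait() and DEMO_MAP_UNSYNCHRONIZED are
 * placeholders, not real driver entry points. */
struct demo_bo {
    void *cpu_ptr;                 /* persistent mapping from the first map */
    struct demo_fence *last_use;   /* fence covering the last GPU use */
};

static void *demo_bo_map(struct demo_bo *bo, unsigned usage)
{
    if (!bo->cpu_ptr)
        bo->cpu_ptr = demo_os_mmap(bo);      /* first map: create the mapping */

    if (!(usage & DEMO_MAP_UNSYNCHRONIZED))
        demo_fence_wait(bo->last_use);       /* the bo_wait step */

    return bo->cpu_ptr;  /* unmap is a no-op; the mapping dies with the bo */
}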



2) Can't we say that a write_sync_obj is simply a sync_obj? What's the
difference between those two? I think we should remove the write_sync_obj
bo member.


Okay, but I think we should remove sync_obj instead, and keep write
and read sync objs. In the case of READWRITE usage, read_sync_obj
would be equal to write_sync_obj.



Sure, I'm fine with that.

One other thing, though, that makes me a little puzzled:

Let's assume you don't allow readers and writers at the same time, which is
my perception of how read- and write fences should work; you either have a
list of read fences or a single write fence (in the same way a read-write
lock works).

Now, if you only allow a single read fence, like in this patch, that implies
that you can only have either a single read fence or a single write fence at
any one time. We'd only need a single fence pointer on the bo, and
sync_obj_arg would tell us whether to signal the fence for read or for write
(assuming that sync_obj_arg was set up to indicate read / write at
validation time). In that case the patch really isn't necessary at all, as it
only allows a single read fence?

Or is it that you want to allow read- and write fences co-existing? In that
case, what's the use case?

There are lots of read-write use cases which don't need any barriers
or flushing. The obvious ones are color blending and depth-stencil
buffering. The OpenGL application is also allowed to use a subrange of
a buffer as a vertex buffer (read-only) and another disjoint subrange
of the same buffer for transform feedback (write-only), which kinda
makes me think about whether we should track subranges instead of
treating a whole buffer as "busy". It gets even more funky with
ARB_shader_image_load_store, which supports atomic read-modify-write
operations on textures, not to mention atomic memory operations in
compute shaders (wait, isn't that also exposed in GL as
GL_ARB_shader_atomic_counters?).

I was thinking whether the two sync objs should be "read" and
"readwrite", or "read" and "write". I chose the latter, because it's
more fine-grained and we have to keep at least two of them around
anyway.
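
A minimal sketch of that bookkeeping on the driver side (placeholder names,
reference counting omitted); in the READWRITE case both slots simply end up
pointing at the same fence:

struct demo_bo_fences {
    struct demo_fence *read;    /* last fence that reads from the bo */
    struct demo_fence *write;   /* last fence that writes to the bo */
};

static void demo_bo_set_fence(struct demo_bo_fences *f,
                              struct demo_fence *fence, unsigned usage)
{
    if (usage & DEMO_USAGE_READ)    /* placeholder usage flags */
        f->read = fence;
    if (usage & DEMO_USAGE_WRITE)
        f->write = fence;
}

/* A CPU read then only needs to wait on f->write; a CPU write (or a GPU
 * write from another ring) waits on both. */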

So now that you know what we use sync objs for, what are your ideas on
re-implementing that patch in a way that is okay with you? Besides
removing the third sync obj, of course.

Marek

OK. First I think we need to make a distinction: bo sync objects vs driver
fences. The bo sync obj api is there to strictly provide functionality that
the ttm bo subsystem is using, and that follows a simple set of rules:

1) The bo subsystem never assumes sync objects are ordered. That means
the bo subsystem needs to wait on a sync object before removing it from a
buffer. Any other assumption is buggy and must be fixed. BUT, if that
assumption is made in the driver without the ttm bo subsystem knowing about it
(which is usually the case), it's OK.

2) When the sync object(s) attached to the bo are signaled the ttm bo
subsystem is free to copy the bo contents and to unbind the bo.

3) The ttm bo system allows sync objects to be signaled in different ways
opaque to the subsystem using sync_obj_arg. The driver is responsible for
setting up that argument.

4) Driver fences may be used for, or expose, other functionality or
adaptations to APIs as long as the sync obj API exported to the bo subsystem
follows the above rules.
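
Conceptually, the bo subsystem's side of that contract boils down to
something like the following simplified sketch (not the literal ttm_bo_wait
code; field names are approximate and locking is omitted):

/* Rules 1) and 2) in code form: wait on the attached sync object before
 * dropping it, and only then consider the bo contents safe to copy/unbind. */
static int demo_bo_wait(struct ttm_buffer_object *bo, bool interruptible)
{
    struct ttm_bo_driver *driver = bo->bdev->driver;
    int ret;

    if (!bo->sync_obj)
        return 0;

    ret = driver->sync_obj_wait(bo->sync_obj, bo->sync_obj_arg,
                                false, interruptible);
    if (ret)
        return ret;                          /* still busy or interrupted */

    driver->sync_obj_unref(&bo->sync_obj);   /* remove only after the wait */
    return 0;
}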

This means the following w.r.t. the patch:

A) It violates 1). This is a bug that must be fixed. Assumptions that if one
sync object is signaled, another sync object is also signaled must be made
in the driver and not in the bo subsystem. Hence we need to explicitly wait
for a fence to remove it from the bo.

B) the sync_obj_arg carries *per-sync-obj* information on how it should be
signaled. If we need to attach multiple sync objects to a buffer object, we
also need multiple sync_obj_args. This is a bug and needs to be fixed.

C) There is really only one reason that the ttm bo subsystem should care
about multiple sync objects, and that is because the driver can't order them
efficiently. One such example would be hardware with multiple pipes reading
simultaneously from the same texture buffer. Currently we don't support this,
so only the *last* sync object needs to be known by the bo subsystem. Keeping
track of multiple fences generates a lot of completely unnecessary code in
the ttm_bo_util file, the ttm_bo_vm file, and will be a nightmare if / when
we truly support pipelined moves.

Just an FYI, Cayman GPUs have multiple rings, and we will be working to
support them in the coming months.

Alex


Right. Nvidia has had multiple rings for a long time, but I think they use barriers ("Semaphores") to order sync objects when needed, so that only the last sync object is attached to the buffer.

If you need to / want to support simultaneous ring access to a buffer without such barriers, we should change the single buffer object fence pointer to a list of pointers to fence objects.
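
Roughly along these lines, as a sketch of the data structure only (not a
worked-out patch):

/* Sketch: replace the bo's single sync_obj pointer with a list, so that
 * simultaneous access from several rings can be tracked without barriers. */
struct demo_bo_fence_entry {
    struct list_head head;
    void *sync_obj;       /* one entry per ring currently using the bo */
    void *sync_obj_arg;   /* per-sync-obj argument, as noted in B) above */
};

/* On the bo: a struct list_head fence_list instead of void *sync_obj.
 * Waiting means walking the list and removing each entry only after its
 * own wait has completed, which also keeps A) above satisfied. */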

/Thomas




_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/dri-devel


