On Fri, Oct 7, 2011 at 4:00 AM, Thomas Hellstrom <thomas@xxxxxxxxxxxx> wrote:
> On 10/07/2011 12:42 AM, Marek Olšák wrote:
>>
>> On Wed, Oct 5, 2011 at 7:54 AM, Thomas Hellstrom <thomas@xxxxxxxxxxxx> wrote:
>>>
>>> In any case, I'm not saying fences are the best way to flush, but since
>>> the bo code assumes that signaling a sync object means "make the buffer
>>> contents available for CPU read / write", it's usually a good way to do
>>> it; there's even a sync_obj_flush() method that gets called when a
>>> potential flush needs to happen.
>>
>> I don't think we use it like that. To my knowledge, the purpose of the
>> sync obj (to Radeon Gallium drivers at least) is to be able to wait for
>> the last use of a buffer. Whether the contents can or cannot be
>> available to the CPU is totally irrelevant.
>>
>> Currently (and it's a very important performance optimization), buffers
>> stay mapped and available for CPU read/write after their first
>> map_buffer call. Unmap_buffer is a no-op, and the unmapping happens on
>> buffer destruction. We only call bo_wait when we want to wait for the
>> GPU until it's done with the buffer (we don't always want that;
>> sometimes we want to use the unsynchronized flag). Otherwise the
>> contents of buffers are available at *any time*.
>>
>> We could probably implement bo_wait privately in the kernel driver and
>> not use ttm_bo_wait. I preferred code sharing, though.
>>
>> Textures (especially the tiled ones) are never mapped directly and a
>> temporary staging resource is used instead, so we don't actually
>> pollute the address space that much (in case you would have such a
>> remark). We will use staging resources for buffers too, but it's really
>> the last resort, to avoid waiting when direct access can't be used.
>>
>>>>> 2) Can't we say that a write_sync_obj is simply a sync_obj? What's
>>>>> the difference between those two? I think we should remove the
>>>>> write_sync_obj bo member.
>>>>
>>>> Okay, but I think we should remove sync_obj instead, and keep write
>>>> and read sync objs. In the case of READWRITE usage, read_sync_obj
>>>> would be equal to write_sync_obj.
>>>
>>> Sure, I'm fine with that.
>>>
>>> One other thing, though, that makes me a little puzzled:
>>>
>>> Let's assume you don't allow readers and writers at the same time,
>>> which is my perception of how read and write fences should work; you
>>> either have a list of read fences or a single write fence (in the same
>>> way a read-write lock works).
>>>
>>> Now, if you only allow a single read fence, like in this patch, that
>>> implies that you can only have either a single read fence or a single
>>> write fence at any one time. We'd only need a single fence pointer on
>>> the bo, and sync_obj_arg would tell us whether to signal the fence for
>>> read or for write (assuming that sync_obj_arg was set up to indicate
>>> read / write at validation time). Then the patch really isn't
>>> necessary at all, as it only allows a single read fence?
>>>
>>> Or is it that you want to allow read and write fences to co-exist? In
>>> that case, what's the use case?
>>
>> There are lots of read-write use cases which don't need any barriers
>> or flushing. The obvious ones are color blending and depth-stencil
>> buffering.
>> The OpenGL application is also allowed to use a subrange of a buffer
>> as a vertex buffer (read-only) and another, disjoint subrange of the
>> same buffer for transform feedback (write-only), which kinda makes me
>> wonder whether we should track subranges instead of treating a whole
>> buffer as "busy". It gets even more funky with
>> ARB_shader_image_load_store, which supports atomic read-modify-write
>> operations on textures, not to mention atomic memory operations in
>> compute shaders (wait, isn't that also exposed in GL as
>> GL_ARB_shader_atomic_counters?).
>>
>> I was thinking about whether the two sync objs should be "read" and
>> "readwrite", or "read" and "write". I chose the latter because it's
>> more fine-grained and we have to keep at least two of them around
>> anyway.
>>
>> So now that you know what we use sync objs for, what are your ideas on
>> re-implementing that patch in a way that is okay with you? Besides
>> removing the third sync obj, of course.
>>
>> Marek
>
> OK. First I think we need to make a distinction: bo sync objects vs.
> driver fences. The bo sync obj api is there strictly to provide
> functionality that the ttm bo subsystem is using, and it follows a
> simple set of rules:
>
> 1) The bo subsystem never assumes sync objects are ordered. That means
> the bo subsystem needs to wait on a sync object before removing it from
> a buffer. Any other assumption is buggy and must be fixed. BUT, if that
> assumption takes place in the driver, unknowingly to the ttm bo
> subsystem (which is usually the case), it's OK.
>
> 2) When the sync object(s) attached to the bo are signaled, the ttm bo
> subsystem is free to copy the bo contents and to unbind the bo.
>
> 3) The ttm bo subsystem allows sync objects to be signaled in different
> ways, opaque to the subsystem, using sync_obj_arg. The driver is
> responsible for setting up that argument.
>
> 4) Driver fences may be used for, or expose, other functionality or
> adaptations to APIs, as long as the sync obj api exported to the bo
> subsystem follows the above rules.
>
> This means the following w.r.t. the patch:
>
> A) It violates 1). This is a bug that must be fixed. Assumptions that if
> one sync object is signaled, another sync object is also signaled must
> be made in the driver and not in the bo subsystem. Hence we need to
> explicitly wait for a fence before removing it from the bo.
>
> B) The sync_obj_arg carries *per-sync-obj* information on how it should
> be signaled. If we need to attach multiple sync objects to a buffer
> object, we also need multiple sync_obj_args. This is a bug and needs to
> be fixed.
>
> C) There is really only one reason for the ttm bo subsystem to care
> about multiple sync objects, and that is because the driver can't order
> them efficiently. One such example would be hardware with multiple
> pipes reading simultaneously from the same texture buffer. Currently we
> don't support this, so only the *last* sync object needs to be known by
> the bo subsystem. Keeping track of multiple fences generates a lot of
> completely unnecessary code in the ttm_bo_util and ttm_bo_vm files, and
> will be a nightmare if / when we truly support pipelined moves.

Just an FYI, cayman GPUs have multiple rings and we will be working to
support them in the coming months.

Alex

> As I understand it from your patches, you want to keep multiple fences
> around only to track rendering history.
> If we want to do that generically, I suggest doing it in the execbuf
> util code in the following way:
>
> struct ttm_eu_rendering_history {
>         void *last_read_sync_obj;
>         void *last_read_sync_obj_arg;
>         void *last_write_sync_obj;
>         void *last_write_sync_obj_arg;
> };
>
> Embed this structure in the radeon_bo and build a small api around it,
> including *optionally* passing it to the existing execbuf utilities, and
> you should be done. The bo_util code and bo_vm code don't care about the
> rendering history, only that the bo is completely idle.
>
> Note also that when an accelerated bo move is scheduled, the driver
> needs to update this struct.
>
> /Thomas
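
Purely for illustration, a minimal sketch of what the "small api" suggested
above could look like. The helper names (ttm_eu_history_init,
ttm_eu_history_add) are hypothetical and not actual TTM or radeon code;
sync-object reference counting and locking are omitted. The semantics simply
follow the thread: a GPU read updates the read slot, a GPU write the write
slot, and a READWRITE access both, so the two pointers end up equal, as Marek
describes.

#include <linux/string.h>	/* memset() */
#include <linux/types.h>	/* bool */

struct ttm_eu_rendering_history {
	void *last_read_sync_obj;
	void *last_read_sync_obj_arg;
	void *last_write_sync_obj;
	void *last_write_sync_obj_arg;
};

/* Reset the history, e.g. when the radeon_bo embedding it is created. */
static inline void ttm_eu_history_init(struct ttm_eu_rendering_history *hist)
{
	memset(hist, 0, sizeof(*hist));
}

/*
 * Record the sync object attached to the bo at execbuf time or when an
 * accelerated move is scheduled.  'reads'/'writes' describe how the GPU
 * accesses the buffer; a READWRITE access sets both slots to the same
 * sync object.
 */
static inline void ttm_eu_history_add(struct ttm_eu_rendering_history *hist,
				      void *sync_obj, void *sync_obj_arg,
				      bool reads, bool writes)
{
	if (reads) {
		hist->last_read_sync_obj = sync_obj;
		hist->last_read_sync_obj_arg = sync_obj_arg;
	}
	if (writes) {
		hist->last_write_sync_obj = sync_obj;
		hist->last_write_sync_obj_arg = sync_obj_arg;
	}
}

With something like this, a CPU mapping for read would only need to wait on
last_write_sync_obj, a CPU write would wait on both slots, and the bo_util /
bo_vm paths would keep waiting on the single sync object attached to the bo,
i.e. on complete idleness, as Thomas describes.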