On Fri, Jun 19, 2015 at 06:37:10PM +0100, Arun Siluvery wrote:
> Some of the WA are to be applied during context save but before restore and
> some at the end of context save/restore but before executing the instructions
> in the ring, WA batch buffers are created for this purpose and these WA cannot
> be applied using normal means. Each context has two registers to load the
> offsets of these batch buffers. If they are non-zero, HW understands that it
> needs to execute these batches.
>
> v1: In this version two separate ring_buffer objects were used to load WA
> instructions for indirect and per context batch buffers and they were part
> of every context.
>
> v2: Chris suggested to include additional page in context and use it to load
> these WA instead of creating separate objects. This will simplify a lot of things
> as we need not explicitly pin/unpin them. Thomas Daniel further pointed out that GuC
> is planning to use a similar setup to share data between GuC and driver and
> WA batch buffers can probably share that page. However after discussions with
> Dave who is implementing GuC changes, he suggested to use an independent page
> for the reasons - GuC area might grow and these WA are initialized only once and
> are not changed afterwards so we can share them across all contexts.
>
> The page is updated with WA during render ring init. This has an advantage of
> not adding more special cases to default_context.
>
> We don't know upfront the number of WA we will be applying using these batch buffers.
> For this reason the size was fixed earlier but it is not a good idea. To fix this,
> the functions that load instructions are modified to report the number of commands
> inserted and the size is now calculated after the batch is updated. A macro is
> introduced to add commands to these batch buffers which also checks for overflow
> and returns error.
> We have a full page dedicated for these WA so that should be sufficient for
> a good number of WA, anything more means we have major issues.
> The list for Gen8 is small, same for Gen9 also, maybe a few more get added
> going forward but not close to filling entire page. Chris suggested a two-pass
> approach but we agreed to go with single page setup as it is a one-off routine
> and simpler code wins.
>
> One additional option is offset field which is helpful if we would like to
> have multiple batches at different offsets within the page and select them
> based on some criteria. This is not a requirement at this point but could
> help in future (Dave).
>
> Chris provided some helpful macros and suggestions which further simplified
> the code, they will also help in reducing code duplication when WA for
> other Gen are added. Add detailed comments explaining restrictions.
>
> (Many thanks to Chris, Dave and Thomas for their reviews and inputs)
>
> Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> Cc: Dave Gordon <david.s.gordon@xxxxxxxxx>
> Signed-off-by: Rafael Barbalho <rafael.barbalho@xxxxxxxxx>
> Signed-off-by: Arun Siluvery <arun.siluvery@xxxxxxxxxxxxxxx>

Sigh, after all that, I found one minor thing, but nevertheless
Reviewed-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>

> +#define wa_ctx_emit(batch, cmd) { \
> +	if (WARN_ON(index >= (PAGE_SIZE / sizeof(uint32_t)))) { \
> +		return -ENOSPC; \
> +	} \
> +	batch[index++] = (cmd); \
> +	}

We should have wrapped this in do { } while(0) - think of all those
trailing semicolons we have in the code! Fortunately we haven't used this
in an

	if (foo)
		wa_ctx_emit(bar);
	else
		wa_ctx_emit(baz);

yet.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
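
For illustration only, a minimal userspace sketch of the do { } while (0)
form suggested in the review. The names wa_emit, WA_MAX_DWORDS and emit_one
are hypothetical and not part of the patch, and WARN_ON is dropped because it
is not available outside the kernel. With the brace-only macro, the semicolon
after the first wa_ctx_emit() in an if/else becomes an empty statement, so the
else no longer has a matching if and the code fails to compile; the wrapped
form makes each invocation a single statement.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical small limit standing in for PAGE_SIZE / sizeof(uint32_t). */
#define WA_MAX_DWORDS 4

/*
 * Same shape as wa_ctx_emit, but wrapped in do { } while (0) so that
 * "wa_emit(...);" is exactly one statement. Like the original, it relies
 * on a local variable named "index" and returns from the calling function
 * when the buffer is full.
 */
#define wa_emit(batch, cmd)				\
	do {						\
		if (index >= WA_MAX_DWORDS)		\
			return -ENOSPC;			\
		(batch)[index++] = (cmd);		\
	} while (0)

/* Emits one command chosen by "foo"; returns dwords written or -ENOSPC. */
static int emit_one(uint32_t *batch, size_t start, int foo)
{
	size_t index = start;

	/* This unbraced if/else only compiles with the wrapped macro. */
	if (foo)
		wa_emit(batch, 0xAAu);
	else
		wa_emit(batch, 0xBBu);

	return (int)(index - start);
}
```

The macro still hides a return and depends on the caller's index variable,
as in the patch; the wrapper only addresses the trailing-semicolon problem.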