On Sat, Dec 13, 2014 at 7:08 PM, Ben Widawsky <benjamin.widawsky@xxxxxxxxx> wrote:
> Any GEM driver which has very large objects and a slow CPU is subject to very
> long waits simply for clflushing incoherent objects. Generally, each individual
> object is not a problem, but if you have very large objects, or very many
> objects, the flushing begins to show up in profiles. Because on x86 we know the
> cache size, we can easily determine when an object will use all the cache, and
> forego iterating over each cacheline.
>
> We need to be careful when using wbinvd. wbinvd() is itself potentially slow
> because it requires synchronizing the flush across all CPUs so they have a
> coherent view of memory. This can result in either stalling work being done on
> other CPUs, or this call itself stalling while waiting for a CPU to accept the
> interrupt. Also, wbinvd() has the downside of invalidating all cachelines,
> so we don't want to use it unless we're sure we already own most of the
> cachelines.
>
> The current algorithm is very naive. I think it can be tweaked more, and it
> would be good if someone else gave it some thought. I am pretty confident in
> i915, we can even skip the IPI in the execbuf path with minimal code change (or
> perhaps just some verifying of the existing code). It would be nice to hear what
> other developers who depend on this code think.
>
> Cc: Intel GFX <intel-gfx@xxxxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Ben Widawsky <ben@xxxxxxxxxxxx>
> ---
>  drivers/gpu/drm/drm_cache.c | 20 +++++++++++++++++---
>  1 file changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
> index d7797e8..6009c2d 100644
> --- a/drivers/gpu/drm/drm_cache.c
> +++ b/drivers/gpu/drm/drm_cache.c
> @@ -64,6 +64,20 @@ static void drm_cache_flush_clflush(struct page *pages[],
>                 drm_clflush_page(*pages++);
>         mb();
>  }
> +
> +static bool
> +drm_cache_should_clflush(unsigned long num_pages)
> +{
> +       const int cache_size = boot_cpu_data.x86_cache_size;
> +
> +       /* For now the algorithm simply checks if the number of pages to be
> +        * flushed is greater than the entire system cache. One could make the
> +        * function more aware of the actual system (ie. if SMP, how large is
> +        * the cache, CPU freq. etc. All those help to determine when to
> +        * wbinvd() */
> +       WARN_ON_ONCE(!cache_size);
> +       return !cache_size || num_pages < (cache_size >> 2);
> +}
>  #endif
>
>  void
> @@ -71,7 +85,7 @@ drm_clflush_pages(struct page *pages[], unsigned long num_pages)
>  {
>
>  #if defined(CONFIG_X86)
> -       if (cpu_has_clflush) {
> +       if (cpu_has_clflush && drm_cache_should_clflush(num_pages)) {
>                 drm_cache_flush_clflush(pages, num_pages);
>                 return;
>         }
> @@ -104,7 +118,7 @@ void
>  drm_clflush_sg(struct sg_table *st)
>  {
>  #if defined(CONFIG_X86)
> -       if (cpu_has_clflush) {
> +       if (cpu_has_clflush && drm_cache_should_clflush(st->nents)) {
>                 struct sg_page_iter sg_iter;
>
>                 mb();
> @@ -128,7 +142,7 @@ void
>  drm_clflush_virt_range(void *addr, unsigned long length)
>  {
>  #if defined(CONFIG_X86)
> -       if (cpu_has_clflush) {
> +       if (cpu_has_clflush && drm_cache_should_clflush(length / PAGE_SIZE)) {

If length isn't a multiple of the page size, isn't this ignoring the remainder?
Should it be rounding length up to the next multiple of PAGE_SIZE, like
round_up(length, PAGE_SIZE)?
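
Something along these lines would do it, I think (completely untested, just to
illustrate the idea; DIV_ROUND_UP() already exists in linux/kernel.h, so no new
helper would be needed):

        /* Round up so a partial trailing page still counts as a page;
         * plain integer division silently drops the remainder. */
        if (cpu_has_clflush &&
            drm_cache_should_clflush(DIV_ROUND_UP(length, PAGE_SIZE))) {

The difference only shows up for ranges right at the threshold, but it keeps
the page count handed to the heuristic consistent with what actually gets
flushed.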