On Sat, Dec 13, 2014 at 08:15:22PM -0800, Matt Turner wrote:
> On Sat, Dec 13, 2014 at 7:08 PM, Ben Widawsky
> <benjamin.widawsky@xxxxxxxxx> wrote:
> > Any GEM driver which has very large objects and a slow CPU is subject to very
> > long waits simply for clflushing incoherent objects. Generally, each individual
> > object is not a problem, but if you have very large objects, or very many
> > objects, the flushing begins to show up in profiles. Because on x86 we know the
> > cache size, we can easily determine when an object will use all the cache, and
> > forego iterating over each cacheline.
> >
> > We need to be careful when using wbinvd. wbinvd() is itself potentially slow
> > because it requires synchronizing the flush across all CPUs so they have a
> > coherent view of memory. This can result in either stalling work being done on
> > other CPUs, or this call itself stalling while waiting for a CPU to accept the
> > interrupt. wbinvd() also has the downside of invalidating all cachelines,
> > so we don't want to use it unless we're sure we already own most of the
> > cachelines.
> >
> > The current algorithm is very naive. I think it can be tweaked more, and it
> > would be good if someone else gave it some thought. I am pretty confident that
> > in i915 we can even skip the IPI in the execbuf path with minimal code change (or
> > perhaps just some verifying of the existing code). It would be nice to hear what
> > other developers who depend on this code think.
> >
> > Cc: Intel GFX <intel-gfx@xxxxxxxxxxxxxxxxxxxxx>
> > Signed-off-by: Ben Widawsky <ben@xxxxxxxxxxxx>
> > ---
> >  drivers/gpu/drm/drm_cache.c | 20 +++++++++++++++++---
> >  1 file changed, 17 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
> > index d7797e8..6009c2d 100644
> > --- a/drivers/gpu/drm/drm_cache.c
> > +++ b/drivers/gpu/drm/drm_cache.c
> > @@ -64,6 +64,20 @@ static void drm_cache_flush_clflush(struct page *pages[],
> >  		drm_clflush_page(*pages++);
> >  	mb();
> >  }
> > +
> > +static bool
> > +drm_cache_should_clflush(unsigned long num_pages)
> > +{
> > +	const int cache_size = boot_cpu_data.x86_cache_size;
> > +
> > +	/* For now the algorithm simply checks if the number of pages to be
> > +	 * flushed is greater than the entire system cache. One could make the
> > +	 * function more aware of the actual system (ie. if SMP, how large is
> > +	 * the cache, CPU freq. etc. All those help to determine when to
> > +	 * wbinvd() */
> > +	WARN_ON_ONCE(!cache_size);
> > +	return !cache_size || num_pages < (cache_size >> 2);
> > +}
> >  #endif
> >
> >  void
> > @@ -71,7 +85,7 @@ drm_clflush_pages(struct page *pages[], unsigned long num_pages)
> >  {
> >
> >  #if defined(CONFIG_X86)
> > -	if (cpu_has_clflush) {
> > +	if (cpu_has_clflush && drm_cache_should_clflush(num_pages)) {
> >  		drm_cache_flush_clflush(pages, num_pages);
> >  		return;
> >  	}
> > @@ -104,7 +118,7 @@ void
> >  drm_clflush_sg(struct sg_table *st)
> >  {
> >  #if defined(CONFIG_X86)
> > -	if (cpu_has_clflush) {
> > +	if (cpu_has_clflush && drm_cache_should_clflush(st->nents)) {
> >  		struct sg_page_iter sg_iter;
> >
> >  		mb();
> > @@ -128,7 +142,7 @@ void
> >  drm_clflush_virt_range(void *addr, unsigned long length)
> >  {
> >  #if defined(CONFIG_X86)
> > -	if (cpu_has_clflush) {
> > +	if (cpu_has_clflush && drm_cache_should_clflush(length / PAGE_SIZE)) {
>
> If length isn't a multiple of page size, isn't this ignoring the
> remainder? Should it be rounding length up to the next multiple of
> PAGE_SIZE, like ROUND_UP_TO?

Yeah, we could round_up. In practice it probably won't matter. I actually
think it would be better to pass a size to drm_cache_should_clflush(), and
let that round it up. It sounds like people don't want this patch anyway, so
I'll make the equivalent change in the i915-only patch.

-- 
Ben Widawsky, Intel Open Source Technology Center
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/dri-devel
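The size-based variant suggested in the reply (pass a byte length to the heuristic and round it up there) can be sketched as a standalone user-space C model. This is not the kernel code. The function name sketch_should_clflush(), the SKETCH_* macros, the 4 KiB page size, and passing the cache size as a parameter in place of boot_cpu_data.x86_cache_size (which x86 reports in KB) are all assumptions made for illustration. The threshold arithmetic mirrors the quoted patch.

```c
#include <stdbool.h>

/* Assumption: 4 KiB pages, as on x86. */
#define SKETCH_PAGE_SIZE 4096UL

/* Same arithmetic as the kernel's DIV_ROUND_UP() macro. */
#define SKETCH_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Hypothetical byte-size variant of drm_cache_should_clflush().
 * Rounding up means a partial trailing page still counts as a page,
 * unlike length / PAGE_SIZE, which truncates it.
 *
 * cache_size_kb stands in for boot_cpu_data.x86_cache_size (in KB).
 * Returns true to use per-cacheline clflush, false to fall back to
 * the wbinvd() path.
 */
static bool sketch_should_clflush(unsigned long length,
				  unsigned int cache_size_kb)
{
	unsigned long num_pages = SKETCH_DIV_ROUND_UP(length, SKETCH_PAGE_SIZE);

	/* Unknown cache size: keep using clflush, as the patch does. */
	if (cache_size_kb == 0)
		return true;

	/* KB >> 2 is the number of 4 KiB pages the cache can hold. */
	return num_pages < ((unsigned long)cache_size_kb >> 2);
}
```

For example, with an 8192 KB cache the threshold is 2048 pages. A range of 2047 pages plus one byte rounds up to 2048 pages and returns false (wbinvd), whereas length / PAGE_SIZE would truncate it to 2047 pages and keep clflushing. As the reply says, the difference only shows up right at the threshold.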