On Wed, Jan 15, 2020 at 08:52:45PM +0000, Chris Wilson wrote:
> Since we may try and flush the cachelines associated with large buffers
> (an 8K framebuffer is about 128MiB, even before we try HDR), this leads
> to unacceptably long latencies (when using a voluntary CONFIG_PREEMPT).
> If we call cond_resched() between each sg chunk, that is about every 128
> pages, we have a natural break point in which to check if the process
> needs to be rescheduled. Naturally, this means that drm_clflush_sg() can
> only be called from process context -- which is true at the moment. The
> other clflush routines remain usable from atomic context.
>
> Even though flushing large objects takes a demonstrable amount of time
> to flush all the cachelines, clflush is still preferred over a
> system-wide wbinvd as the latter has unpredictable latencies affecting
> the whole system, not just the local task.
>
> Reported-by: David Laight <David.Laight@xxxxxxxxxx>
> Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> Cc: David Laight <David.Laight@xxxxxxxxxx>

The original bug report is complaining about latencies for SCHED_RT
threads, on a system that doesn't even use CONFIG_PREEMPT. I'm not sure
it's terribly valid to cater to that use-case - all the desktop distros
seem a lot more reasonable. So firmly *shrug* from my side ...

Patch itself looks correct, just not seeing the point.
-Daniel

> ---
>  drivers/gpu/drm/drm_cache.c | 49 ++++++++++++++++++++++++++++++++++---
>  1 file changed, 45 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
> index 03e01b000f7a..fbd2bb644544 100644
> --- a/drivers/gpu/drm/drm_cache.c
> +++ b/drivers/gpu/drm/drm_cache.c
> @@ -112,23 +112,64 @@ drm_clflush_pages(struct page *pages[], unsigned long num_pages)
>  }
>  EXPORT_SYMBOL(drm_clflush_pages);
>
> +static __always_inline struct sgt_iter {
> +	struct scatterlist *sgp;
> +	unsigned long pfn;
> +	unsigned int curr;
> +	unsigned int max;
> +} __sgt_iter(struct scatterlist *sgl) {
> +	struct sgt_iter s = { .sgp = sgl };
> +
> +	if (s.sgp) {
> +		s.max = s.curr = s.sgp->offset;
> +		s.max += s.sgp->length;
> +		s.pfn = page_to_pfn(sg_page(s.sgp));
> +	}
> +
> +	return s;
> +}
> +
> +static inline struct scatterlist *__sg_next_resched(struct scatterlist *sg)
> +{
> +	if (sg_is_last(sg))
> +		return NULL;
> +
> +	++sg;
> +	if (unlikely(sg_is_chain(sg))) {
> +		sg = sg_chain_ptr(sg);
> +		cond_resched();
> +	}
> +	return sg;
> +}
> +
> +#define for_each_sgt_page(__pp, __iter, __sgt)				\
> +	for ((__iter) = __sgt_iter((__sgt)->sgl);			\
> +	     ((__pp) = (__iter).pfn == 0 ? NULL :			\
> +		pfn_to_page((__iter).pfn + ((__iter).curr >> PAGE_SHIFT))); \
> +	     (((__iter).curr += PAGE_SIZE) >= (__iter).max) ?		\
> +	     (__iter) = __sgt_iter(__sg_next_resched((__iter).sgp)), 0 : 0)
> +
>  /**
>   * drm_clflush_sg - Flush dcache lines pointing to a scather-gather.
>   * @st: struct sg_table.
>   *
>   * Flush every data cache line entry that points to an address in the
> - * sg.
> + * sg. This may schedule between scatterlist chunks, in order to keep
> + * the system preemption-latency down for large buffers.
>   */
>  void
>  drm_clflush_sg(struct sg_table *st)
>  {
> +	might_sleep();
> +
>  #if defined(CONFIG_X86)
>  	if (static_cpu_has(X86_FEATURE_CLFLUSH)) {
> -		struct sg_page_iter sg_iter;
> +		struct sgt_iter sg_iter;
> +		struct page *page;
>
>  		mb(); /*CLFLUSH is ordered only by using memory barriers*/
> -		for_each_sg_page(st->sgl, &sg_iter, st->nents, 0)
> -			drm_clflush_page(sg_page_iter_page(&sg_iter));
> +		for_each_sgt_page(page, sg_iter, st)
> +			drm_clflush_page(page);
>  		mb(); /*Make sure that all cache line entry is flushed*/
>
>  		return;
> --
> 2.25.0
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@xxxxxxxxxxxxxxxxxxxxx
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
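
For reference, a rough back-of-the-envelope check of the numbers quoted in
the commit message ("about 128MiB", cond_resched() "about every 128 pages"),
written as a standalone userspace sketch rather than kernel code. It assumes
a 7680x4320 framebuffer at 4 bytes per pixel, 4 KiB pages and 32-byte
struct scatterlist entries; none of those values are spelled out in the
patch itself, so treat them purely as illustrative.

/*
 * Standalone sketch (not kernel code): sanity-check the latency numbers
 * from the commit message.  The framebuffer geometry, page size and
 * scatterlist entry size below are assumptions, not taken from the patch.
 */
#include <stdio.h>

int main(void)
{
	const unsigned long width = 7680, height = 4320, bytes_pp = 4;
	const unsigned long page_size = 4096;
	/*
	 * Roughly one page of scatterlist entries per chain segment
	 * (PAGE_SIZE / sizeof(struct scatterlist) ~= 4096 / 32 = 128),
	 * i.e. one cond_resched() opportunity per ~128 pages flushed.
	 */
	const unsigned long sg_entries_per_chunk = page_size / 32;

	unsigned long bytes = width * height * bytes_pp;
	unsigned long pages = (bytes + page_size - 1) / page_size;
	unsigned long resched_points = pages / sg_entries_per_chunk;

	printf("framebuffer size:  %lu bytes (~%lu MiB)\n", bytes, bytes >> 20);
	printf("pages to clflush:  %lu\n", pages);
	printf("cond_resched() opportunities (one per %lu pages): ~%lu\n",
	       sg_entries_per_chunk, resched_points);
	return 0;
}

Built with any C compiler, this prints roughly 126 MiB, 32400 pages and
about 253 reschedule opportunities, consistent with the figures in the
commit message above.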