Hi Mikulas,

On Tue, 2018-06-05 at 11:30 -0400, Mikulas Patocka wrote:
> 
> On Tue, 5 Jun 2018, Alexey Brodkin wrote:
> 
> > Hi Mikulas,
> > 
> > On Sun, 2018-06-03 at 16:41 +0200, Mikulas Patocka wrote:
> > > Modern processors can detect linear memory accesses and prefetch data
> > > automatically, so there's no need to use prefetch.
> > 
> > Not each and every CPU that's capable of running Linux has prefetch
> > functionality :)
> > 
> > Still read on...
> > 
> > > Signed-off-by: Mikulas Patocka <mpatocka@xxxxxxxxxx>
> > > 
> > > ---
> > >  drivers/gpu/drm/udl/udl_transfer.c | 7 -------
> > >  1 file changed, 7 deletions(-)
> > > 
> > > Index: linux-4.16.12/drivers/gpu/drm/udl/udl_transfer.c
> > > ===================================================================
> > > --- linux-4.16.12.orig/drivers/gpu/drm/udl/udl_transfer.c	2018-05-31 14:48:12.000000000 +0200
> > > +++ linux-4.16.12/drivers/gpu/drm/udl/udl_transfer.c	2018-05-31 14:48:12.000000000 +0200
> > > @@ -13,7 +13,6 @@
> > >  #include <linux/module.h>
> > >  #include <linux/slab.h>
> > >  #include <linux/fb.h>
> > > -#include <linux/prefetch.h>
> > >  #include <asm/unaligned.h>
> > >  
> > >  #include <drm/drmP.h>
> > > @@ -51,9 +50,6 @@ static int udl_trim_hline(const u8 *bbac
> > >  	int start = width;
> > >  	int end = width;
> > >  
> > > -	prefetch((void *) front);
> > > -	prefetch((void *) back);
> > 
> > AFAIK the prefetcher fetches new data according to known history, i.e. based on a
> > previously used access pattern we try to get the next batch of data.
> > 
> > But the code above is in the very beginning of the data processing routine, where
> > the prefetcher doesn't yet have any history to know what and where to prefetch.
> > 
> > So I'd say this particular usage is good.
> > At least those prefetches shouldn't hurt, because typically it
> > would be just 1 instruction if those exist, or nothing if the CPU/compiler doesn't
> > support it.
> 
> See this post https://lwn.net/Articles/444336/ where they measured that
> prefetch hurts performance. Prefetch shouldn't be used unless you have
> proof that it improves performance.
> 
> The problem is that the prefetch instruction causes stalls in the pipeline
> when it encounters a TLB miss, and the automatic prefetcher doesn't.

Wow, thanks for the link. I didn't know about that subtle issue with prefetch
instructions on ARM and x86.

So OK, in the case of UDL these prefetches don't make much sense anyway, I guess,
and there's something worse still; see what I've got from a WandBoard Quad running
the kmscube [1] application with help of the perf utility:

--------------------------->8-------------------------
# Overhead  Command  Shared Object      Symbol
# ........  .......  .................  ........................................
#
    92.93%  kmscube  [kernel.kallsyms]  [k] udl_render_hline
     2.51%  kmscube  [kernel.kallsyms]  [k] __divsi3
     0.33%  kmscube  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
     0.22%  kmscube  [kernel.kallsyms]  [k] lock_acquire
     0.19%  kmscube  [kernel.kallsyms]  [k] _raw_spin_unlock_irq
     0.17%  kmscube  [kernel.kallsyms]  [k] udl_handle_damage
     0.12%  kmscube  [kernel.kallsyms]  [k] v7_dma_clean_range
     0.11%  kmscube  [kernel.kallsyms]  [k] l2c210_clean_range
     0.06%  kmscube  [kernel.kallsyms]  [k] __memzero
--------------------------->8-------------------------

That said, it's not even USB 2.0 which is the bottleneck but computations in
udl_render_hline().

[1] https://cgit.freedesktop.org/mesa/kmscube/

-Alexey
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel