On Tue, Aug 16, 2016 at 08:04:35PM +0100, Matthew Auld wrote:
> > +	if (dst_needs_clflush & CLFLUSH_BEFORE)
> > +		batch_len = roundup(batch_len, boot_cpu_data.x86_clflush_size);
> hmm, this bit doesn't seem obvious to me. What am I missing?

The code is optimized to work on cachelines (to work on a partial
cacheline requires a flush before the read). We know that the batch is
in whole pages and so we can always round up to the end of the next
cacheline. We also know that we can read more than the declared length
of the batch into the local page as we only validate as much as
required. So it is safe to read from beyond the end of the batch, and to
do so avoids having to insert clflushes before the read.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
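
To illustrate the rounding discussed above, here is a minimal standalone
sketch in userspace C. It is not the i915 code: EXAMPLE_CLFLUSH_SIZE,
round_up_to() and copy_len() are hypothetical names, and the 64-byte
cacheline is an assumption (the kernel takes the real value from
boot_cpu_data.x86_clflush_size).

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cacheline size, for illustration only. */
#define EXAMPLE_CLFLUSH_SIZE 64u

/* Round len up to the next multiple of align (align must be non-zero). */
static size_t round_up_to(size_t len, size_t align)
{
	assert(align != 0);
	return ((len + align - 1) / align) * align;
}

/*
 * Hypothetical helper: pick the number of bytes to copy. When a flush
 * before the read would otherwise be needed, extend the length to a
 * whole cacheline so that no partial cacheline is ever read.
 */
static size_t copy_len(size_t batch_len, int needs_clflush_before)
{
	if (needs_clflush_before)
		batch_len = round_up_to(batch_len, EXAMPLE_CLFLUSH_SIZE);
	return batch_len;
}
```

With these assumptions, a 4000-byte batch is copied as 4032 bytes. That
still lies within the 4096-byte page backing the batch, which is why the
over-read is harmless.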