On Thu, Oct 6, 2016 at 10:39 PM, Doug Dumitru <doug@xxxxxxxxxx> wrote:
>
> There is another thread in [linux-raid] discussing pre-fetches in the
> raid-6 AVX2 code.  My testing implies that the prefetch distance is
> too short.  In your new AVX512 code, it looks like there are 24
> instructions, each with latencies of 1, between the prefetch and the
> actual memory load.  I don't have an AVX512 CPU to try this on, but the
> prefetch might do better at a bigger distance.  If I am not mistaken,
> it takes a lot longer than 24 clocks to fetch 4 cache lines.

We have basically never had a case where prefetches were actually a
good idea.

If the hardware doesn't do prefetching on its own (partly with just
physical memory patterns in the memory controller, partly just with
aggressive OoO), software isn't going to be able to improve on the
situation in general.

SW prefetching is a broken concept. You can make big differences for
very specific microarchitectures (usually the broken shit ones are the
ones that show it best), but in the general case it's pretty much
always a lost cause.

We've had real cases where prefetching just then made things worse on
other hardware.

So just don't do it. It's benchmarketing for specific hardware, it's
not worth worrying about in the bigger picture. You'll find people
spend a lot of time tuning things for their particular hardware, and
it not helping at all on anything else. Waste of time.

Life is too short (and software is too complex) to try to work around
broken microarchitectures with sw prefetching.

                 Linus
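
For readers unfamiliar with the term, "prefetch distance" is how far ahead of the current load the prefetch hint is issued. The following is only a minimal sketch of that idea, not the kernel's RAID-6 code: xor_block() and its dist parameter are hypothetical names made up for illustration, and __builtin_prefetch() assumes a GCC- or Clang-compatible compiler. Passing dist == 0 disables the hint, so the same loop can be measured with and without it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not taken from the kernel): XOR n words of src
 * into dst, optionally hinting the word that lies `dist` elements ahead.
 * dist == 0 means "no software prefetch at all".
 */
static void xor_block(uint64_t *dst, const uint64_t *src, size_t n, size_t dist)
{
    assert(dst != NULL && src != NULL);

    for (size_t i = 0; i < n; i++) {
        /* i < n here, so n - i cannot underflow; the check keeps the
         * prefetched address inside the source buffer. */
        if (dist != 0 && dist < n - i)
            __builtin_prefetch(&src[i + dist], 0 /* read */, 0 /* no reuse expected */);

        dst[i] ^= src[i];
    }
}
```

The hint never changes the result, only (possibly) the timing, so xor_block(buf, src, n, 0) and xor_block(buf, src, n, 8) produce identical output. Whether any nonzero dist is faster, slower, or irrelevant depends on the microarchitecture, which is the point made in the reply above.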