On 12/01/2017 12:56 PM, Jens Axboe wrote:
> On 12/01/2017 11:56 AM, Rebecca Cran wrote:
>>
>>> On Dec 1, 2017, at 11:20 AM, Jens Axboe <axboe@xxxxxxxxx> wrote:
>>>
>>> which is kind of depressing, since the fastest for larger sizes is the
>>> very dumb and basic implementation that you'll find in any text book
>>> under the section of "my first memcpy".
>>>
>>> Anyway, for evaluating implementations, we need a way to test them,
>>> and now we have. I'll be happy to take input/patches on the test
>>> itself.
>>
>> Thanks - I meant to reply a few days ago and tell you I will work on a
>> patch for this.
>>
>> For the simple case, does the compiler do anything interesting? For
>> example, auto-vectorization should be simple for it to do if it knows
>> the capabilities of the target machine.
>
> Doesn't look like it - it just unrolls it a bit, and then uses movzbl.
> So nothing exciting at all.

Ah hang on, there's more to it. There are unrolled bits for the unaligned
length/sizes, and then it does the majority of the work with movdqa and
movups.

-- 
Jens Axboe
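For reference, a minimal sketch of the kind of textbook "my first memcpy" loop being discussed, assuming the usual byte-at-a-time form. The name `simple_memcpy` is hypothetical and is not taken from fio's actual test code. The `restrict` qualifiers promise the compiler that the buffers do not overlap, which can make it easier for the compiler to unroll and vectorize the loop.

```c
#include <stddef.h>

/*
 * Hypothetical name: a textbook byte-at-a-time memcpy ("my first memcpy").
 * Copies n bytes from src to dst and returns dst. As with the standard
 * memcpy, the two regions must not overlap.
 */
void *simple_memcpy(void *restrict dst, const void *restrict src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	/*
	 * One byte per iteration in the source. With optimization enabled,
	 * the compiler is free to unroll this and use wider (e.g. SSE) moves.
	 */
	while (n--)
		*d++ = *s++;

	return dst;
}
```

To see what a given compiler does with it, one option is to compile to assembly (for example `gcc -O3 -S`) and look for wide moves such as movdqa/movups versus byte moves such as movzbl. The result depends on the compiler version, flags, and target. Some compilers may instead recognize the loop and replace it with a call to the library memcpy; in GCC, `-fno-tree-loop-distribute-patterns` disables that transformation.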