On Thu, Feb 29, 2024 at 10:33:04AM +0000, Catalin Marinas wrote:
> On Tue, Feb 20, 2024 at 09:17:08PM -0400, Jason Gunthorpe wrote:
> > +                                        const u32 *from, size_t count)
> > +{
> > +       switch (count) {
> > +       case 8:
> > +               asm volatile("str %w0, [%8, #4 * 0]\n"
> > +                            "str %w1, [%8, #4 * 1]\n"
> > +                            "str %w2, [%8, #4 * 2]\n"
> > +                            "str %w3, [%8, #4 * 3]\n"
> > +                            "str %w4, [%8, #4 * 4]\n"
> > +                            "str %w5, [%8, #4 * 5]\n"
> > +                            "str %w6, [%8, #4 * 6]\n"
> > +                            "str %w7, [%8, #4 * 7]\n"
> > +                            :
> > +                            : "rZ"(from[0]), "rZ"(from[1]), "rZ"(from[2]),
> > +                              "rZ"(from[3]), "rZ"(from[4]), "rZ"(from[5]),
> > +                              "rZ"(from[6]), "rZ"(from[7]), "r"(to));
> > +               break;
>
> BTW, talking of maintenance, would a series of __raw_writel() with
> Mark's recent patch for offset addressing generate similar code? I.e.:

No, gcc intersperses reads/writes (which we were advised not to do) and
clang doesn't support the "o" directive so it produces poor codegen.

Jason
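
For illustration only, here is a portable sketch of the shape of the
alternative being asked about: eight separate 32-bit stores instead of
one asm block. The names sketch_raw_writel() and
sketch_copy8_separate_stores() are hypothetical, and the volatile store
is a stand-in, not the real arm64 __raw_writel(). It is meant to show
why the compiler is free to interleave the loads from "from" with the
stores to "to", not to reproduce any particular compiler's output.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for __raw_writel(): a single 32-bit store
 * through a volatile pointer. Assumption for illustration; the real
 * arm64 helper is implemented with inline asm.
 */
static inline void sketch_raw_writel(uint32_t val, volatile void *addr)
{
	*(volatile uint32_t *)addr = val;
}

/*
 * Copy eight 32-bit words with eight independent stores.
 *
 * Each store is its own statement, so nothing requires all eight
 * loads from 'from' to finish before the first store is issued. The
 * compiler may emit load, store, load, store, ... In the single asm
 * block quoted above, every from[i] is a register input operand, so
 * all values are in registers before the asm runs and the eight str
 * instructions stay back to back.
 */
static void sketch_copy8_separate_stores(volatile uint32_t *to,
					 const uint32_t *from)
{
	for (size_t i = 0; i < 8; i++)
		sketch_raw_writel(from[i], &to[i]);
}
```

Both forms write the same data; the difference under discussion is only
in how the generated loads and stores are ordered relative to each other.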