On Tue, Feb 20, 2024 at 09:17:08PM -0400, Jason Gunthorpe wrote:
> +/*
> + * This generates a memcpy that works on a from/to address which is aligned to
> + * bits. Count is in terms of the number of bits sized quantities to copy. It
> + * optimizes to use the STR groupings when possible so that it is WC friendly.
> + */
> +#define memcpy_toio_aligned(to, from, count, bits) \
> +	({ \
> +		volatile u##bits __iomem *_to = to; \
> +		const u##bits *_from = from; \
> +		size_t _count = count; \
> +		const u##bits *_end_from = _from + ALIGN_DOWN(_count, 8); \
> +		\
> +		for (; _from < _end_from; _from += 8, _to += 8) \
> +			__const_memcpy_toio_aligned##bits(_to, _from, 8); \
> +		if ((_count % 8) >= 4) { \
> +			__const_memcpy_toio_aligned##bits(_to, _from, 4); \
> +			_from += 4; \
> +			_to += 4; \
> +		} \
> +		if ((_count % 4) >= 2) { \
> +			__const_memcpy_toio_aligned##bits(_to, _from, 2); \
> +			_from += 2; \
> +			_to += 2; \
> +		} \
> +		if (_count % 2) \
> +			__const_memcpy_toio_aligned##bits(_to, _from, 1); \
> +	})

Do we actually need all this if count is not constant? If it's not
performance critical anywhere, I'd rather copy the generic
implementation, it's easier to read.

Otherwise, apart from the __raw_writeq() typo that Will mentioned, the
patch looks fine to me.

--
Catalin