On Wed, Mar 11, 2020 at 02:32:41PM -0400, Arvind Sankar wrote:
> On Wed, Mar 11, 2020 at 11:16:07AM +0300, Kirill A. Shutemov wrote:
> > On Tue, Mar 10, 2020 at 11:35:54PM -0400, Arvind Sankar wrote:
> > >
> > > The rationale for MOVNTI instruction is supposed to be that it avoids
> > > cache pollution. Aside from the bench that shows MOVNTI to be faster for
> > > the move itself, shouldn't it have an additional benefit in not trashing
> > > the CPU caches?
> > >
> > > As string instructions improve, why wouldn't the same improvements be
> > > applied to MOVNTI?
> >
> > String instructions inherently more flexible. Implementation can choose
> > caching strategy depending on the operation size (cx) and other factors.
> > Like if operation is large enough and cache is full of dirty cache lines
> > that expensive to free up, it can choose to bypass cache. MOVNTI is more
> > strict on semantics and more opaque to CPU.
>
> But with today's processors, wouldn't writing 1G via the string
> operations empty out almost the whole cache? Or are there already
> optimizations to prevent one thread from hogging the L3?

Also, currently the stringop is only done 4k at a time, so it would
likely not trigger any future cache-bypassing optimizations in any case.

>
> If we do want to just use the string operations, it seems like the
> clear_page routines should just call memset instead of duplicating it.
>
> >
> > And more importantly string instructions, unlike MOVNTI, is something that
> > generated often by compiler and used in standard libraries a lot. It is
> > and will be focus of optimization of CPU architects.
> >
> > --
> >  Kirill A. Shutemov
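
To make the comparison concrete, here is a rough userspace sketch of the
two strategies under discussion. It is not the kernel's clear_page code.
All sketch_* names and SKETCH_PAGE_SIZE are made up for illustration, and
it assumes an x86-64 compiler with SSE2 intrinsics (falling back to memset
elsewhere). One variant leaves the choice of stores to memset; the other
uses explicit non-temporal 8-byte stores (MOVNTI via _mm_stream_si64). The
region helper clears one 4k page per call, which is the pattern I meant
above: each call only ever sees a 4k operation size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) && defined(__SSE2__)
#include <emmintrin.h>          /* _mm_stream_si64 (MOVNTI), _mm_sfence */
#define SKETCH_HAVE_MOVNTI 1
#else
#define SKETCH_HAVE_MOVNTI 0
#endif

#define SKETCH_PAGE_SIZE 4096u  /* hypothetical name; 4k as in the thread */

/* Variant 1: plain memset. The library/compiler picks the store strategy,
 * which may or may not be a string instruction. */
static void sketch_clear_page_memset(void *page)
{
    memset(page, 0, SKETCH_PAGE_SIZE);
}

/* Variant 2: explicit non-temporal stores, 8 bytes at a time. */
static void sketch_clear_page_nt(void *page)
{
    /* The 8-byte streaming stores below expect an 8-byte-aligned buffer. */
    assert(((uintptr_t)page & 7u) == 0);
#if SKETCH_HAVE_MOVNTI
    long long *p = (long long *)page;
    for (size_t i = 0; i < SKETCH_PAGE_SIZE / sizeof(*p); i++)
        _mm_stream_si64(&p[i], 0);
    /* Non-temporal stores are weakly ordered; fence before others read. */
    _mm_sfence();
#else
    /* Assumption: no MOVNTI available, so fall back to ordinary stores. */
    memset(page, 0, SKETCH_PAGE_SIZE);
#endif
}

/* Clear a large region one page at a time, so every individual clear
 * has an operation size of exactly one page. */
static void sketch_clear_region(void *base, size_t npages, int use_nt)
{
    unsigned char *p = base;
    for (size_t i = 0; i < npages; i++, p += SKETCH_PAGE_SIZE) {
        if (use_nt)
            sketch_clear_page_nt(p);
        else
            sketch_clear_page_memset(p);
    }
}
```

Calling sketch_clear_region(buf, n, 1) versus sketch_clear_region(buf, n, 0)
on a large buffer is one way to compare the two paths, though a userspace
timing says little about cache pollution seen by other threads.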