On Thu, Sep 13, 2018 at 12:30:14PM +0200, Martin Schwidefsky wrote:

> > + * The mmu_gather data structure is used by the mm code to implement the
> > + * correct and efficient ordering of freeing pages and TLB invalidations.
> > + *
> > + * This correct ordering is:
> > + *
> > + *  1) unhook page
> > + *  2) TLB invalidate page
> > + *  3) free page
> > + *
> > + * That is, we must never free a page before we have ensured there are no live
> > + * translations left to it. Otherwise it might be possible to observe (or
> > + * worse, change) the page content after it has been reused.
> > + *
>
> This first comment already includes the reason why s390 is probably better off
> with its own mmu-gather implementation. It depends on the situation if we have
>
> 1) unhook the page and do a TLB flush at the same time
> 2) free page
>
> or
>
> 1) unhook page
> 2) free page
> 3) final TLB flush of the whole mm

that's the fullmm case, right?

> A variant of the second order we had in the past is to do the mm TLB flush first,
> then the unhooks and frees of the individual pages. There are some tricky corners
> switching between the two variants, see finish_arch_post_lock_switch.
>
> The point is: we *never* have the order 1) unhook, 2) TLB invalidate, 3) free.
> If there is concurrency due to a multi-threaded application we have to do the
> unhook of the page-table entry and the TLB flush with a single instruction.

You can still get the thing you want if for !fullmm you have a no-op
tlb_flush() implementation, assuming your arch page-table frobbing thing
has the required TLB flush in.

Note that that's not utterly unlike how the PowerPC/Sparc hash things
work, they clear and invalidate entries differently from others and don't
use the mmu_gather tlb-flush.
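
To make the no-op tlb_flush() idea concrete, here is a rough user-space toy
model in C. It is not kernel code and not the s390 implementation: every name
in it (toy_mm, toy_gather, toy_ptep_clear, toy_tlb_flush, toy_free_page,
toy_zap_all) is made up for the sketch. The only thing it assumes is what is
said above, namely that for !fullmm the arch's PTE clear already carries the
TLB invalidate. For fullmm it simply models a single whole-mm flush before
the frees, which is the generic ordering, not the s390 fullmm ordering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_PAGES 4

/* Toy address space: one PTE and one possible TLB entry per page. */
struct toy_mm {
	bool pte_present[NR_PAGES];	/* page is still hooked into the page table */
	bool tlb_live[NR_PAGES];	/* a cached translation may still exist */
	bool page_freed[NR_PAGES];
	int flush_calls;		/* number of whole-mm flushes performed */
};

/* Hypothetical stand-in for struct mmu_gather; only the fullmm flag is modelled. */
struct toy_gather {
	struct toy_mm *mm;
	bool fullmm;
};

/* Start with every page mapped and cached in the TLB. */
static void toy_mm_init(struct toy_mm *mm)
{
	for (size_t i = 0; i < NR_PAGES; i++) {
		mm->pte_present[i] = true;
		mm->tlb_live[i] = true;
		mm->page_freed[i] = false;
	}
	mm->flush_calls = 0;
}

/*
 * Assumption: for !fullmm the arch clears the PTE and invalidates the
 * TLB entry in one step, so no translation outlives the unhook.
 */
static void toy_ptep_clear(struct toy_gather *tlb, size_t idx)
{
	tlb->mm->pte_present[idx] = false;
	if (!tlb->fullmm)
		tlb->mm->tlb_live[idx] = false;
}

/* The gather-level flush: a no-op for !fullmm, one big flush for fullmm. */
static void toy_tlb_flush(struct toy_gather *tlb)
{
	if (!tlb->fullmm)
		return;		/* toy_ptep_clear() already did the work */

	for (size_t i = 0; i < NR_PAGES; i++)
		tlb->mm->tlb_live[i] = false;
	tlb->mm->flush_calls++;
}

/* Freeing asserts the invariant: no PTE and no live translation remain. */
static void toy_free_page(struct toy_mm *mm, size_t idx)
{
	assert(!mm->pte_present[idx]);
	assert(!mm->tlb_live[idx]);
	mm->page_freed[idx] = true;
}

/* Generic-looking teardown: unhook everything, flush, then free. */
static void toy_zap_all(struct toy_mm *mm, bool fullmm)
{
	struct toy_gather tlb = { .mm = mm, .fullmm = fullmm };

	for (size_t i = 0; i < NR_PAGES; i++)
		toy_ptep_clear(&tlb, i);

	toy_tlb_flush(&tlb);

	for (size_t i = 0; i < NR_PAGES; i++)
		toy_free_page(mm, i);
}
```

Calling toy_zap_all(&mm, false) on an initialised toy_mm frees every page
without toy_tlb_flush() ever doing anything, because each toy_ptep_clear()
already dropped the translation; with fullmm set, the same caller code ends
up with exactly one whole-mm flush. The generic unhook/flush/free sequence
stays in the caller either way, and the arch decides where the invalidate
really happens.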