On Tue, Aug 29, 2017 at 04:09:24PM +0200, Andrea Arcangeli wrote:
> Hello,
>
> On Tue, Aug 29, 2017 at 02:59:23PM +0200, Adam Borowski wrote:
> > On Tue, Aug 29, 2017 at 02:45:41PM +0200, Takashi Iwai wrote:
> > > [Put more people to Cc, sorry for growing too much...]
> >
> > We're all interested in 4.13.0 not crashing on us, so that's ok.
> >
> > > On Tue, 29 Aug 2017 11:19:13 +0200,
> > > Bernhard Held wrote:
> > > >
> > > > On 08/28/2017 at 06:56 PM, Nadav Amit wrote:
> > > > > Don’t blame me for the TLB stuff... My money is on aac2fea94f7a .
> > > >
> > > > Amit, thanks for your courage to expose your patch!
> > > >
> > > > I'm more and more confident that aac2fea94f7a is the culprit. Maybe it
> > > > just accelerates the triggering of the splash. To be more sure the
> > > > kernel needs to be tested for a couple of days. It would be great if
> > > > others could assist in testing aac2fea94f7a.
> > >
> > > I'm testing with the revert for a while and it seems working.
> >
> > With nothing but aac2fea94f7a reverted, no explosions for me either.
>
> The aforementioned commit has 3 bugs.
>
> 1) mmu_notifier_invalidate_range cannot be used in replacement of
>    mmu_notifier_invalidate_range_start/end. For KVM
>    mmu_notifier_invalidate_range is a noop and rightfully so. A MMU
>    notifier implementation has to implement either
>    ->invalidate_range method or the invalidate_range_start/end
>    methods, not both. And if you implement invalidate_range_start/end
>    like KVM is forced to do, calling mmu_notifier_invalidate_range in
>    common code is a noop for KVM.
>
>    For those MMU notifiers that can get away only implementing
>    ->invalidate_range, the ->invalidate_range is implicitly called by
>    mmu_notifier_invalidate_range_end(). And only those secondary MMUs
>    that share the same pagetable with the primary MMU (like AMD
>    iommuv2) can get away only implementing ->invalidate_range.
>
>    So all cases (THP on/off) are broken right now.
>
>    To fix this is enough to replace mmu_notifier_invalidate_range with
>    mmu_notifier_invalidate_range_start;mmu_notifier_invalidate_range_end. Either
>    that or call multiple mmu_notifier_invalidate_page like before.

Kirill did regress invalidate_page: it used to be called outside the
spinlock and is now called inside the spinlock, so reverting will
reintroduce a regression. You can refer to the thread about it:

https://lkml.org/lkml/2017/8/9/418

Jérôme
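
P.S. For anyone following along, here is a rough userspace model of the dispatch rules Andrea describes in point 1. It is not kernel code: the toy_* names, the struct and its fields are all made up for illustration, and the counter only stands in for "the secondary MMU actually dropped its mappings". It assumes, as stated above, that a KVM-like notifier implements only the start/end pair, that an iommuv2-like notifier implements only ->invalidate_range, and that range_end implicitly calls ->invalidate_range.

```c
/* Toy userspace model of the MMU notifier dispatch; NOT kernel code.
 * Every name below is hypothetical and exists only for this sketch. */

struct toy_notifier {
    int has_start_end;  /* implements invalidate_range_start/end (KVM-like) */
    int has_range;      /* implements only ->invalidate_range (iommuv2-like) */
    int invalidations;  /* times the secondary MMU dropped its mappings */
};

/* Models mmu_notifier_invalidate_range(): only reaches notifiers that
 * implement ->invalidate_range, so it does nothing for a KVM-like one. */
void toy_invalidate_range(struct toy_notifier *n)
{
    if (n->has_range)
        n->invalidations++;
}

/* Models mmu_notifier_invalidate_range_start(). */
void toy_invalidate_range_start(struct toy_notifier *n)
{
    if (n->has_start_end)
        n->invalidations++;
}

/* Models mmu_notifier_invalidate_range_end(): ->invalidate_range is
 * called implicitly here for the notifiers that have one. */
void toy_invalidate_range_end(struct toy_notifier *n)
{
    toy_invalidate_range(n);
}

/* What the commit does in common code: a bare invalidate_range call. */
void toy_buggy_path(struct toy_notifier *n)
{
    toy_invalidate_range(n);
}

/* The suggested fix: bracket the page table update with start/end. */
void toy_fixed_path(struct toy_notifier *n)
{
    toy_invalidate_range_start(n);
    /* ... primary MMU page table update would happen here ... */
    toy_invalidate_range_end(n);
}
```

In this model, toy_buggy_path() leaves a KVM-like notifier (has_start_end = 1, has_range = 0) with zero invalidations, while toy_fixed_path() reaches both kinds of notifier exactly once.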