On Tue, Jun 7, 2022 at 1:37 AM Barry Song <21cnbao@xxxxxxxxx> wrote: > > On Mon, Jun 6, 2022 at 9:25 PM Barry Song <21cnbao@xxxxxxxxx> wrote: > > > > On Wed, May 18, 2022 at 4:49 PM Yu Zhao <yuzhao@xxxxxxxxxx> wrote: ... > I can't really explain why we are getting a random app/java vm crash in monkey > test by using ptep_test_and_clear_young() only in lru_gen_look_around() on an > armv8-a machine without hardware PTE young support. > > Moving to ptep_clear_flush_young() in look_around can make the random > hang disappear according to zhanyuan(Cc-ed). This sounds too familiar -- let me ask again: was the following commit included during the test? 07509e10dcc7 arm64: pgtable: Fix pte_accessible() If not, it will cause exactly the problem you described. And what about this one? e914d8f00391 mm: fix unexpected zeroed page mapping with zram swap Missing it also causes userspace memory corruption on Android, i.e., random app crashes. > On x86, ptep_clear_flush_young() is exactly ptep_test_and_clear_young() > after > 'commit b13b1d2d8692 ("x86/mm: In the PTE swapout page reclaim case clear > the accessed bit instead of flushing the TLB")' > > But on arm64, they are different. according to Will's comments in this > thread which > tried to make arm64 same with x86, > https://www.mail-archive.com/linux-kernel@xxxxxxxxxxxxxxx/msg1793881.html > > " > This is blindly copied from x86 and isn't true for us: we don't invalidate > the TLB on context switch. That means our window for keeping the stale > entries around is potentially much bigger and might not be a great idea. > > If we roll a TLB invalidation routine without the trailing DSB, what sort of > performance does that get you? > " > We shouldn't think ptep_clear_flush_young() is safe enough in LRU to > clear PTE young? Any comments from Will? > > > > > btw, lru_gen_look_around() has already included 'address', are we doing > > pte check for 'address' twice here? Explained in the previous reply. Hope that clarifies things.