> On Tue 15-08-17 19:06:28, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > On Tue 15-08-17 07:51:02, Tetsuo Handa wrote:
> > > > Michal Hocko wrote:
> > > [...]
> > > > > Were you able to reproduce with other filesystems?
> > > >
> > > > Yes, I can reproduce this problem using both xfs and ext4 on 4.11.11-200.fc25.x86_64
> > > > on Oracle VM VirtualBox on Windows.
> > >
> > > Just a quick question.
> > > http://lkml.kernel.org/r/201708112053.FIG52141.tHJSOQFLOFMFOV@xxxxxxxxxxxxxxxxxxx
> > > mentioned next-20170811 kernel and this one 4.11. Your original report
> > > as a reply to this thread
> > > http://lkml.kernel.org/r/201708072228.FAJ09347.tOOVOFFQJSHMFL@xxxxxxxxxxxxxxxxxxx
> > > mentioned next-20170728. None of them seem to have this fix
> > > http://lkml.kernel.org/r/20170807113839.16695-3-mhocko@xxxxxxxxxx so let
> > > me ask again. Have you seen an unexpected content written with that
> > > patch applied?
> >
> > No. All non-zero non-0xFF values are without that patch applied.
> > I want to confirm that that patch actually fixes non-zero non-0xFF values
> > (so that we can have better patch description for that patch).
>
> OK, so I have clearly misunderstood you. I thought that you can still
> see corrupted content with the patch _applied_. Now I see why I couldn't
> reproduce this...

If I apply this patch, I can no longer reproduce this problem even with btrfs.

-+	 * and could cause a memory corruption (zero pages instead of the
-+	 * original content).
++	 * and could cause a memory corruption (random content instead of the
++	 * original content).

Tested-by: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>

>
> Now I also understand what you meant when asking for an explanation. I
> can only speculate how we could end up with the non-zero page previously
> but the closest match would be that the page got unmapped and reused by
> a different path and a stale tlb entry would leak the content.
> Such a thing would happen if we freed the page _before_ we flushed the tlb
> during unmap.
>
> Considering that oom_reaper is relying on unmap_page_range which seems
> to be doing the right thing wrt. flushing vs. freeing ordering (enforced
> by the tlb_gather) I am wondering what else could go wrong but I vaguely
> remember there were some races between THP and MADV_DONTNEED in the
> past. Maybe we have hit an incarnation of something like that. Anyway
> the oom_reaper doesn't try to be clever and it only calls to
> unmap_page_range which should be safe from that context.
>
> The primary bug here was that we allowed refaulting of unmapped memory
> and that should be fixed by the patch AFAICS. If there are more issues
> we should definitely track those down but those should be oom_reaper
> independent because we really do not do anything special here.
>
> --
> Michal Hocko
> SUSE Labs
>