Hi Kirill, >>>> The crash scenario I guess is like: >>>> 1. A huge page pmd entry is in the middle of being changed into either a >>>> pmd_protnone or a pmd_migration_entry. It is cleared to pmd_none. >>>> >>>> 2. At the same time, the application frees the vma this page belongs to. >>> >>> Em... no. >>> >>> This shouldn't be possible: your 1. must be done under down_read(mmap_sem). >>> And we only be able to remove vma under down_write(mmap_sem), so the >>> scenario should be excluded. >>> >>> What do I miss? >> >> You are right. This problem will not happen in the upstream kernel. >> >> The problem comes from my customized kernel, where I migrate pages away >> instead of reclaiming them when memory is under pressure. I did not take >> any mmap_sem when I migrate pages. So I got this error. >> >> It is a false alarm. Sorry about that. Thanks for clarifying the problem. > > I think there's still a race between MADV_DONTNEED and > change_huge_pmd(.prot_numa=1) resulting in skipping THP by > zap_pmd_range(). It need to be addressed. > > And MADV_FREE requires a fix. > > So, minus one non-bug, plus two bugs. > You said a huge page pmd entry needs to be changed under down_read(mmap_sem). It is only true for huge pages, right? Since in mm/compaction.c, the kernel does not down_read(mmap_sem) during memory compaction. Namely, base page migrations do not hold down_read(mmap_sem), so in zap_pte_range(), the kernel needs to hold PTE page table locks. Am I right about this? If yes. IMHO, ultimately, when we need to compact 2MB pages to form 1GB pages, in zap_pmd_range(), pmd locks have to be taken to make that kind of compactions possible. Do you agree? -- Best Regards Yan Zi
Attachment:
signature.asc
Description: OpenPGP digital signature