On Sat, 29 Dec 2018, Kirill A. Shutemov wrote: > > > > Smaps reveals the state of each vma at the moment it is read, not the > > > > future, this has always been the case before and after the broken commit; > > > > it's not intended to be something that can specify "thp is currently > > > > disabled, but if we did prctl(PR_SET_THP_DISABLE, 0), this vma has done > > > > madvise(MADV_HUGEPAGE), and we are on a thp enabled == madvise host, so > > > > it would be eligible for thp." > > > > > > The point of CRIU is that it tries to save and re-create the state of the > > > process. > > > > > > With your fix, "nh" in VMA flags can mean two different things: > > > - MADV_NOHUGEPAGE was called for the VMA, or > > > - prctl(PR_SET_THP_DISABLE) was called for the whole process > > > > > > Current CRIU code assumes that "nh" means MADV_NOHUGEPAGE. But if it was > > > due to prctl(PR_SET_THP_DISABLE) the process state will not be restored > > > correctly: all VMAs will be restored with MADV_NOHUGEPAGE and future > > > re-enabling THP with prctl(PR_SET_THP_DISABLE) for restored process will > > > not have any effect. VMAs stay not eligble for THP. > > > > > > > CRIU's usage of "nh" was created specifically for PR_SET_THP_DISABLE: > > > > commit 9f61de87768b33d9b187ef71a57262a17276ff3b > > > > proc_parse: make smaps parsing helpers globally available > > > > The is_vma_range_fmt and parse_vmflags will be required for detection of > > availability of PR_SET_THP_DISABLE prctl > > > > Acked-by: Pavel Emelyanov <xemul@xxxxxxxxxxxxx> > > Signed-off-by: Mike Rapoport <rppt@xxxxxxxxxxxxxxxxxx> > > Signed-off-by: Andrei Vagin <avagin@xxxxxxxxxxxxx> > > > > > > That's at > > https://github.com/checkpoint-restore/criu/commit/9f61de87768b33d9b187ef71a57262a17276ff3b. > > > > They are using it for the *exact* same reason that we are. That commit > > was merged June 2017 based on the kernel code as it existed prior to the > > broken commit in July 2017. > > > > > > Does this not suggest that CRIU is now equally as broken as my userspace? > > I think it *was* broken in the same way as your userspace. And it was > adjusted to the updated kernel by 5480a2a40683. > Ah, ok, I see this in criu: commit 5480a2a4068349c911fd78b9a45850b4b8a5596e Author: Mike Rapoport <rppt@xxxxxxxxxxxxxxxxxx> Date: Wed Jun 28 09:43:25 2017 +0300 kerndat: add test for availability of PR_SET_THP_DISABLE prctl The PR_SET_THP_DISABLE prctl allows control of transparent huge pages on per-process basis. It is available since Linux 3.15, but until recently it set VM_NOHUGEPAGE for all VMAs created after prctl() call, which prevents proper restore for combination of pre- and post-copy. A recent change to prctl(PR_SET_THP_DISABLE) behavior eliminates the use of per-VMA flags and we can use the new version of the prctl() to disable THP. Acked-by: Pavel Emelyanov <xemul@xxxxxxxxxxxxx> Signed-off-by: Mike Rapoport <rppt@xxxxxxxxxxxxxxxxxx> Signed-off-by: Andrei Vagin <avagin@xxxxxxxxxxxxx> So commit 1860033237d4 ("mm: make PR_SET_THP_DISABLE immediately active") broke our usecase as well as criu. I'm not sure of the implications of the fix above, it appears that it tests if PR_SET_THP_DISABLE is available and whether or not we can parse VmFlags from smaps to detect that. I assume that criu still works on previous kernels such that if this patch were merged that it wouldn't be broken. So we have a case where multiple users have been broken by the commit, is there an objection to merging this so that we always get "nh" in VmFlags when PR_SET_THP_DISABLE is used since this was the only way it was detectable on earlier kernels?