On Fri, Dec 28, 2018 at 11:54:17AM +0100, Vlastimil Babka wrote: > On 12/28/18 9:18 AM, Michal Hocko wrote: > > On Thu 27-12-18 21:31:00, Andrew Morton wrote: > >> On Thu, 27 Dec 2018 14:11:14 +0300 "Kirill A. Shutemov" <kirill@xxxxxxxxxxxxx> wrote: > >> > >>> On Mon, Dec 24, 2018 at 10:17:31AM +0100, Michal Hocko wrote: > >>>> On Mon 24-12-18 01:05:57, David Rientjes wrote: > >>>> [...] > >>>>> I'm not interested in having a 100 email thread about this when a clear > >>>>> and simple fix exists that actually doesn't break user code. > >>>> > >>>> You are breaking everybody who really wants to query MADV_NOHUGEPAGE > >>>> status by this flag. Is there anybody doing that? > >>> > >>> Yes. > >>> > >>> https://github.com/checkpoint-restore/criu/blob/v3.11/criu/proc_parse.c#L143 > >> > >> Ugh. So the regression fix causes a regression? > > > > Yes. The patch from David will hardcode the nohugepage vm flag if the > > THP was disabled by the prctl at the time of the snapshot. And if the > > application later enables THP by the prctl then existing mappings would > > never get the THP enabled status back. > > > > This is the kind of a potential regression I was poiting out earlier > > when explaining that the patch encodes the logic into the flag exporting > > and that means that whoever wants to get the raw value of the flag will > > not be able to do so. Please note that the raw value is exactly what > > this interface is documented and supposed to export. And as the > > documentation explains it is implementation specific and anybody to use > > it should be careful. > > Let's add some CRIU guys in the loop (dunno if the right ones). We're > discussing David's patch [1] that makes 'nh' and 'hg' flags reported in > /proc/pid/smaps (and set by madvise) overridable by > prctl(PR_SET_THP_DISABLE). This was sort of accidental behavior (but > only for mappings created after the prctl call) before 4.13 commit > 1860033237d4 ("mm: make PR_SET_THP_DISABLE immediately active"). > > For David's userspace that commit is a regression as there are false > positives when checking for vma's that are eligible for THP (=don't have > the 'nh' flag in smaps) but didn't really obtain THP's. The userspace > assumes it's due to fragmentation (=bad) and cannot know that it's due > to the prctl(). But we fear that making prctl() affect smaps vma flags > means that CRIU can't query them accurately anymore, and thus it's a > regression for CRIU. Can you comment on that? CRIU parses VmFlags from /proc/pid/smaps during the checkpoint and then uses their raw values to recreate exactly the same mappings during restore. We use appropriate madvise() calls to re-establish VMA flags. Implicitly setting the 'nh' flag in /proc/pid/smaps when THP usage was restricted with prctl() will make CRIU to call madvise(MADV_NOHUPEPAGE) for all the mappings with 'nh' set as CRIU cannot detect the actual reason for VMA being marked as VM_NOHUGEPAGE. As Andrew pointed out, if the restored process will re-enable THP with prtcl(), the VMAs will retail VM_NOHUGEPAGE flag. And, I don't see how can we work around this in CRIU. > Michal has a patch [2] that reports the prctl() status separately, but > that doesn't help David's existing userspace. For CRIU this also won't > help as long the smaps vma flags still silently included the prctl() > status. Do you see some solution that would work for everybody? Nothing comes to mind at the moment, maybe next year ;-) > [1] > https://www.ozlabs.org/~akpm/mmots/broken-out/mm-thp-always-specify-disabled-vmas-as-nh-in-smaps.patch > [2] > https://www.ozlabs.org/~akpm/mmots/broken-out/mm-proc-report-pr_set_thp_disable-in-proc.patch > -- Sincerely yours, Mike.