On Tue, May 30, 2017 at 04:56:33PM +0200, Michal Hocko wrote: > On Tue 30-05-17 16:39:41, Michal Hocko wrote: > > On Tue 30-05-17 16:04:56, Andrea Arcangeli wrote: > [...] > > > About the proposed madvise, it just clear bits, but it doesn't change > > > at all how those bits are computed in THP code. So I don't see it as > > > convoluted. > > > > But we already have MADV_HUGEPAGE, MADV_NOHUGEPAGE and prctl to > > enable/disable thp. Doesn't that sound little bit too much for a single > > feature to you? > > And also I would argue that the prctl should be usable for this specific > usecase. The man page says > " > Setting this flag provides a method for disabling transparent huge pages > for jobs where the code cannot be modified > " > > and that fits into the described case AFAIU. The thing that the current > implementation doesn't work is a mere detail. I would even argue that > it is non-intuitive if not buggy right away. Whoever calls this prctl > later in the process life time will simply not stop THP from creating. > > So again, why cannot we fix that? There was some handwaving about > potential overhead but has anybody actually measured that? I'm not sure if it should be considered a bug, the prctl is intended to use normally by wrappers so it looks optimal as implemented this way: affecting future vmas only, which will all be created after execve executed by the wrapper. What's the point of messing with the prctl so it mangles over the wrapper process own vmas before exec? Messing with those vmas is pure wasted CPUs for the wrapper use case which is what the prctl was created for. Furthermore there would be the risk a program that uses the prctl not as a wrapper and then calls the prctl to clear VM_NOHUGEPAGE from def_flags assuming the current kABI. The program could assume those vmas that were instantiated before disabling the prctl are still with VM_NOHUGEPAGE set (they would not after the change you propose). Adding a scan of all vmas to PR_SET_THP_DISABLE to clear VM_NOHUGEPAGE on existing vmas looks more complex too and less finegrined so probably more complex for userland to manage, but ignoring all above considerations it would be a functional alternative for CRIU's needs. However if you didn't like the complexity of the new madvise which is functionally a one-liner equivalent to MADV_NORMAL, I wouldn't expect you to prefer to make the prctl even more complex with a loop over all vmas that despite being fairly simple it'll still be more than a trivial one liner. Thanks, Andrea -- To unsubscribe from this list: send the line "unsubscribe linux-api" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html