Hi, Mike, Some minor nitpicks below. On Tue, May 24, 2022 at 04:28:44PM -0700, Mike Kravetz wrote: > Clarify that madvise only works on full pages, and remove references > to 'bytes'. > > Update MADV_DONTNEED and MADV_REMOVE sections to remove notes that > HugeTLB mappings are not supported. They now are supported. > > Under 'Linux notes' describe addr requirements and length handling > for ranges in HugeTLB mappings. > > Signed-off-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx> > --- > man2/madvise.2 | 36 ++++++++++++++++++++++++++---------- > 1 file changed, 26 insertions(+), 10 deletions(-) > > diff --git a/man2/madvise.2 b/man2/madvise.2 > index f1f384c0c..c3b0615cb 100644 > --- a/man2/madvise.2 > +++ b/man2/madvise.2 > @@ -61,9 +61,13 @@ system call is used to give advice or directions to the kernel > about the address range beginning at address > .I addr > and with size > +.IR length . > +.BR madvise () > +only operates on whole pages, therefore > +.I addr > +must be page-aligned. The value of > .I length > -bytes > -In most cases, > +is rounded up to a multiple of page size. In most cases, > the goal of such advice is to improve system or application performance. > .PP > Initially, the system call supported a set of "conventional" > @@ -143,7 +147,7 @@ The resident set size (RSS) of the calling process will be immediately > reduced however. > .IP > .B MADV_DONTNEED > -cannot be applied to locked pages, Huge TLB pages, or > +cannot be applied to locked pages, or > .BR VM_PFNMAP > pages. This looks good, but since this will be a behavior change and we won't be able to change the old kernels, I saw the man page normally does this with things like: Since Linux 5.18, this madvise supports hugetlbfs pages. Majorly it states starting from which version it'll work, and when it'll not. > (Pages marked with the kernel-internal > @@ -170,24 +174,24 @@ Note that some of these operations change the semantics of memory accesses. > .\" commit f6b3ec238d12c8cc6cc71490c6e3127988460349 > Free up a given range of pages > and its associated backing store. > -This is equivalent to punching a hole in the corresponding byte > +This is equivalent to punching a hole in the corresponding > range of the backing store (see > .BR fallocate (2)). > Subsequent accesses in the specified address range will see > -bytes containing zero. > +pages containing zero. > .\" Databases want to use this feature to drop a section of their > .\" bufferpool (shared memory segments) - without writing back to > .\" disk/swap space. This feature is also useful for supporting > .\" hot-plug memory on UML. > .IP > The specified address range must be mapped shared and writable. > -This flag cannot be applied to locked pages, Huge TLB pages, or > +This flag cannot be applied to locked pages, or > .BR VM_PFNMAP > pages. > .IP > In the initial implementation, only > .BR tmpfs (5) > -was supported > +supported > .BR MADV_REMOVE ; > but since Linux 3.5, > .\" commit 3f31d07571eeea18a7d34db9af21d2285b807a17 > @@ -196,9 +200,9 @@ any filesystem which supports the > .BR FALLOC_FL_PUNCH_HOLE > mode also supports > .BR MADV_REMOVE . > -Hugetlbfs fails with the error > -.BR EINVAL > -and other filesystems fail with the error > +Filesystems which do not support > +.BR MADV_REMOVE > +fail with the error > .BR EOPNOTSUPP . > .TP > .BR MADV_DONTFORK " (since Linux 2.6.16)" > @@ -596,6 +600,18 @@ that are not mapped, the Linux version of > ignores them and applies the call to the rest (but returns > .B ENOMEM > from the system call, as it should). > +.PP > +If the specified address > +.I addr > +is within a mapping backed by Huge TLB pages, then > +.I addr > +must be aligned to the underlying Huge TLB page size. If the range > +specified by > +.I addr > +and > +.I length > +ends in a mapping backed by Huge TLB pages, then the end of the range > +will be rounded up to a multiple of the underlying Huge TLB page size. I'm slightly worried this could be hidden too deep, meanwhile it duplicates part of the sentence of how start/end will be treated. How about adding a short paragraph into each of MADV_DONTNEED and MADV_REMOVE section (right after the new sentences upon hugetlbfs), with: For hugetlbfs, the start/end alignments on page sizes will be based on huge page size. No strong opinions on any of these. Anyway: Acked-by: Peter Xu <peterx@xxxxxxxxxx> Thanks, -- Peter Xu