Re: [PATCH v2] madvise.2: Document MADV_POPULATE_READ and MADV_POPULATE_WRITE

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello David,

Thank you for writing this! Could you please take
a look at the comments below and revise?

On 8/16/21 10:19 AM, David Hildenbrand wrote:
> MADV_POPULATE_READ and MADV_POPULATE_WRITE have been merged into
> upstream Linux via commit 4ca9b3859dac ("mm/madvise: introduce
> MADV_POPULATE_(READ|WRITE) to prefault page tables"), part of v5.14-rc1.
> 
> Further, commit eb2faa513c24 ("mm/madvise: report SIGBUS as -EFAULT for
> MADV_POPULATE_(READ|WRITE)"), part of v5.14-rc6, made sure that SIGBUS is
> converted to -EFAULT instead of -EINVAL.
> 
> Let's document the behavior and error conditions of these new madvise()
> options.
> 
> Acked-by: Pankaj Gupta <pankaj.gupta@xxxxxxxxx>
> Cc: Alejandro Colomar <alx.manpages@xxxxxxxxx>
> Cc: Michael Kerrisk <mtk.manpages@xxxxxxxxx>
> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Cc: Michal Hocko <mhocko@xxxxxxxx>
> Cc: Oscar Salvador <osalvador@xxxxxxx>
> Cc: Jann Horn <jannh@xxxxxxxxxx>
> Cc: Mike Rapoport <rppt@xxxxxxxxxx>
> Cc: Linux API <linux-api@xxxxxxxxxxxxxxx>
> Cc: linux-mm@xxxxxxxxx
> Signed-off-by: David Hildenbrand <david@xxxxxxxxxx>
> ---
> 
> v1 -> v2:
> - Use semantic newlines in all cases
> - Add two missing "
> - Document -EFAULT handling
> - Rephrase some parts to make it more generic: VM_PFNMAP and VM_IO are only
>   examples for special mappings
> 
> ---
>  man2/madvise.2 | 107 +++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 107 insertions(+)
> 
> diff --git a/man2/madvise.2 b/man2/madvise.2
> index f1f384c0c..f6cea9ad2 100644
> --- a/man2/madvise.2
> +++ b/man2/madvise.2
> @@ -469,6 +469,72 @@ If a page is file-backed and dirty, it will be written back to the backing
>  storage.
>  The advice might be ignored for some pages in the range when it is not
>  applicable.
> +.TP
> +.BR MADV_POPULATE_READ " (since Linux 5.14)"
> +Populate (prefault) page tables readable for the whole range without actually

I have trouble to understand "Populate (prefault) page tables readable".
Does it mean that it is just the page tables are being populated, and the
PTEs are marked to indicate that the pages are readable? If yes, I
think some rewording would help.

> +reading memory.

I don't understand "without actually reading memory"? Do you mean,
"without actually  faulting in the pages"; or something else?

> +Depending on the underlying mapping,
> +map the shared zeropage,
> +preallocate memory or read the underlying file;
> +files with holes might or might not preallocate blocks.
> +Do not generate
> +.B SIGBUS
> +when populating fails,
> +return an error instead.

Better:

[[
If populating fails, a
.B SIGBUS
signal is not generated; instead, an error i returned.
]]

> +.IP
> +If
> +.B MADV_POPULATE_READ
> +succeeds,
> +all page tables have been populated (prefaulted) readable once.
> +If
> +.B MADV_POPULATE_READ
> +fails,
> +some page tables might have been populated.
> +.IP
> +.B MADV_POPULATE_READ
> +cannot be applied to mappings without read permissions
> +and special mappings,
> +for example,
> +marked with the kernel-internal

s/marked/mappings marked/

> +.B VM_PFNMAP
> +and

Just checking: should it be "and" or "or" here"?

Looking at the EINVAL error below, I guess "or", and a better 
wording would be:

[[
...for example, mappings marked with kernel-internal flags such as
.B VMPPFNMAP
or
.BR BR_V_IO.
]]

> +.BR VM_IO .
> +.IP
> +Note that with
> +.BR MADV_POPULATE_READ ,
> +the process can be killed at any moment when the system runs out of memory.
> +.TP
> +.BR MADV_POPULATE_WRITE " (since Linux 5.14)"
> +Populate (prefault) page tables writable for the whole range without actually

I have trouble to understand "Populate (prefault) page tables writable".
Does it mean that it is just the page tables are being populated, and the
PTEs are marked to indicate that the pages are writable? If yes, I
think some rewording would help.

> +writing memory.

I don't understand "without actually writing memory"? Do you mean,
"without actually  faulting in the pages"; or something else?

> +Depending on the underlying mapping,
> +preallocate memory or read the underlying file;
> +files with holes will preallocate blocks.
> +Do not generate
> +.B SIGBUS
> +when populating fails,
> +return an error instead.

Better:

[[
If populating fails, a
.B SIGBUS
signal is not generated; instead, an error i returned.
]]

+.IP
> +If
> +.B MADV_POPULATE_WRITE
> +succeeds,
> +all page tables have been populated (prefaulted) writable once.
> +If
> +.B MADV_POPULATE_WRITE
> +fails, some page tables might have been populated.
> +.IP
> +.B MADV_POPULATE_WRITE
> +cannot be applied to mappings without write permissions
> +and special mappings,
> +for example,
> +marked with the kernel-internal

s/marked/mappings marked/

> +.B VM_PFNMAP
> +and

Just checking: should it be "and" or "or" here"?

Looking at the EINVAL error below, I guess "or", and a better 
wording would be:

[[
...for example, mappings marked with kernel-internal flags such as
.B VMPPFNMAP
or
.BR BR_V_IO.
]]

> +.BR VM_IO .
> +.IP
> +Note that with
> +.BR MADV_POPULATE_WRITE ,
> +the process can be killed at any moment when the system runs out of memory.
>  .SH RETURN VALUE
>  On success,
>  .BR madvise ()
> @@ -490,6 +556,17 @@ A kernel resource was temporarily unavailable.
>  .B EBADF
>  The map exists, but the area maps something that isn't a file.
>  .TP
> +.B EFAULT
> +.I advice
> +is
> +.B MADV_POPULATE_READ
> +or
> +.BR MADV_POPULATE_WRITE ,
> +and populating (prefaulting) page tables failed because a
> +.B SIGBUS
> +would have been generated on actual memory access and the reason is not a
> +HW poisoned page.

Maybe:
s/.$/(see the description of MADV_HWPOISON in this page)./
?

> +.TP
>  .B EINVAL
>  .I addr
>  is not page-aligned or
> @@ -533,6 +610,18 @@ or
>  .BR VM_PFNMAP
>  ranges.
>  .TP
> +.B EINVAL
> +.I advice
> +is
> +.B MADV_POPULATE_READ
> +or
> +.BR MADV_POPULATE_WRITE ,
> +but the specified address range includes ranges with insufficient permissions
> +or incompatible mappings such as
> +.B VM_IO
> +or
> +.BR VM_PFNMAP.

s/.BR VM_PFNMAP./.BR VM_PFNMAP ./

> +.TP
>  .B EIO
>  (for
>  .BR MADV_WILLNEED )
> @@ -548,6 +637,15 @@ Not enough memory: paging in failed.
>  Addresses in the specified range are not currently
>  mapped, or are outside the address space of the process.
>  .TP
> +.B ENOMEM
> +.I advice
> +is
> +.B MADV_POPULATE_READ
> +or
> +.BR MADV_POPULATE_WRITE ,
> +and populating (prefaulting) page tables failed because there was not enough
> +memory.
> +.TP
>  .B EPERM
>  .I advice
>  is
> @@ -555,6 +653,15 @@ is
>  but the caller does not have the
>  .B CAP_SYS_ADMIN
>  capability.
> +.TP
> +.B EHWPOISON
> +.I advice
> +is
> +.B MADV_POPULATE_READ
> +or
> +.BR MADV_POPULATE_WRITE ,
> +and populating (prefaulting) page tables failed because a HW poisoned page
> +was encountered.
>  .SH VERSIONS
>  Since Linux 3.18,
>  .\" commit d3ac21cacc24790eb45d735769f35753f5b56ceb

Thanks,

Michael



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/



[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux