Re: [PATCH v2] madvise.2: Document MADV_POPULATE_READ and MADV_POPULATE_WRITE

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello David,

On 8/18/21 10:35 AM, David Hildenbrand wrote:
> On 17.08.21 23:42, Michael Kerrisk (man-pages) wrote:
>> Hello David,
>>
>> Thank you for writing this! Could you please take
>> a look at the comments below and revise?
> 
> Hi Michael,
> 
> thanks for your valuable input. Your feedback will certainly make this 
> easier to understand for people that are not heavily involved in MM work :)
> 
> [...]
> 
>>>   man2/madvise.2 | 107 +++++++++++++++++++++++++++++++++++++++++++++++++
>>>   1 file changed, 107 insertions(+)
>>>
>>> diff --git a/man2/madvise.2 b/man2/madvise.2
>>> index f1f384c0c..f6cea9ad2 100644
>>> --- a/man2/madvise.2
>>> +++ b/man2/madvise.2
>>> @@ -469,6 +469,72 @@ If a page is file-backed and dirty, it will be written back to the backing
>>>   storage.
>>>   The advice might be ignored for some pages in the range when it is not
>>>   applicable.
>>> +.TP
>>> +.BR MADV_POPULATE_READ " (since Linux 5.14)"
>>> +Populate (prefault) page tables readable for the whole range without actually
>>
>> I have trouble to understand "Populate (prefault) page tables readable".
>> Does it mean that it is just the page tables are being populated, and the
>> PTEs are marked to indicate that the pages are readable? If yes, I
>> think some rewording would help.
> 
> I actually tried phrasing it similar to our MAP_POPULATE documentation:
> 	("Populate  (prefault)  page tables for a mapping.")

Yeah, well that description is a bit thin too :-}.

> We will prefault all pages, faulting them in.
>>
>>> +reading memory.
>>
>> I don't understand "without actually reading memory"? Do you mean,
>> "without actually  faulting in the pages"; or something else?
> 
> "Populate (prefault) page tables readable, faulting in all pages in the 
> range just as if manually reading one byte of each page; however, avoid 
> the actual memory access that would have been performed after handling 
> the fault."
> 
> Does that make it clearer? (avoiding eventually touching the page at all 
> can be beneficial, especially when dealing with DAX memory where memory 
> access might be expensive)

That text is much better. But, what's still not clear to me then is the
dfference between mmap(2) MAP_POPULATE, and MADV_POPULATE_READ and
MADV_POPULATE_WRITE. What is the differnece, and in what situations 
would one prefer one or the other approach? I think it would be helpful
if the manual page said something about these details.

>>> +Depending on the underlying mapping,
>>> +map the shared zeropage,
>>> +preallocate memory or read the underlying file;
>>> +files with holes might or might not preallocate blocks.
>>> +Do not generate
>>> +.B SIGBUS
>>> +when populating fails,
>>> +return an error instead.
>>
>> Better:
>>
>> [[
>> If populating fails, a
>> .B SIGBUS
>> signal is not generated; instead, an error i returned.
>> ]]
>>
> 
> Sure, thanks.
> 
>>> +.IP
>>> +If
>>> +.B MADV_POPULATE_READ
>>> +succeeds,
>>> +all page tables have been populated (prefaulted) readable once.
>>> +If
>>> +.B MADV_POPULATE_READ
>>> +fails,
>>> +some page tables might have been populated.
>>> +.IP
>>> +.B MADV_POPULATE_READ
>>> +cannot be applied to mappings without read permissions
>>> +and special mappings,
>>> +for example,
>>> +marked with the kernel-internal
>>
>> s/marked/mappings marked/
>>
>>> +.B VM_PFNMAP
>>> +and
>>
>> Just checking: should it be "and" or "or" here"?
>>
>> Looking at the EINVAL error below, I guess "or", and a better
>> wording would be:
>>
>> [[
>> ...for example, mappings marked with kernel-internal flags such as
>> .B VMPPFNMAP
>> or
>> .BR BR_V_IO.
>> ]]
> 
> Much better. Note that there might be more types of mappings that won't 
> work (e.g., initially also secretmem IIRC).

Ahh nice. Since there's about to be a memfd_secret() manual page, 
I suggest adding also "or secret memory regions created using
memfd_secret(2)".

>>> +.BR VM_IO .
>>> +.IP
>>> +Note that with
>>> +.BR MADV_POPULATE_READ ,
>>> +the process can be killed at any moment when the system runs out of memory.
>>> +.TP
>>> +.BR MADV_POPULATE_WRITE " (since Linux 5.14)"
>>> +Populate (prefault) page tables writable for the whole range without actually
>>
>> I have trouble to understand "Populate (prefault) page tables writable".
>> Does it mean that it is just the page tables are being populated, and the
>> PTEs are marked to indicate that the pages are writable? If yes, I
>> think some rewording would help.
>>
>>> +writing memory.
>>
>> I don't understand "without actually writing memory"? Do you mean,
>> "without actually  faulting in the pages"; or something else?
>>
> 
> Similar to the other wording:
> 
> "Populate (prefault) page tables writable, faulting in all pages in the 
> range just as if manually writing one byte of each page; however, avoid 
> the actual memory access that would have been performed after handling 
> the fault."

Much better, but see also my comments above re MADV_POPULATE_READ.

[...]

>>> +.B EFAULT
>>> +.I advice
>>> +is
>>> +.B MADV_POPULATE_READ
>>> +or
>>> +.BR MADV_POPULATE_WRITE ,
>>> +and populating (prefaulting) page tables failed because a
>>> +.B SIGBUS
>>> +would have been generated on actual memory access and the reason is not a
>>> +HW poisoned page.
>>
>> Maybe:
>> s/.$/(see the description of MADV_HWPOISON in this page)./
>> ?
>>
> 
> Sure, we can add that. But note that MADV_HWPOISON is just one of many 
> ways to HWpoison a page.

Then maybe something like:

"(HW poisoned pages can, for example, be created using the
MADV_HWPOISON flag described elsewhere in this page.)"

[...]

Thanks,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux