Hello David, On 8/18/21 10:35 AM, David Hildenbrand wrote: > On 17.08.21 23:42, Michael Kerrisk (man-pages) wrote: >> Hello David, >> >> Thank you for writing this! Could you please take >> a look at the comments below and revise? > > Hi Michael, > > thanks for your valuable input. Your feedback will certainly make this > easier to understand for people that are not heavily involved in MM work :) > > [...] > >>> man2/madvise.2 | 107 +++++++++++++++++++++++++++++++++++++++++++++++++ >>> 1 file changed, 107 insertions(+) >>> >>> diff --git a/man2/madvise.2 b/man2/madvise.2 >>> index f1f384c0c..f6cea9ad2 100644 >>> --- a/man2/madvise.2 >>> +++ b/man2/madvise.2 >>> @@ -469,6 +469,72 @@ If a page is file-backed and dirty, it will be written back to the backing >>> storage. >>> The advice might be ignored for some pages in the range when it is not >>> applicable. >>> +.TP >>> +.BR MADV_POPULATE_READ " (since Linux 5.14)" >>> +Populate (prefault) page tables readable for the whole range without actually >> >> I have trouble to understand "Populate (prefault) page tables readable". >> Does it mean that it is just the page tables are being populated, and the >> PTEs are marked to indicate that the pages are readable? If yes, I >> think some rewording would help. > > I actually tried phrasing it similar to our MAP_POPULATE documentation: > ("Populate (prefault) page tables for a mapping.") Yeah, well that description is a bit thin too :-}. > We will prefault all pages, faulting them in. >> >>> +reading memory. >> >> I don't understand "without actually reading memory"? Do you mean, >> "without actually faulting in the pages"; or something else? > > "Populate (prefault) page tables readable, faulting in all pages in the > range just as if manually reading one byte of each page; however, avoid > the actual memory access that would have been performed after handling > the fault." > > Does that make it clearer? (avoiding eventually touching the page at all > can be beneficial, especially when dealing with DAX memory where memory > access might be expensive) That text is much better. But, what's still not clear to me then is the dfference between mmap(2) MAP_POPULATE, and MADV_POPULATE_READ and MADV_POPULATE_WRITE. What is the differnece, and in what situations would one prefer one or the other approach? I think it would be helpful if the manual page said something about these details. >>> +Depending on the underlying mapping, >>> +map the shared zeropage, >>> +preallocate memory or read the underlying file; >>> +files with holes might or might not preallocate blocks. >>> +Do not generate >>> +.B SIGBUS >>> +when populating fails, >>> +return an error instead. >> >> Better: >> >> [[ >> If populating fails, a >> .B SIGBUS >> signal is not generated; instead, an error i returned. >> ]] >> > > Sure, thanks. > >>> +.IP >>> +If >>> +.B MADV_POPULATE_READ >>> +succeeds, >>> +all page tables have been populated (prefaulted) readable once. >>> +If >>> +.B MADV_POPULATE_READ >>> +fails, >>> +some page tables might have been populated. >>> +.IP >>> +.B MADV_POPULATE_READ >>> +cannot be applied to mappings without read permissions >>> +and special mappings, >>> +for example, >>> +marked with the kernel-internal >> >> s/marked/mappings marked/ >> >>> +.B VM_PFNMAP >>> +and >> >> Just checking: should it be "and" or "or" here"? >> >> Looking at the EINVAL error below, I guess "or", and a better >> wording would be: >> >> [[ >> ...for example, mappings marked with kernel-internal flags such as >> .B VMPPFNMAP >> or >> .BR BR_V_IO. >> ]] > > Much better. Note that there might be more types of mappings that won't > work (e.g., initially also secretmem IIRC). Ahh nice. Since there's about to be a memfd_secret() manual page, I suggest adding also "or secret memory regions created using memfd_secret(2)". >>> +.BR VM_IO . >>> +.IP >>> +Note that with >>> +.BR MADV_POPULATE_READ , >>> +the process can be killed at any moment when the system runs out of memory. >>> +.TP >>> +.BR MADV_POPULATE_WRITE " (since Linux 5.14)" >>> +Populate (prefault) page tables writable for the whole range without actually >> >> I have trouble to understand "Populate (prefault) page tables writable". >> Does it mean that it is just the page tables are being populated, and the >> PTEs are marked to indicate that the pages are writable? If yes, I >> think some rewording would help. >> >>> +writing memory. >> >> I don't understand "without actually writing memory"? Do you mean, >> "without actually faulting in the pages"; or something else? >> > > Similar to the other wording: > > "Populate (prefault) page tables writable, faulting in all pages in the > range just as if manually writing one byte of each page; however, avoid > the actual memory access that would have been performed after handling > the fault." Much better, but see also my comments above re MADV_POPULATE_READ. [...] >>> +.B EFAULT >>> +.I advice >>> +is >>> +.B MADV_POPULATE_READ >>> +or >>> +.BR MADV_POPULATE_WRITE , >>> +and populating (prefaulting) page tables failed because a >>> +.B SIGBUS >>> +would have been generated on actual memory access and the reason is not a >>> +HW poisoned page. >> >> Maybe: >> s/.$/(see the description of MADV_HWPOISON in this page)./ >> ? >> > > Sure, we can add that. But note that MADV_HWPOISON is just one of many > ways to HWpoison a page. Then maybe something like: "(HW poisoned pages can, for example, be created using the MADV_HWPOISON flag described elsewhere in this page.)" [...] Thanks, Michael -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/