Re: [PATCH v7 4/4] mm: vmalloc: convert vread() to vread_iter()

David Hildenbrand <david@xxxxxxxxxx> · Thu, 23 Mar 2023 11:38:52 +0100

On 23.03.23 11:36, Baoquan He wrote:
On 03/23/23 at 06:44am, Lorenzo Stoakes wrote:
On Thu, Mar 23, 2023 at 10:52:09AM +0800, Baoquan He wrote:
On 03/22/23 at 06:57pm, Lorenzo Stoakes wrote:
Having previously laid the foundation for converting vread() to an iterator
function, pull the trigger and do so.

This patch attempts to provide minimal refactoring and to reflect the
existing logic as best we can, for example we continue to zero portions of
memory not read, as before.

Overall, there should be no functional difference other than a performance
improvement in /proc/kcore access to vmalloc regions.

Now we have eliminated the need for a bounce buffer in read_kcore_iter(),
we dispense with it, and try to write to user memory optimistically but
with faults disabled via copy_page_to_iter_nofault(). We already have
preemption disabled by holding a spin lock. We continue faulting in until
the operation is complete.

I don't understand the sentences here. In vread_iter(), the actual
content reading is done in aligned_vread_iter(); otherwise we zero-fill
the region. In aligned_vread_iter(), we use vmalloc_to_page() to get the
mapped page and read it out, otherwise we zero-fill. In this patch,
however, fault_in_iov_iter_writeable() faults in the memory of the iter
once and bails out if that fails. I am wondering why we continue
faulting in until the operation is complete, and how that is done.

This is referring to what's happening in kcore.c, not vread_iter(),
i.e. the looped read/faultin.

The reason we bail out if fault_in_iov_iter_writeable() fails is that this
would indicate an error had occurred.

The whole point is to _optimistically_ try to perform the operation
assuming the pages are faulted in. Ultimately we copy via
copy_to_user_nofault(), which will either copy the data or fail if the pages
are not faulted in (will discuss this below a bit more in response to your
other point).

If this fails, then we fault in, and try again. We loop because there could
be some extremely unfortunate timing with a race on e.g. swapping out or
migrating pages between faulting in and trying to write out again.

This is extremely unlikely, but to avoid any chance of breaking userland we
repeat the operation until it completes. In nearly all real-world
situations it'll either work immediately or loop once.
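
To make the loop described above concrete, here is a minimal userspace
sketch of the pattern. It is not the kernel code: sim_dest,
sim_copy_nofault(), sim_fault_in() and sim_read_retry() are hypothetical
names that only model the behaviour of a nofault copy and an unpinned
fault-in, and the evictions_left counter is an assumed stand-in for the
swap-out/migration race.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace model of one destination page; not kernel code. */
struct sim_dest {
	char data[64];
	bool resident;		/* is the page currently "faulted in"? */
	int evictions_left;	/* how often a simulated race evicts it again */
};

/* Models a nofault copy: returns the number of bytes NOT copied. */
static size_t sim_copy_nofault(struct sim_dest *dst, const char *src, size_t len)
{
	assert(len <= sizeof(dst->data));
	if (!dst->resident)
		return len;	/* would have faulted, so nothing is copied */
	memcpy(dst->data, src, len);
	return 0;
}

/* Models faulting in without pinning: the page may be evicted again. */
static int sim_fault_in(struct sim_dest *dst)
{
	dst->resident = true;
	if (dst->evictions_left > 0) {
		dst->evictions_left--;
		dst->resident = false;	/* simulated swap-out/migration race */
	}
	return 0;	/* a non-zero return would model a genuine error */
}

/* Optimistic copy first; on failure fault in and retry. Returns attempts. */
static int sim_read_retry(struct sim_dest *dst, const char *src, size_t len)
{
	int attempts = 0;

	for (;;) {
		attempts++;
		if (sim_copy_nofault(dst, src, len) == 0)
			return attempts;
		if (sim_fault_in(dst) != 0)
			return -1;	/* bail out on a real failure */
	}
}
```

In this model an already-resident destination completes on the first
attempt, a non-resident one completes on the second, and each simulated
eviction between fault-in and copy costs one further iteration.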

Thanks a lot for patiently providing these helpful details. I get it now. I
was mainly confused by the while (true) loop in the KCORE_VMALLOC case of
read_kcore_iter().

Now, is there any chance that the faulted-in memory is swapped out or
migrated again before vread_iter()? Does fault_in_iov_iter_writeable()
pin the memory? I couldn't find that in the code or documentation. It seems
it only faults in memory. If so, there's a window between faulting in and
copy_to_user_nofault().

See the documentation of fault_in_safe_writeable():

"Note that we don't pin or otherwise hold the pages referenced that we 
fault in.  There's no guarantee that they'll stay in memory for any 
duration of time."

--
Thanks,

David / dhildenb