On 03/22/23 at 06:57pm, Lorenzo Stoakes wrote:
> Having previously laid the foundation for converting vread() to an iterator
> function, pull the trigger and do so.
>
> This patch attempts to provide minimal refactoring and to reflect the
> existing logic as best we can, for example we continue to zero portions of
> memory not read, as before.
>
> Overall, there should be no functional difference other than a performance
> improvement in /proc/kcore access to vmalloc regions.
>
> Now we have eliminated the need for a bounce buffer in read_kcore_iter(),
> we dispense with it, and try to write to user memory optimistically but
> with faults disabled via copy_page_to_iter_nofault(). We already have
> preemption disabled by holding a spin lock. We continue faulting in until
> the operation is complete.

I don't understand the sentences here. In vread_iter(), the actual content
reading is done in aligned_vread_iter(); otherwise we zero-fill the region.
In aligned_vread_iter(), we use vmalloc_to_page() to get the mapped page
and read it out, otherwise we zero-fill. In this patch, though,
fault_in_iov_iter_writeable() faults in the iter's memory once and bails
out if that fails. I am wondering why we continue faulting in until the
operation is complete, and how that is done.

If we look at the failure points in vread_iter(), they mainly come from
copy_page_to_iter_nofault(), e.g. the page_copy_sane() check failing or the
i->data_source check failing. If these checks fail, should we keep retrying
the read again and again? That is not related to faulting memory in.

I saw your discussion with David, but I am still a little lost. Hope I can
learn it, thanks in advance.

......
> diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
> index 08b795fd80b4..25b44b303b35 100644
> --- a/fs/proc/kcore.c
> +++ b/fs/proc/kcore.c
......
> @@ -507,13 +503,30 @@ read_kcore_iter(struct kiocb *iocb, struct iov_iter *iter)
>
>  		switch (m->type) {
>  		case KCORE_VMALLOC:
> -			vread(buf, (char *)start, tsz);
> -			/* we have to zero-fill user buffer even if no read */
> -			if (copy_to_iter(buf, tsz, iter) != tsz) {
> -				ret = -EFAULT;
> -				goto out;
> +		{
> +			const char *src = (char *)start;
> +			size_t read = 0, left = tsz;
> +
> +			/*
> +			 * vmalloc uses spinlocks, so we optimistically try to
> +			 * read memory. If this fails, fault pages in and try
> +			 * again until we are done.
> +			 */
> +			while (true) {
> +				read += vread_iter(iter, src, left);
> +				if (read == tsz)
> +					break;
> +
> +				src += read;
> +				left -= read;
> +
> +				if (fault_in_iov_iter_writeable(iter, left)) {
> +					ret = -EFAULT;
> +					goto out;
> +				}
>  			}
>  			break;
> +		}
>  		case KCORE_USER:
>  			/* User page is handled prior to normal kernel page: */
>  			if (copy_to_iter((char *)start, tsz, iter) != tsz) {
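
To make the "optimistic copy, fault in, retry" pattern in the hunk above
easier to reason about, here is a minimal userspace sketch. It is not kernel
code: sim_dest, sim_copy_nofault(), sim_fault_in() and sim_read_retry() are
hypothetical stand-ins for the iov_iter, the nofault copy inside vread_iter()
and fault_in_iov_iter_writeable(), and the 16-byte "page" is chosen only for
illustration. It also assumes that the only reason for a short copy is a
non-resident destination page, which is exactly the assumption questioned
above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SIM_PAGE  16u	/* tiny "page" size, illustration only */
#define SIM_PAGES 4u

/* Hypothetical stand-in for a user buffer whose pages may not be resident. */
struct sim_dest {
	char data[SIM_PAGE * SIM_PAGES];
	bool resident[SIM_PAGES];
	size_t pos;	/* bytes written so far, like an iterator offset */
};

/* Copy without "faulting": stop at the first non-resident page.
 * Returns the number of bytes copied by this call. */
static size_t sim_copy_nofault(struct sim_dest *d, const char *src, size_t len)
{
	size_t done = 0;

	while (done < len && d->pos < sizeof(d->data)) {
		size_t page = d->pos / SIM_PAGE;
		size_t room = SIM_PAGE - d->pos % SIM_PAGE;
		size_t n = len - done < room ? len - done : room;

		if (!d->resident[page])
			break;
		memcpy(d->data + d->pos, src + done, n);
		d->pos += n;
		done += n;
	}
	return done;
}

/* Make the pages covering the next len bytes resident.
 * Returns the number of bytes NOT faulted in (0 means success). */
static size_t sim_fault_in(struct sim_dest *d, size_t len)
{
	size_t end = d->pos + len;

	if (end > sizeof(d->data))
		return len;	/* range lies outside the buffer */
	for (size_t off = d->pos; off < end; off = (off / SIM_PAGE + 1) * SIM_PAGE)
		d->resident[off / SIM_PAGE] = true;
	return 0;
}

/* Retry loop: copy optimistically, fault in what is left, try again.
 * Returns 0 on success, -1 if the destination cannot be faulted in. */
static int sim_read_retry(struct sim_dest *d, const char *src, size_t tsz)
{
	size_t left = tsz;

	while (left > 0) {
		size_t n = sim_copy_nofault(d, src, left);

		src += n;	/* advance by this call's progress only */
		left -= n;
		if (left == 0)
			break;
		if (sim_fault_in(d, left) != 0)
			return -1;
	}
	return 0;
}
```

In this model the loop ends either when everything has been copied or when
faulting in fails. A short copy caused by anything other than a missing page
would not be fixed by faulting in, so the loop would not terminate on that
path. The sketch advances src and left by the per-call count, whereas the
quoted hunk adds the running total `read` to both; the two agree as long as
at most one retry is needed.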