I found that xfstest generic/246 drives execution down the pvfs_bufmap_copy_to_kernel_iovec code path, I guess because of the way that ./src/t_mmap_writev.c uses mmap. xfstest generic/247 *really pounds* the pvfs_bufmap_copy_to_kernel_iovec code path. Looking at lib/iov_iter.c made me think copy_page_to_iter might be able to deal with either user or kernel address spaces. So I've ditched both pvfs_bufmap_copy_to_kernel_iovec and pvfs_bufmap_copy_to_user_iovec and replaced them with the very simple pvfs_bufmap_copy_to_iovec. I see no regressions with userspace tests that I was already using and none with the two xfstests that exercise the "kernel codepath". The logic that Al hated so much in file.c/postcopy_buffers is gone. Linus: I added you to the Cc: list this time because I remembered seeing you post that you don't read fs-devel every day... if you don't want to see these updates as I try to pry my "grudging ack" from Al <g> over the next few weeks, just say so... file.c/postcopy_buffers: if (total_size) { iov_iter_init(&iter, READ, vec, nr_segs, total_size); ret = pvfs_bufmap_copy_to_iovec(bufmap, &iter, buffer_index); if (ret < 0) gossip_err("%s: Failed to copy-out buffers. Please make sure that the pvfs2-client is running (%ld)\n", __func__, (long)ret); } pvfs2-bufmap.c/pvfs_bufmap_copy_to_iovec: /* * Iterate through the array of pages containing the bytes from * a file being read. * */ int pvfs_bufmap_copy_to_iovec(struct pvfs2_bufmap *bufmap, struct iov_iter *iter, int buffer_index) { struct pvfs_bufmap_desc *from; struct page *page; int i; gossip_debug(GOSSIP_BUFMAP_DEBUG, "%s: buffer_index:%d: iov_iter_count(iter):%lu:\n", __func__, buffer_index, iov_iter_count(iter)); from = &bufmap->desc_array[buffer_index]; for (i = 0; iov_iter_count(iter); i++) { page = from->page_array[i]; copy_page_to_iter(page, 0, PAGE_SIZE, iter); } return iov_iter_count(iter) ? -EFAULT : 0; } -Mike On Mon, Sep 14, 2015 at 4:29 PM, Mike Marshall <hubcap@xxxxxxxxxxxx> wrote: > When Linus reviewed the recent Orangefs pull request I made, > one thing he said was: > > > the iovecs are walked manually (eg pvfs_bufmap_copy_to_user_iovec()). > > I would really want to see the code use the iov_iter infrastructure > > I quickly made some changes, and after ripping off some code from > cifs, I had something that worked with no regressions... but ripped > off code that seems to work isn't the same as code designed on > purpose to work. > > This seems to work, on purpose even <g>... > > file.c: > > iov_iter_init(&iter, READ, vec, nr_segs, total_size); > ret = pvfs_bufmap_copy_to_user_iovec2(bufmap, > &iter, > buffer_index); > > pvfs2-bufmap.c: > > /* > * Iterate through the array of pages containing the bytes from > * a file being read. > * > */ > int pvfs_bufmap_copy_to_user_iovec2(struct pvfs2_bufmap *bufmap, > struct iov_iter *iter, > int buffer_index) > { > struct pvfs_bufmap_desc *from; > struct page *page; > int i; > > from = &bufmap->desc_array[buffer_index]; > > for (i = 0; iov_iter_count(iter); i++) { > page = from->page_array[i]; > copy_page_to_iter(page, 0, PAGE_SIZE, iter); > } > > return iov_iter_count(iter) ? -EFAULT : 0; > } > > > There's a pointer to an array of pages in the bufmap structure. Bufmap is > a structure that Christoph created. Bufmap is filled with related, but > previously free-floating, variables. Christoph glommed them together > into the new bufmap struct while making a fix for a lock inversion problem... > > This new version of pvfs_bufmap_copy_to_user_iovec peels off the pages > from the array and iterates through them with copy_page_to_iter. Unlike > the ripped off code, the for loop executes exactly the expected number > of times. If the file being read is smaller than a page, the loop is > executed once and file-size number of bytes are copied. If the file > being read is larger than a page, it may or may not all be in the page > array on a particular visit to pvfs_bufmap_copy_to_user_iovec2, but the > loop is executed the expected number of times based on iov_iter_count(iter), > peeling off a page at a time and then the correct remnant number of bytes. > > I spoke with Stephen Rothwall about how to go forward with the > orangefs tree I have in Linux Next... I plan to rebuild it based > on a 4.3 rc1 or rc2 tree, and it will have new stuff that gets > reviewed in it. > > If the above passes muster, it will be there... I'll ditch the > old open-coded pvfs_bufmap_copy_to_user_iovec and rename my > new pvfs_bufmap_copy_to_user_iovec2 to pvfs_bufmap_copy_to_user_iovec. > > For a while when working on pvfs_bufmap_copy_to_user_iovec2 I was > under the impression that copy_page_to_iter returned the number of > bytes *not* copied, based on what I incorrectly inferred from > https://lwn.net/Articles/625077, but I see that it indeed returns > the number of bytes copied. And iov_iter_count returns what's left to > copy. I don't even see a reason, in my case, to collect > copy_page_to_iter's return code - it is always PAGE_SIZE, or, on the > final time around, some remnant. I guess the "unlikely" return of 0 might > be considered an error return code? I guess checking to see that > copy_page_to_iter always returns either PAGE_SIZE or the same value as > iov_ter_count previously returned on the final remnant pass might > have some value in setting pvfs_bufmap_copy_to_user_iovec2's return code? > As it is, I just blast through and if iov_iter_count is 0 at the > end, its all good. > > Hopefully other things I send in for review will be less verbose <g>... > > -Mike -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html