I still need to replace the open coded pvfs_bufmap_copy_iovec_from_user and pvfs_bufmap_copy_iovec_from_kernel. I'm pretty sure I know what to do, but I am still searching for a test I can run that drives execution down the pvfs_bufmap_copy_iovec_from_kernel code path. While running a test program I wrote (which doesn't drive execution down the desired code path) I oopsed the kernel in my pvfs_bufmap_copy_to_iovec function. I see now why I very much need to care about the return value of copy_page_to_iter. Anyhow, now pvfs_bufmap_copy_to_iovec looks like this: int pvfs_bufmap_copy_to_iovec(struct pvfs2_bufmap *bufmap, struct iov_iter *iter, int buffer_index) { struct pvfs_bufmap_desc *from; struct page *page; int i; size_t written; gossip_debug(GOSSIP_BUFMAP_DEBUG, "%s: buffer_index:%d: iov_iter_count(iter):%lu:\n", __func__, buffer_index, iov_iter_count(iter)); from = &bufmap->desc_array[buffer_index]; for (i = 0; iov_iter_count(iter); i++) { page = from->page_array[i]; written = copy_page_to_iter(page, 0, PAGE_SIZE, iter); if ((written == 0) && (iov_iter_count(iter))) break; } return iov_iter_count(iter) ? -EFAULT : 0; } ... and my poorly coded test program fails on Orangefs just like it does on ext4 and tmpfs (not by oopsing the kernel)... $ ./write_kvec /pvfsmnt/asdf written:12288: read:-1: $ ./write_kvec /home/hubcap/asdf written:12288: read:-1: $ ./write_kvec /tmp/asdf written:12288: read:-1: -Mike On Wed, Sep 16, 2015 at 1:24 PM, Mike Marshall <hubcap@xxxxxxxxxxxx> wrote: > I found that xfstest generic/246 drives execution down > the pvfs_bufmap_copy_to_kernel_iovec code path, I guess > because of the way that ./src/t_mmap_writev.c uses mmap. > > xfstest generic/247 *really pounds* the > pvfs_bufmap_copy_to_kernel_iovec code path. > > Looking at lib/iov_iter.c made me think copy_page_to_iter > might be able to deal with either user or kernel address > spaces. > > So I've ditched both pvfs_bufmap_copy_to_kernel_iovec > and pvfs_bufmap_copy_to_user_iovec and replaced them > with the very simple pvfs_bufmap_copy_to_iovec. > > I see no regressions with userspace tests that I was already > using and none with the two xfstests that exercise the "kernel > codepath". > > The logic that Al hated so much in file.c/postcopy_buffers > is gone. > > Linus: I added you to the Cc: list this time because I remembered > seeing you post that you don't read fs-devel every day... if you > don't want to see these updates as I try to pry my "grudging ack" > from Al <g> over the next few weeks, just say so... > > file.c/postcopy_buffers: > > if (total_size) { > iov_iter_init(&iter, READ, vec, nr_segs, total_size); > ret = pvfs_bufmap_copy_to_iovec(bufmap, > &iter, > buffer_index); > if (ret < 0) > gossip_err("%s: Failed to copy-out buffers. > Please make sure that the pvfs2-client is running (%ld)\n", > __func__, > (long)ret); > } > > pvfs2-bufmap.c/pvfs_bufmap_copy_to_iovec: > > /* > * Iterate through the array of pages containing the bytes from > * a file being read. > * > */ > int pvfs_bufmap_copy_to_iovec(struct pvfs2_bufmap *bufmap, > struct iov_iter *iter, > int buffer_index) > { > struct pvfs_bufmap_desc *from; > struct page *page; > int i; > > gossip_debug(GOSSIP_BUFMAP_DEBUG, > "%s: buffer_index:%d: iov_iter_count(iter):%lu:\n", > __func__, buffer_index, iov_iter_count(iter)); > > from = &bufmap->desc_array[buffer_index]; > > for (i = 0; iov_iter_count(iter); i++) { > page = from->page_array[i]; > copy_page_to_iter(page, 0, PAGE_SIZE, iter); > } > > return iov_iter_count(iter) ? -EFAULT : 0; > } > > -Mike > > On Mon, Sep 14, 2015 at 4:29 PM, Mike Marshall <hubcap@xxxxxxxxxxxx> wrote: >> When Linus reviewed the recent Orangefs pull request I made, >> one thing he said was: >> >> > the iovecs are walked manually (eg pvfs_bufmap_copy_to_user_iovec()). >> > I would really want to see the code use the iov_iter infrastructure >> >> I quickly made some changes, and after ripping off some code from >> cifs, I had something that worked with no regressions... but ripped >> off code that seems to work isn't the same as code designed on >> purpose to work. >> >> This seems to work, on purpose even <g>... >> >> file.c: >> >> iov_iter_init(&iter, READ, vec, nr_segs, total_size); >> ret = pvfs_bufmap_copy_to_user_iovec2(bufmap, >> &iter, >> buffer_index); >> >> pvfs2-bufmap.c: >> >> /* >> * Iterate through the array of pages containing the bytes from >> * a file being read. >> * >> */ >> int pvfs_bufmap_copy_to_user_iovec2(struct pvfs2_bufmap *bufmap, >> struct iov_iter *iter, >> int buffer_index) >> { >> struct pvfs_bufmap_desc *from; >> struct page *page; >> int i; >> >> from = &bufmap->desc_array[buffer_index]; >> >> for (i = 0; iov_iter_count(iter); i++) { >> page = from->page_array[i]; >> copy_page_to_iter(page, 0, PAGE_SIZE, iter); >> } >> >> return iov_iter_count(iter) ? -EFAULT : 0; >> } >> >> >> There's a pointer to an array of pages in the bufmap structure. Bufmap is >> a structure that Christoph created. Bufmap is filled with related, but >> previously free-floating, variables. Christoph glommed them together >> into the new bufmap struct while making a fix for a lock inversion problem... >> >> This new version of pvfs_bufmap_copy_to_user_iovec peels off the pages >> from the array and iterates through them with copy_page_to_iter. Unlike >> the ripped off code, the for loop executes exactly the expected number >> of times. If the file being read is smaller than a page, the loop is >> executed once and file-size number of bytes are copied. If the file >> being read is larger than a page, it may or may not all be in the page >> array on a particular visit to pvfs_bufmap_copy_to_user_iovec2, but the >> loop is executed the expected number of times based on iov_iter_count(iter), >> peeling off a page at a time and then the correct remnant number of bytes. >> >> I spoke with Stephen Rothwall about how to go forward with the >> orangefs tree I have in Linux Next... I plan to rebuild it based >> on a 4.3 rc1 or rc2 tree, and it will have new stuff that gets >> reviewed in it. >> >> If the above passes muster, it will be there... I'll ditch the >> old open-coded pvfs_bufmap_copy_to_user_iovec and rename my >> new pvfs_bufmap_copy_to_user_iovec2 to pvfs_bufmap_copy_to_user_iovec. >> >> For a while when working on pvfs_bufmap_copy_to_user_iovec2 I was >> under the impression that copy_page_to_iter returned the number of >> bytes *not* copied, based on what I incorrectly inferred from >> https://lwn.net/Articles/625077, but I see that it indeed returns >> the number of bytes copied. And iov_iter_count returns what's left to >> copy. I don't even see a reason, in my case, to collect >> copy_page_to_iter's return code - it is always PAGE_SIZE, or, on the >> final time around, some remnant. I guess the "unlikely" return of 0 might >> be considered an error return code? I guess checking to see that >> copy_page_to_iter always returns either PAGE_SIZE or the same value as >> iov_ter_count previously returned on the final remnant pass might >> have some value in setting pvfs_bufmap_copy_to_user_iovec2's return code? >> As it is, I just blast through and if iov_iter_count is 0 at the >> end, its all good. >> >> Hopefully other things I send in for review will be less verbose <g>... >> >> -Mike -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html