[RFC] Orangefs and the iov_iter interface

When Linus reviewed the recent Orangefs pull request I made,
one thing he said was:

 > the iovecs are walked manually (eg pvfs_bufmap_copy_to_user_iovec()).
 > I would really want to see the code use the iov_iter infrastructure

I quickly made some changes, and after ripping off some code from
cifs, I had something that worked with no regressions... but ripped-off
code that seems to work isn't the same as code designed on purpose
to work.

This seems to work, on purpose even <g>...

file.c:

  iov_iter_init(&iter, READ, vec, nr_segs, total_size);
  ret = pvfs_bufmap_copy_to_user_iovec2(bufmap,
                                        &iter,
                                        buffer_index);

pvfs2-bufmap.c:

/*
 * Iterate through the array of pages containing the bytes from
 * a file being read.
 */
int pvfs_bufmap_copy_to_user_iovec2(struct pvfs2_bufmap *bufmap,
                                    struct iov_iter *iter,
                                    int buffer_index)
{
        struct pvfs_bufmap_desc *from;
        struct page *page;
        int i;

        from = &bufmap->desc_array[buffer_index];

        /*
         * copy_page_to_iter() caps each copy at what's left in the
         * iter, so each pass moves a full page and the final pass
         * moves the remnant, stopping once the iter drains.
         */
        for (i = 0; iov_iter_count(iter); i++) {
                page = from->page_array[i];
                copy_page_to_iter(page, 0, PAGE_SIZE, iter);
        }

        /* report -EFAULT if anything was left uncopied */
        return iov_iter_count(iter) ? -EFAULT : 0;
}


The bufmap structure holds a pointer to an array of pages. Bufmap is a
struct Christoph created while fixing a lock inversion problem; he
gathered related, but previously free-floating, variables together into
the new bufmap struct...
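
To give an idea of what pvfs_bufmap_copy_to_user_iovec2 above is reaching
into, here's a rough sketch of just the parts of the two structures it
touches; only desc_array and page_array appear in the snippet, and the
real definitions in pvfs2-bufmap.c have more members than this shows:

struct pvfs_bufmap_desc {
        /* pages backing one shared buffer slot */
        struct page **page_array;
        /* ... plus the user address, page count, list linkage, ... */
};

struct pvfs2_bufmap {
        /* one descriptor per shared buffer slot */
        struct pvfs_bufmap_desc *desc_array;
        /* ... plus the rest of the previously free-floating state ... */
};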

This new version of pvfs_bufmap_copy_to_user_iovec peels the pages off
the array and runs them through copy_page_to_iter. Unlike the ripped-off
code, the for loop executes exactly the expected number of times. If the
file being read is smaller than a page, the loop runs once and copies
file-size bytes. If the file being read is larger than a page, it may or
may not all be in the page array on a particular visit to
pvfs_bufmap_copy_to_user_iovec2, but the loop still runs the expected
number of times based on iov_iter_count(iter), peeling off a full page
at a time and then the correct remnant of bytes on the last pass.
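
To make "the expected number of times" concrete, here's the bookkeeping
for a hypothetical 10000-byte read with a 4096-byte PAGE_SIZE (numbers
made up for illustration):

  pass   iov_iter_count(iter) on entry   bytes copy_page_to_iter copies
  ----   -----------------------------   ------------------------------
    0            10000                          4096
    1             5904                          4096
    2             1808                          1808  (the remnant)

After the third pass iov_iter_count(iter) is 0, the for loop falls out,
and pvfs_bufmap_copy_to_user_iovec2 returns 0.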

I spoke with Stephen Rothwell about how to go forward with the
orangefs tree I have in linux-next... I plan to rebuild it based
on a 4.3 rc1 or rc2 tree, and it will have new stuff in it that
gets reviewed.

If the above passes muster, it will be there... I'll ditch the
old open-coded pvfs_bufmap_copy_to_user_iovec and rename my
new pvfs_bufmap_copy_to_user_iovec2 to pvfs_bufmap_copy_to_user_iovec.

For a while when working on pvfs_bufmap_copy_to_user_iovec2 I was
under the impression that copy_page_to_iter returned the number of
bytes *not* copied, based on what I incorrectly inferred from
https://lwn.net/Articles/625077, but I see that it indeed returns
the number of bytes copied, and iov_iter_count returns what's left
to copy. I don't even see a reason, in my case, to collect
copy_page_to_iter's return value - it is always PAGE_SIZE, or, on
the final pass, some remnant. I guess the "unlikely" return of 0
might be considered an error indication? And I guess checking that
copy_page_to_iter always returns either PAGE_SIZE, or the same value
iov_iter_count reported going into the final remnant pass, might have
some value in setting pvfs_bufmap_copy_to_user_iovec2's return code?
As it is, I just blast through, and if iov_iter_count is 0 at the
end, it's all good.
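
If a stricter check ever turns out to be worthwhile, it could look
something like the sketch below - same name and arguments as the
version earlier in this mail, untested, just the idea from the
paragraph above spelled out:

int pvfs_bufmap_copy_to_user_iovec2(struct pvfs2_bufmap *bufmap,
                                    struct iov_iter *iter,
                                    int buffer_index)
{
        struct pvfs_bufmap_desc *from;
        struct page *page;
        size_t want, copied;
        int i;

        from = &bufmap->desc_array[buffer_index];

        for (i = 0; iov_iter_count(iter); i++) {
                page = from->page_array[i];
                /* the final pass copies the remnant, not a full page */
                want = min_t(size_t, PAGE_SIZE, iov_iter_count(iter));
                copied = copy_page_to_iter(page, 0, PAGE_SIZE, iter);
                /* a short copy (including the "unlikely" 0) means a fault */
                if (copied != want)
                        return -EFAULT;
        }

        return 0;
}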

Hopefully other things I send in for review will be less verbose <g>...

-Mike