Re: [RFC 2] Orangefs and the iov_iter interface

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I found that xfstest generic/246 drives execution down
the pvfs_bufmap_copy_to_kernel_iovec code path, I guess
because of the way that ./src/t_mmap_writev.c uses mmap.

xfstest generic/247 *really pounds* the
pvfs_bufmap_copy_to_kernel_iovec code path.

Looking at lib/iov_iter.c made me think copy_page_to_iter
might be able to deal with either user or kernel address
spaces.

So I've ditched both pvfs_bufmap_copy_to_kernel_iovec
and pvfs_bufmap_copy_to_user_iovec and replaced them
with the very simple pvfs_bufmap_copy_to_iovec.

I see no regressions with userspace tests that I was already
using and none with the two xfstests that exercise the "kernel
codepath".

The logic that Al hated so much in file.c/postcopy_buffers
is gone.

Linus: I added you to the Cc: list this time because I remembered
seeing you post that you don't read fs-devel every day... if you
don't want to see these updates as I try to pry my "grudging ack"
from Al <g> over the next few weeks, just say so...

file.c/postcopy_buffers:

        if (total_size) {
                iov_iter_init(&iter, READ, vec, nr_segs, total_size);
                ret = pvfs_bufmap_copy_to_iovec(bufmap,
                                                &iter,
                                                buffer_index);
                if (ret < 0)
                        gossip_err("%s: Failed to copy-out buffers.
Please make sure that the pvfs2-client is running (%ld)\n",
                                __func__,
                                (long)ret);
        }

pvfs2-bufmap.c/pvfs_bufmap_copy_to_iovec:

/*
 * Iterate through the array of pages containing the bytes from
 * a file being read.
 *
 */
int pvfs_bufmap_copy_to_iovec(struct pvfs2_bufmap *bufmap,
   struct iov_iter *iter,
   int buffer_index)
{
struct pvfs_bufmap_desc *from;
struct page *page;
int i;

gossip_debug(GOSSIP_BUFMAP_DEBUG,
    "%s: buffer_index:%d: iov_iter_count(iter):%lu:\n",
    __func__, buffer_index, iov_iter_count(iter));

        from = &bufmap->desc_array[buffer_index];

for (i = 0; iov_iter_count(iter); i++) {
page = from->page_array[i];
copy_page_to_iter(page, 0, PAGE_SIZE, iter);
}

        return iov_iter_count(iter) ? -EFAULT : 0;
}

-Mike

On Mon, Sep 14, 2015 at 4:29 PM, Mike Marshall <hubcap@xxxxxxxxxxxx> wrote:
> When Linus reviewed the recent Orangefs pull request I made,
> one thing he said was:
>
>  > the iovecs are walked manually (eg pvfs_bufmap_copy_to_user_iovec()).
>  > I would really want to see the code use the iov_iter infrastructure
>
> I quickly made some changes, and after ripping off some code from
> cifs, I had something that worked with no regressions... but ripped
> off code that seems to work isn't the same as code designed on
> purpose to work.
>
> This seems to work, on purpose even <g>...
>
> file.c:
>
>   iov_iter_init(&iter, READ, vec, nr_segs, total_size);
>   ret = pvfs_bufmap_copy_to_user_iovec2(bufmap,
>                                         &iter,
>                                         buffer_index);
>
> pvfs2-bufmap.c:
>
> /*
>  * Iterate through the array of pages containing the bytes from
>  * a file being read.
>  *
>  */
> int pvfs_bufmap_copy_to_user_iovec2(struct pvfs2_bufmap *bufmap,
>                                     struct iov_iter *iter,
>                                     int buffer_index)
> {
>         struct pvfs_bufmap_desc *from;
>         struct page *page;
>         int i;
>
>         from = &bufmap->desc_array[buffer_index];
>
>         for (i = 0; iov_iter_count(iter); i++) {
>                 page = from->page_array[i];
>                 copy_page_to_iter(page, 0, PAGE_SIZE, iter);
>         }
>
>         return iov_iter_count(iter) ? -EFAULT : 0;
> }
>
>
> There's a pointer to an array of pages in the bufmap structure. Bufmap is
> a structure that Christoph created. Bufmap is filled with related, but
> previously free-floating, variables. Christoph glommed them together
> into the new bufmap struct while making a fix for a lock inversion problem...
>
> This new version of pvfs_bufmap_copy_to_user_iovec peels off the pages
> from the array and iterates through them with copy_page_to_iter. Unlike
> the ripped off code, the for loop executes exactly the expected number
> of times. If the file being read is smaller than a page, the loop is
> executed once and file-size number of bytes are copied. If the file
> being read is larger than a page, it may or may not all be in the page
> array on a particular visit to pvfs_bufmap_copy_to_user_iovec2, but the
> loop is executed the expected number of times based on iov_iter_count(iter),
> peeling off a page at a time and then the correct remnant number of bytes.
>
> I spoke with Stephen Rothwall about how to go forward with the
> orangefs tree I have in Linux Next... I plan to rebuild it based
> on a 4.3 rc1 or rc2 tree, and it will have new stuff that gets
> reviewed in it.
>
> If the above passes muster, it will be there... I'll ditch the
> old open-coded pvfs_bufmap_copy_to_user_iovec and rename my
> new pvfs_bufmap_copy_to_user_iovec2 to pvfs_bufmap_copy_to_user_iovec.
>
> For a while when working on pvfs_bufmap_copy_to_user_iovec2 I was
> under the impression that copy_page_to_iter returned the number of
> bytes *not* copied, based on what I incorrectly inferred from
> https://lwn.net/Articles/625077, but I see that it indeed returns
> the number of bytes copied. And iov_iter_count returns what's left to
> copy. I don't even see a reason, in my case, to collect
> copy_page_to_iter's return code - it is always PAGE_SIZE, or, on the
> final time around, some remnant. I guess the "unlikely" return of 0 might
> be considered an error return code? I guess checking to see that
> copy_page_to_iter always returns either PAGE_SIZE or the same value as
> iov_ter_count previously returned on the final remnant pass might
> have some value in setting pvfs_bufmap_copy_to_user_iovec2's return code?
> As it is, I just blast through and if iov_iter_count is 0 at the
> end, its all good.
>
> Hopefully other things I send in for review will be less verbose <g>...
>
> -Mike
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux