Re: [RFC 2] Orangefs and the iov_iter interface

I still need to replace the open-coded
pvfs_bufmap_copy_iovec_from_user
and pvfs_bufmap_copy_iovec_from_kernel. I'm
pretty sure I know what to do, but I am still searching
for a test I can run that drives execution
down the pvfs_bufmap_copy_iovec_from_kernel code
path.
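
If it helps to see where I'm headed, here's an untested sketch of what
the unified replacement might look like (pvfs_bufmap_copy_from_iovec is
just my placeholder name; treat this as a sketch that hasn't been
anywhere near a compiler yet):

/*
 * Untested sketch, not yet in the tree: a mirror of
 * pvfs_bufmap_copy_to_iovec that uses copy_page_from_iter, so the
 * same function should cover both the user and kernel iovec cases.
 */
int pvfs_bufmap_copy_from_iovec(struct pvfs2_bufmap *bufmap,
                                struct iov_iter *iter,
                                int buffer_index)
{
        struct pvfs_bufmap_desc *to;
        struct page *page;
        size_t copied;
        int i;

        to = &bufmap->desc_array[buffer_index];

        for (i = 0; iov_iter_count(iter); i++) {
                page = to->page_array[i];
                copied = copy_page_from_iter(page, 0, PAGE_SIZE, iter);
                /* a short copy with bytes still pending means a fault */
                if (copied == 0 && iov_iter_count(iter))
                        break;
        }

        return iov_iter_count(iter) ? -EFAULT : 0;
}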

While running a test program I wrote (which doesn't drive execution
down the desired code path), I oopsed the kernel in my
pvfs_bufmap_copy_to_iovec function. I see now why I very much need
to care about the return value of copy_page_to_iter.

Anyhow, now pvfs_bufmap_copy_to_iovec looks like this:

int pvfs_bufmap_copy_to_iovec(struct pvfs2_bufmap *bufmap,
                              struct iov_iter *iter,
                              int buffer_index)
{
        struct pvfs_bufmap_desc *from;
        struct page *page;
        int i;
        size_t written;

        gossip_debug(GOSSIP_BUFMAP_DEBUG,
                     "%s: buffer_index:%d: iov_iter_count(iter):%zu:\n",
                     __func__, buffer_index, iov_iter_count(iter));

        from = &bufmap->desc_array[buffer_index];

        for (i = 0; iov_iter_count(iter); i++) {
                page = from->page_array[i];
                written = copy_page_to_iter(page, 0, PAGE_SIZE, iter);
                /*
                 * copy_page_to_iter returns the number of bytes it
                 * actually copied; zero with bytes still pending means
                 * we faulted, so bail out instead of walking off the
                 * end of page_array.
                 */
                if (written == 0 && iov_iter_count(iter))
                        break;
        }

        return iov_iter_count(iter) ? -EFAULT : 0;
}

... and my poorly coded test program fails on Orangefs
just like it does on ext4 and tmpfs (not by oopsing the
kernel)...

$ ./write_kvec /pvfsmnt/asdf
written:12288:
read:-1:
$ ./write_kvec /home/hubcap/asdf
written:12288:
read:-1:
$ ./write_kvec /tmp/asdf
written:12288:
read:-1:
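
(The test amounts to something like this sketch - reconstructed from
the output above, since I didn't post the real thing; the deliberately
bogus readv iovec is my stand-in for the "poorly coded" part:)

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/uio.h>

int main(int argc, char **argv)
{
        char buf[4096] = { 0 };
        struct iovec vec[3] = {
                { buf, sizeof(buf) },
                { buf, sizeof(buf) },
                { buf, sizeof(buf) },
        };
        struct iovec bad = { (void *)8, sizeof(buf) }; /* bogus on purpose */
        int fd = open(argv[1], O_RDWR | O_CREAT | O_TRUNC, 0644);
        ssize_t n;

        n = writev(fd, vec, 3);         /* 3 * 4096 = 12288 */
        printf("written:%zd:\n", n);
        lseek(fd, 0, SEEK_SET);
        n = readv(fd, &bad, 1);         /* should fail with -1/EFAULT */
        printf("read:%zd:\n", n);
        close(fd);
        return 0;
}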

-Mike

On Wed, Sep 16, 2015 at 1:24 PM, Mike Marshall <hubcap@xxxxxxxxxxxx> wrote:
> I found that xfstest generic/246 drives execution down
> the pvfs_bufmap_copy_to_kernel_iovec code path, I guess
> because of the way that ./src/t_mmap_writev.c uses mmap.
>
> xfstest generic/247 *really pounds* the
> pvfs_bufmap_copy_to_kernel_iovec code path.
>
> Looking at lib/iov_iter.c made me think copy_page_to_iter
> might be able to deal with either user or kernel address
> spaces.
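>
> (From my reading of lib/iov_iter.c - paraphrased from memory, so check
> the real source - the dispatch looks roughly like this, which is why
> one helper can serve both kinds of iovec:)
>
> size_t copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
>                          struct iov_iter *i)
> {
>         if (i->type & (ITER_BVEC | ITER_KVEC)) {
>                 /* kernel memory: map the page and do a plain memcpy */
>                 void *kaddr = kmap_atomic(page);
>                 size_t wanted = copy_to_iter(kaddr + offset, bytes, i);
>                 kunmap_atomic(kaddr);
>                 return wanted;
>         }
>         /* user memory: the faulting copy_to_user path */
>         return copy_page_to_iter_iovec(page, offset, bytes, i);
> }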
>
> So I've ditched both pvfs_bufmap_copy_to_kernel_iovec
> and pvfs_bufmap_copy_to_user_iovec and replaced them
> with the very simple pvfs_bufmap_copy_to_iovec.
>
> I see no regressions with userspace tests that I was already
> using and none with the two xfstests that exercise the "kernel
> codepath".
>
> The logic that Al hated so much in file.c/postcopy_buffers
> is gone.
>
> Linus: I added you to the Cc: list this time because I remembered
> seeing you post that you don't read fs-devel every day... if you
> don't want to see these updates as I try to pry my "grudging ack"
> from Al <g> over the next few weeks, just say so...
>
> file.c/postcopy_buffers:
>
>         if (total_size) {
>                 iov_iter_init(&iter, READ, vec, nr_segs, total_size);
>                 ret = pvfs_bufmap_copy_to_iovec(bufmap,
>                                                 &iter,
>                                                 buffer_index);
>                 if (ret < 0)
>                         gossip_err("%s: Failed to copy-out buffers. Please make sure that the pvfs2-client is running (%ld)\n",
>                                    __func__,
>                                    (long)ret);
>         }
>
> pvfs2-bufmap.c/pvfs_bufmap_copy_to_iovec:
>
> /*
>  * Iterate through the array of pages containing the bytes from
>  * a file being read.
>  */
> int pvfs_bufmap_copy_to_iovec(struct pvfs2_bufmap *bufmap,
>                               struct iov_iter *iter,
>                               int buffer_index)
> {
>         struct pvfs_bufmap_desc *from;
>         struct page *page;
>         int i;
>
>         gossip_debug(GOSSIP_BUFMAP_DEBUG,
>                      "%s: buffer_index:%d: iov_iter_count(iter):%lu:\n",
>                      __func__, buffer_index, iov_iter_count(iter));
>
>         from = &bufmap->desc_array[buffer_index];
>
>         for (i = 0; iov_iter_count(iter); i++) {
>                 page = from->page_array[i];
>                 copy_page_to_iter(page, 0, PAGE_SIZE, iter);
>         }
>
>         return iov_iter_count(iter) ? -EFAULT : 0;
> }
>
> -Mike
>
> On Mon, Sep 14, 2015 at 4:29 PM, Mike Marshall <hubcap@xxxxxxxxxxxx> wrote:
>> When Linus reviewed the recent Orangefs pull request I made,
>> one thing he said was:
>>
>>  > the iovecs are walked manually (eg pvfs_bufmap_copy_to_user_iovec()).
>>  > I would really want to see the code use the iov_iter infrastructure
>>
>> I quickly made some changes, and after ripping off some code from
>> cifs, I had something that worked with no regressions... but ripped-off
>> code that seems to work isn't the same as code designed on purpose
>> to work.
>>
>> This seems to work, on purpose even <g>...
>>
>> file.c:
>>
>>   iov_iter_init(&iter, READ, vec, nr_segs, total_size);
>>   ret = pvfs_bufmap_copy_to_user_iovec2(bufmap,
>>                                         &iter,
>>                                         buffer_index);
>>
>> pvfs2-bufmap.c:
>>
>> /*
>>  * Iterate through the array of pages containing the bytes from
>>  * a file being read.
>>  */
>> int pvfs_bufmap_copy_to_user_iovec2(struct pvfs2_bufmap *bufmap,
>>                                     struct iov_iter *iter,
>>                                     int buffer_index)
>> {
>>         struct pvfs_bufmap_desc *from;
>>         struct page *page;
>>         int i;
>>
>>         from = &bufmap->desc_array[buffer_index];
>>
>>         for (i = 0; iov_iter_count(iter); i++) {
>>                 page = from->page_array[i];
>>                 copy_page_to_iter(page, 0, PAGE_SIZE, iter);
>>         }
>>
>>         return iov_iter_count(iter) ? -EFAULT : 0;
>> }
>>
>>
>> There's a pointer to an array of pages in the bufmap structure. Bufmap
>> is a struct Christoph created while fixing a lock inversion problem:
>> he glommed a set of related, but previously free-floating, variables
>> together into the new bufmap struct...
>>
>> This new version of pvfs_bufmap_copy_to_user_iovec peels the pages off
>> the array and iterates through them with copy_page_to_iter. Unlike the
>> ripped-off code, the for loop executes exactly the expected number of
>> times. If the file being read is smaller than a page, the loop executes
>> once and file-size bytes are copied. If the file being read is larger
>> than a page, it may or may not all be in the page array on a particular
>> visit to pvfs_bufmap_copy_to_user_iovec2, but the loop still executes
>> the expected number of times based on iov_iter_count(iter), peeling off
>> a page at a time and then the correct remnant of bytes.
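>>
>> For example (my numbers, just to illustrate the loop): on a 10000-byte
>> read, iov_iter_count starts at 10000, and copy_page_to_iter peels off
>> 4096, then 4096, then the 1808-byte remnant, so the loop runs exactly
>> three times.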
>>
>> I spoke with Stephen Rothwell about how to go forward with the
>> orangefs tree I have in linux-next... I plan to rebuild it based
>> on a 4.3-rc1 or -rc2 tree, and it will have new stuff that gets
>> reviewed in it.
>>
>> If the above passes muster, it will be there... I'll ditch the
>> old open-coded pvfs_bufmap_copy_to_user_iovec and rename my
>> new pvfs_bufmap_copy_to_user_iovec2 to pvfs_bufmap_copy_to_user_iovec.
>>
>> For a while when working on pvfs_bufmap_copy_to_user_iovec2 I was
>> under the impression that copy_page_to_iter returned the number of
>> bytes *not* copied, based on what I incorrectly inferred from
>> https://lwn.net/Articles/625077, but I see that it indeed returns
>> the number of bytes copied, and iov_iter_count returns what's left
>> to copy. I don't even see a reason, in my case, to collect
>> copy_page_to_iter's return code - it is always PAGE_SIZE, or, on the
>> final time around, some remnant. I guess the "unlikely" return of 0
>> might be considered an error return code, and checking that
>> copy_page_to_iter always returns either PAGE_SIZE or, on the final
>> remnant pass, the same value that iov_iter_count just returned might
>> have some value in setting pvfs_bufmap_copy_to_user_iovec2's return
>> code. As it is, I just blast through, and if iov_iter_count is 0 at
>> the end, it's all good.
>>
>> Hopefully other things I send in for review will be less verbose <g>...
>>
>> -Mike


