Re: [GIT PULL] Orangefs (text only resend)

Mike Marshall <hubcap@xxxxxxxxxxxx> · Sun, 6 Sep 2015 11:00:47 -0400

Oh... and I also added a iov_iter_init before
calling the new version of pvfs_bufmap_copy_to_user...

if (to_user) {
   iov_iter_init(&iter, READ, vec, nr_segs, total_size);
   ret = pvfs_bufmap_copy_to_user_iovec2(bufmap,
                                                              &iter,
                                                              total_size,
                                                              buffer_index);
   ret = 0;
}

This is just me trying to figure out what to do,
not finished code...

-Mike

On Sun, Sep 6, 2015 at 10:52 AM, Mike Marshall <hubcap@xxxxxxxxxxxx> wrote:
> Al's "depth of knowledge" review is the kind I have been
> yearning for.
>
> +               /* Are we copying to User Virtual Addresses? */
> +               if (to_user)
> +                       ret = pvfs_bufmap_copy_to_user_iovec(
> +                               bufmap,
> +                               buffer_index,
> +                               vec,
> +                               nr_segs,
> +                               total_size);
> +               /* Are we copying to Kern Virtual Addresses? */
> +               else
> +                       ret = pvfs_bufmap_copy_to_kernel_iovec(
> +                               bufmap,
> +                               buffer_index,
> +                               vec,
> +                               nr_segs,
> +                               total_size);
>
> In a new branch of my private tree I changed out the very complex
>  pvfs_bufmap_copy_to_user with:
>
> int pvfs_bufmap_copy_to_user_iovec2(struct pvfs2_bufmap *bufmap,
>                                                         struct iov_iter *iter,
>                                                         size_t total_size,
>                                                         int buffer_index)
> {
>         struct pvfs_bufmap_desc *from;
>         int ret = 0; void *from_kaddr = NULL;
>         from = &bufmap->desc_array[buffer_index];
>
>         from_kaddr = pvfs2_kmap(from->page_array[0]);
>         ret = copy_to_iter(from_kaddr, total_size, iter);
>         return ret;
> }
>
> "cat filename" is a simple test that drives execution
> down the "Are we copying to User Virtual Addresses?"
> code path, and big and small files still copy
> out.
>
> I haven't found a test case that drives execution down
> the "Are we copying to Kern Virtual Addresses?" code
> path yet.
>
> Anyhow, whether Al's (and anyone else's) concerns
> are worked on before or after (or if) we get accepted
> upstream, we are committed to working on them.
>
> I worked for several months on debugfs and sysfs
> (and some "future proofing"), so you can guess
> that we would wish to be out from under the gun
> of going upstream, but code has to go on merit.
>
> -Mike
>
> On Sun, Sep 6, 2015 at 5:08 AM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
>> On Sun, Sep 06, 2015 at 08:35:52AM +0200, Christoph Hellwig wrote:
>>
>>> On Wed, Sep 02, 2015 at 04:34:41PM -0700, Linus Torvalds wrote:
>>> >    (a) the iovecs are walked manually (eg
>>> > pvfs_bufmap_copy_to_user_iovec()). I would really want to see the code
>>> > use the iov_iter infrastructure
>>>
>>> And that was before we had the full blown iov_iter code.  Note that
>>> orangefs always does O_DIRECT-style I/O and doesn't go through the
>>> page cache or does any other similar client side caching except for mmap,
>>> so it will only use the low-level iov_iter helpers.
>>
>> FWIW, the thing that looks rather wrong there is playing fast and loose with
>> kvec -> iovec -> kvec casts.  This kind of thing
>> +               /* Are we copying to User Virtual Addresses? */
>> +               if (to_user)
>> +                       ret = pvfs_bufmap_copy_to_user_iovec(
>> +                               bufmap,
>> +                               buffer_index,
>> +                               vec,
>> +                               nr_segs,
>> +                               total_size);
>> +               /* Are we copying to Kern Virtual Addresses? */
>> +               else
>> +                       ret = pvfs_bufmap_copy_to_kernel_iovec(
>> +                               bufmap,
>> +                               buffer_index,
>> +                               vec,
>> +                               nr_segs,
>> +                               total_size);
>> is almost always a sign of really bad API.  "is that a kernel one?"
>> kind of flags is a bloody bad idea.  So's playing wiht copying iovecs,
>> modifying those copies, etc.
>>
>> We might end up with new primitives added out of that, but this kind of
>> stuff definitely shouldn't be open-coded, especially that way.
>>
>> Some random notes:
>>
>> Could we please put
>> #define PVFS_util_min(x1, x2) (((x1) > (x2)) ? (x2) : (x1))
>> out of its misery?  It's pretty much the textbook example of the evils
>> of reinventing the wheels - side effects, arithmetic promotions, etc.
>>
>> Another thing: why does server want to know about LOOKUP_FOLLOW, of all
>> things?  Unless I'm misreading what ->lookup() is doing there...
>>
>> This
>> + * pvfs2_link() is only implemented here to make sure that we return a
>> + * reasonable error code (the kernel will return a misleading EPERM
>> + * otherwise).  PVFS2 does not support hard links.
>> + */
>> +static int pvfs2_link(struct dentry *old_dentry,
>> +                     struct inode *dir,
>> +                     struct dentry *dentry)
>> +{
>> +       return -EOPNOTSUPP;
>> +}
>> is stupid.  Expected error value is not EOPNOTSUPP; pardon the bluntness,
>> but your idea of what would be less misleading doesn't matter - what matters
>> is what the _callers_ of link(2), mknod(2), etc. are expecting.  Which is to
>> say, what does the userland code expect to get.  It's outright promised in
>> POSIX, actually.
>>
>> symlink(2) - what happens to too long symlink bodies?  Silent truncation?
>>
>> +static inline void PVFS_khandle_to(const struct pvfs2_khandle *kh,
>> +                                  void *p, int size)
>> +{
>> +       int i;
>> +       unsigned char *c = p;
>> +
>> +       memset(p, 0, size);
>> +
>> +       for (i = 0; i < 16 && i < size; i++)
>> +               c[i] = kh->u[i];
>> +}
>>
>> Er...  If you are using memset(), why open-code memcpy()?  ->u is an array
>> of unsigned char, so this loop is just a straight memcpy(); nothing fancy
>> going on there...
>>
>> May I politely inquire about the reasons for DECLARE_ERRNO_MAPPING_AND_FN
>> being in a header?  Or existing at all, while we are at it...  Ditto for
>> the bit under the tail of that animal - DECLARE_ERRNO_MAPPING, that is.
>>
>> mask_blocked_signals()/unmask_blocked_signals(): Oleg might want to take
>> a look at that one (and parties responsible might want to dive for cover,
>> especially when he figures out what hides behind pvfs2_current_sigaction
>> and pvfs2_current_signal_lock).
>>
>> +       /* fill in temporary structure passed to fill_sb method */
>> +       mount_sb_info.data = data;
>> +       mount_sb_info.root_khandle =
>> +               new_op->downcall.resp.fs_mount.root_khandle;
>> +       mount_sb_info.fs_id = new_op->downcall.resp.fs_mount.fs_id;
>> +       mount_sb_info.id = new_op->downcall.resp.fs_mount.id;
>> +
>> +       /*
>> +        * the mount_sb_info structure looks odd, but it's used because
>> +        * the private sb info isn't allocated until we call
>> +        * pvfs2_fill_sb, yet we have the info we need to fill it with
>> +        * here.  so we store it temporarily and pass all of the info
>> +        * to fill_sb where it's properly copied out
>> +        */
>> +       mnt_sb_d = mount_nodev(fst,
>> +                              flags,
>> +                              (void *)&mount_sb_info,
>> +                              pvfs2_fill_sb);
>>
>> If you are fighting an interface, that might be because you are using the
>> wrong one...  Consider the definition of mount_nodev():
>> struct dentry *mount_nodev(struct file_system_type *fs_type,
>>         int flags, void *data,
>>         int (*fill_super)(struct super_block *, void *, int))
>> {
>>         int error;
>>         struct super_block *s = sget(fs_type, NULL, set_anon_super, flags, NULL);
>>
>>         if (IS_ERR(s))
>>                 return ERR_CAST(s);
>>
>>         error = fill_super(s, data, flags & MS_SILENT ? 1 : 0);
>>         if (error) {
>>                 deactivate_locked_super(s);
>>                 return ERR_PTR(error);
>>         }
>>         s->s_flags |= MS_ACTIVE;
>>         return dget(s->s_root);
>> }
>> and observe that everything it calls is exported.  It's a trivial convenience
>> wrapper for sget(), and if it turns out to be inconvenient - just use sget()
>> itself and be done with that.  No need to bother with callbacks, having
>> that mount_sb_info thing, etc.
>>
>> Is pvfs2_remount() safe to call during pvfs2_unmount_sb() on the same
>> filesystem?  Incidentally, what's protecting the list of superblocks
>> while you are walking it and calling pvfs2_remount()?
>>
>> While we are at it, a lot of GFP_KERNEL allocations seem to be done
>> with a bunch of locks (including global ones, such as request_mutex) held.
>> Why won't that deadlock?
>>
>> +#define pvfs2_lock_inode(inode) spin_lock(&inode->i_lock)
>> +#define pvfs2_unlock_inode(inode) spin_unlock(&inode->i_lock)
>>
>> Inhume with extreme prejudice.  ->i_bytes and ->i_blocks are the least
>> of your worries if two threads hit copy_attributes_to_inode() on the
>> same inode in parallel.  Which can happen, AFAICS...
>>
>> +       /*
>> +        * PVFS2 cannot set size with a setattr operation.  Probably not likely
>> +        * to be requested through the VFS, but just in case, don't worry about
>> +        * ATTR_SIZE
>> +        */
>> What about truncate(2)?
>>
>> +       sb->s_fs_info =
>> +               kmalloc(sizeof(struct pvfs2_sb_info_s), PVFS2_GFP_FLAGS);
>> +       if (!PVFS2_SB(sb))
>> +               return -ENOMEM;
>> +       memset(sb->s_fs_info, 0, sizeof(struct pvfs2_sb_info_s));
>> +       PVFS2_SB(sb)->sb = sb;
>> kzalloc()?
>>
>> +       root_dentry = d_make_root(root);
>> +       if (!root_dentry) {
>> +               iput(root);
>> +               return -ENOMEM;
>> Double iput() here...
>>
>> Anyway, bedtime for me - it's 5am here.  I'll post more detailed one after
>> I get some sleep...
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html