Re: [GIT PULL] Orangefs (text only resend)

Al's "depth of knowledge" review is the kind I have been
yearning for.

+               /* Are we copying to User Virtual Addresses? */
+               if (to_user)
+                       ret = pvfs_bufmap_copy_to_user_iovec(
+                               bufmap,
+                               buffer_index,
+                               vec,
+                               nr_segs,
+                               total_size);
+               /* Are we copying to Kern Virtual Addresses? */
+               else
+                       ret = pvfs_bufmap_copy_to_kernel_iovec(
+                               bufmap,
+                               buffer_index,
+                               vec,
+                               nr_segs,
+                               total_size);

In a new branch of my private tree I replaced the very complex
pvfs_bufmap_copy_to_user with:

int pvfs_bufmap_copy_to_user_iovec2(struct pvfs2_bufmap *bufmap,
				    struct iov_iter *iter,
				    size_t total_size,
				    int buffer_index)
{
	struct pvfs_bufmap_desc *from;
	void *from_kaddr;
	int ret;

	from = &bufmap->desc_array[buffer_index];

	from_kaddr = pvfs2_kmap(from->page_array[0]);
	ret = copy_to_iter(from_kaddr, total_size, iter);
	pvfs2_kunmap(from->page_array[0]);
	return ret;
}

"cat filename" is a simple test that drives execution
down the "Are we copying to User Virtual Addresses?"
code path, and big and small files still copy
out.

I haven't found a test case that drives execution down
the "Are we copying to Kern Virtual Addresses?" code
path yet.

Anyhow, whether Al's (and anyone else's) concerns
are worked on before or after (or if) we get accepted
upstream, we are committed to working on them.

I worked for several months on debugfs and sysfs
(and some "future proofing"), so you can guess
that we would wish to be out from under the gun
of going upstream, but code has to go on merit.

-Mike

On Sun, Sep 6, 2015 at 5:08 AM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> On Sun, Sep 06, 2015 at 08:35:52AM +0200, Christoph Hellwig wrote:
>
>> On Wed, Sep 02, 2015 at 04:34:41PM -0700, Linus Torvalds wrote:
>> >    (a) the iovecs are walked manually (eg
>> > pvfs_bufmap_copy_to_user_iovec()). I would really want to see the code
>> > use the iov_iter infrastructure
>>
>> And that was before we had the full blown iov_iter code.  Note that
>> orangefs always does O_DIRECT-style I/O and doesn't go through the
>> page cache or do any other similar client-side caching except for mmap,
>> so it will only use the low-level iov_iter helpers.
>
> FWIW, the thing that looks rather wrong there is playing fast and loose with
> kvec -> iovec -> kvec casts.  This kind of thing
> +               /* Are we copying to User Virtual Addresses? */
> +               if (to_user)
> +                       ret = pvfs_bufmap_copy_to_user_iovec(
> +                               bufmap,
> +                               buffer_index,
> +                               vec,
> +                               nr_segs,
> +                               total_size);
> +               /* Are we copying to Kern Virtual Addresses? */
> +               else
> +                       ret = pvfs_bufmap_copy_to_kernel_iovec(
> +                               bufmap,
> +                               buffer_index,
> +                               vec,
> +                               nr_segs,
> +                               total_size);
> is almost always a sign of really bad API.  "is that a kernel one?"
> kind of flag is a bloody bad idea.  So's playing with copying iovecs,
> modifying those copies, etc.
>
> We might end up with new primitives added out of that, but this kind of
> stuff definitely shouldn't be open-coded, especially that way.
>
> Some random notes:
>
> Could we please put
> #define PVFS_util_min(x1, x2) (((x1) > (x2)) ? (x2) : (x1))
> out of its misery?  It's pretty much the textbook example of the evils
> of reinventing the wheels - side effects, arithmetic promotions, etc.
>
> Another thing: why does server want to know about LOOKUP_FOLLOW, of all
> things?  Unless I'm misreading what ->lookup() is doing there...
>
> This
> + * pvfs2_link() is only implemented here to make sure that we return a
> + * reasonable error code (the kernel will return a misleading EPERM
> + * otherwise).  PVFS2 does not support hard links.
> + */
> +static int pvfs2_link(struct dentry *old_dentry,
> +                     struct inode *dir,
> +                     struct dentry *dentry)
> +{
> +       return -EOPNOTSUPP;
> +}
> is stupid.  Expected error value is not EOPNOTSUPP; pardon the bluntness,
> but your idea of what would be less misleading doesn't matter - what matters
> is what the _callers_ of link(2), mknod(2), etc. are expecting.  Which is to
> say, what does the userland code expect to get.  It's outright promised in
> POSIX, actually.
>
> symlink(2) - what happens to too long symlink bodies?  Silent truncation?
>
> +static inline void PVFS_khandle_to(const struct pvfs2_khandle *kh,
> +                                  void *p, int size)
> +{
> +       int i;
> +       unsigned char *c = p;
> +
> +       memset(p, 0, size);
> +
> +       for (i = 0; i < 16 && i < size; i++)
> +               c[i] = kh->u[i];
> +}
>
> Er...  If you are using memset(), why open-code memcpy()?  ->u is an array
> of unsigned char, so this loop is just a straight memcpy(); nothing fancy
> going on there...
>
> May I politely inquire about the reasons for DECLARE_ERRNO_MAPPING_AND_FN
> being in a header?  Or existing at all, while we are at it...  Ditto for
> the bit under the tail of that animal - DECLARE_ERRNO_MAPPING, that is.
>
> mask_blocked_signals()/unmask_blocked_signals(): Oleg might want to take
> a look at that one (and parties responsible might want to dive for cover,
> especially when he figures out what hides behind pvfs2_current_sigaction
> and pvfs2_current_signal_lock).
>
> +       /* fill in temporary structure passed to fill_sb method */
> +       mount_sb_info.data = data;
> +       mount_sb_info.root_khandle =
> +               new_op->downcall.resp.fs_mount.root_khandle;
> +       mount_sb_info.fs_id = new_op->downcall.resp.fs_mount.fs_id;
> +       mount_sb_info.id = new_op->downcall.resp.fs_mount.id;
> +
> +       /*
> +        * the mount_sb_info structure looks odd, but it's used because
> +        * the private sb info isn't allocated until we call
> +        * pvfs2_fill_sb, yet we have the info we need to fill it with
> +        * here.  so we store it temporarily and pass all of the info
> +        * to fill_sb where it's properly copied out
> +        */
> +       mnt_sb_d = mount_nodev(fst,
> +                              flags,
> +                              (void *)&mount_sb_info,
> +                              pvfs2_fill_sb);
>
> If you are fighting an interface, that might be because you are using the
> wrong one...  Consider the definition of mount_nodev():
> struct dentry *mount_nodev(struct file_system_type *fs_type,
>         int flags, void *data,
>         int (*fill_super)(struct super_block *, void *, int))
> {
>         int error;
>         struct super_block *s = sget(fs_type, NULL, set_anon_super, flags, NULL);
>
>         if (IS_ERR(s))
>                 return ERR_CAST(s);
>
>         error = fill_super(s, data, flags & MS_SILENT ? 1 : 0);
>         if (error) {
>                 deactivate_locked_super(s);
>                 return ERR_PTR(error);
>         }
>         s->s_flags |= MS_ACTIVE;
>         return dget(s->s_root);
> }
> and observe that everything it calls is exported.  It's a trivial convenience
> wrapper for sget(), and if it turns out to be inconvenient - just use sget()
> itself and be done with that.  No need to bother with callbacks, having
> that mount_sb_info thing, etc.
>
> Is pvfs2_remount() safe to call during pvfs2_unmount_sb() on the same
> filesystem?  Incidentally, what's protecting the list of superblocks
> while you are walking it and calling pvfs2_remount()?
>
> While we are at it, a lot of GFP_KERNEL allocations seem to be done
> with a bunch of locks (including global ones, such as request_mutex) held.
> Why won't that deadlock?
>
> +#define pvfs2_lock_inode(inode) spin_lock(&inode->i_lock)
> +#define pvfs2_unlock_inode(inode) spin_unlock(&inode->i_lock)
>
> Inhume with extreme prejudice.  ->i_bytes and ->i_blocks are the least
> of your worries if two threads hit copy_attributes_to_inode() on the
> same inode in parallel.  Which can happen, AFAICS...
>
> +       /*
> +        * PVFS2 cannot set size with a setattr operation.  Probably not likely
> +        * to be requested through the VFS, but just in case, don't worry about
> +        * ATTR_SIZE
> +        */
> What about truncate(2)?
>
> +       sb->s_fs_info =
> +               kmalloc(sizeof(struct pvfs2_sb_info_s), PVFS2_GFP_FLAGS);
> +       if (!PVFS2_SB(sb))
> +               return -ENOMEM;
> +       memset(sb->s_fs_info, 0, sizeof(struct pvfs2_sb_info_s));
> +       PVFS2_SB(sb)->sb = sb;
> kzalloc()?
>
> +       root_dentry = d_make_root(root);
> +       if (!root_dentry) {
> +               iput(root);
> +               return -ENOMEM;
> Double iput() here...
>
> Anyway, bedtime for me - it's 5am here.  I'll post more detailed one after
> I get some sleep...