Oh... and I also added a iov_iter_init before calling the new version of pvfs_bufmap_copy_to_user... if (to_user) { iov_iter_init(&iter, READ, vec, nr_segs, total_size); ret = pvfs_bufmap_copy_to_user_iovec2(bufmap, &iter, total_size, buffer_index); ret = 0; } This is just me trying to figure out what to do, not finished code... -Mike On Sun, Sep 6, 2015 at 10:52 AM, Mike Marshall <hubcap@xxxxxxxxxxxx> wrote: > Al's "depth of knowledge" review is the kind I have been > yearning for. > > + /* Are we copying to User Virtual Addresses? */ > + if (to_user) > + ret = pvfs_bufmap_copy_to_user_iovec( > + bufmap, > + buffer_index, > + vec, > + nr_segs, > + total_size); > + /* Are we copying to Kern Virtual Addresses? */ > + else > + ret = pvfs_bufmap_copy_to_kernel_iovec( > + bufmap, > + buffer_index, > + vec, > + nr_segs, > + total_size); > > In a new branch of my private tree I changed out the very complex > pvfs_bufmap_copy_to_user with: > > int pvfs_bufmap_copy_to_user_iovec2(struct pvfs2_bufmap *bufmap, > struct iov_iter *iter, > size_t total_size, > int buffer_index) > { > struct pvfs_bufmap_desc *from; > int ret = 0; void *from_kaddr = NULL; > from = &bufmap->desc_array[buffer_index]; > > from_kaddr = pvfs2_kmap(from->page_array[0]); > ret = copy_to_iter(from_kaddr, total_size, iter); > return ret; > } > > "cat filename" is a simple test that drives execution > down the "Are we copying to User Virtual Addresses?" > code path, and big and small files still copy > out. > > I haven't found a test case that drives execution down > the "Are we copying to Kern Virtual Addresses?" code > path yet. > > Anyhow, whether Al's (and anyone else's) concerns > are worked on before or after (or if) we get accepted > upstream, we are committed to working on them. > > I worked for several months on debugfs and sysfs > (and some "future proofing"), so you can guess > that we would wish to be out from under the gun > of going upstream, but code has to go on merit. > > -Mike > > On Sun, Sep 6, 2015 at 5:08 AM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote: >> On Sun, Sep 06, 2015 at 08:35:52AM +0200, Christoph Hellwig wrote: >> >>> On Wed, Sep 02, 2015 at 04:34:41PM -0700, Linus Torvalds wrote: >>> > (a) the iovecs are walked manually (eg >>> > pvfs_bufmap_copy_to_user_iovec()). I would really want to see the code >>> > use the iov_iter infrastructure >>> >>> And that was before we had the full blown iov_iter code. Note that >>> orangefs always does O_DIRECT-style I/O and doesn't go through the >>> page cache or does any other similar client side caching except for mmap, >>> so it will only use the low-level iov_iter helpers. >> >> FWIW, the thing that looks rather wrong there is playing fast and loose with >> kvec -> iovec -> kvec casts. This kind of thing >> + /* Are we copying to User Virtual Addresses? */ >> + if (to_user) >> + ret = pvfs_bufmap_copy_to_user_iovec( >> + bufmap, >> + buffer_index, >> + vec, >> + nr_segs, >> + total_size); >> + /* Are we copying to Kern Virtual Addresses? */ >> + else >> + ret = pvfs_bufmap_copy_to_kernel_iovec( >> + bufmap, >> + buffer_index, >> + vec, >> + nr_segs, >> + total_size); >> is almost always a sign of really bad API. "is that a kernel one?" >> kind of flags is a bloody bad idea. So's playing wiht copying iovecs, >> modifying those copies, etc. >> >> We might end up with new primitives added out of that, but this kind of >> stuff definitely shouldn't be open-coded, especially that way. >> >> Some random notes: >> >> Could we please put >> #define PVFS_util_min(x1, x2) (((x1) > (x2)) ? (x2) : (x1)) >> out of its misery? It's pretty much the textbook example of the evils >> of reinventing the wheels - side effects, arithmetic promotions, etc. >> >> Another thing: why does server want to know about LOOKUP_FOLLOW, of all >> things? Unless I'm misreading what ->lookup() is doing there... >> >> This >> + * pvfs2_link() is only implemented here to make sure that we return a >> + * reasonable error code (the kernel will return a misleading EPERM >> + * otherwise). PVFS2 does not support hard links. >> + */ >> +static int pvfs2_link(struct dentry *old_dentry, >> + struct inode *dir, >> + struct dentry *dentry) >> +{ >> + return -EOPNOTSUPP; >> +} >> is stupid. Expected error value is not EOPNOTSUPP; pardon the bluntness, >> but your idea of what would be less misleading doesn't matter - what matters >> is what the _callers_ of link(2), mknod(2), etc. are expecting. Which is to >> say, what does the userland code expect to get. It's outright promised in >> POSIX, actually. >> >> symlink(2) - what happens to too long symlink bodies? Silent truncation? >> >> +static inline void PVFS_khandle_to(const struct pvfs2_khandle *kh, >> + void *p, int size) >> +{ >> + int i; >> + unsigned char *c = p; >> + >> + memset(p, 0, size); >> + >> + for (i = 0; i < 16 && i < size; i++) >> + c[i] = kh->u[i]; >> +} >> >> Er... If you are using memset(), why open-code memcpy()? ->u is an array >> of unsigned char, so this loop is just a straight memcpy(); nothing fancy >> going on there... >> >> May I politely inquire about the reasons for DECLARE_ERRNO_MAPPING_AND_FN >> being in a header? Or existing at all, while we are at it... Ditto for >> the bit under the tail of that animal - DECLARE_ERRNO_MAPPING, that is. >> >> mask_blocked_signals()/unmask_blocked_signals(): Oleg might want to take >> a look at that one (and parties responsible might want to dive for cover, >> especially when he figures out what hides behind pvfs2_current_sigaction >> and pvfs2_current_signal_lock). >> >> + /* fill in temporary structure passed to fill_sb method */ >> + mount_sb_info.data = data; >> + mount_sb_info.root_khandle = >> + new_op->downcall.resp.fs_mount.root_khandle; >> + mount_sb_info.fs_id = new_op->downcall.resp.fs_mount.fs_id; >> + mount_sb_info.id = new_op->downcall.resp.fs_mount.id; >> + >> + /* >> + * the mount_sb_info structure looks odd, but it's used because >> + * the private sb info isn't allocated until we call >> + * pvfs2_fill_sb, yet we have the info we need to fill it with >> + * here. so we store it temporarily and pass all of the info >> + * to fill_sb where it's properly copied out >> + */ >> + mnt_sb_d = mount_nodev(fst, >> + flags, >> + (void *)&mount_sb_info, >> + pvfs2_fill_sb); >> >> If you are fighting an interface, that might be because you are using the >> wrong one... Consider the definition of mount_nodev(): >> struct dentry *mount_nodev(struct file_system_type *fs_type, >> int flags, void *data, >> int (*fill_super)(struct super_block *, void *, int)) >> { >> int error; >> struct super_block *s = sget(fs_type, NULL, set_anon_super, flags, NULL); >> >> if (IS_ERR(s)) >> return ERR_CAST(s); >> >> error = fill_super(s, data, flags & MS_SILENT ? 1 : 0); >> if (error) { >> deactivate_locked_super(s); >> return ERR_PTR(error); >> } >> s->s_flags |= MS_ACTIVE; >> return dget(s->s_root); >> } >> and observe that everything it calls is exported. It's a trivial convenience >> wrapper for sget(), and if it turns out to be inconvenient - just use sget() >> itself and be done with that. No need to bother with callbacks, having >> that mount_sb_info thing, etc. >> >> Is pvfs2_remount() safe to call during pvfs2_unmount_sb() on the same >> filesystem? Incidentally, what's protecting the list of superblocks >> while you are walking it and calling pvfs2_remount()? >> >> While we are at it, a lot of GFP_KERNEL allocations seem to be done >> with a bunch of locks (including global ones, such as request_mutex) held. >> Why won't that deadlock? >> >> +#define pvfs2_lock_inode(inode) spin_lock(&inode->i_lock) >> +#define pvfs2_unlock_inode(inode) spin_unlock(&inode->i_lock) >> >> Inhume with extreme prejudice. ->i_bytes and ->i_blocks are the least >> of your worries if two threads hit copy_attributes_to_inode() on the >> same inode in parallel. Which can happen, AFAICS... >> >> + /* >> + * PVFS2 cannot set size with a setattr operation. Probably not likely >> + * to be requested through the VFS, but just in case, don't worry about >> + * ATTR_SIZE >> + */ >> What about truncate(2)? >> >> + sb->s_fs_info = >> + kmalloc(sizeof(struct pvfs2_sb_info_s), PVFS2_GFP_FLAGS); >> + if (!PVFS2_SB(sb)) >> + return -ENOMEM; >> + memset(sb->s_fs_info, 0, sizeof(struct pvfs2_sb_info_s)); >> + PVFS2_SB(sb)->sb = sb; >> kzalloc()? >> >> + root_dentry = d_make_root(root); >> + if (!root_dentry) { >> + iput(root); >> + return -ENOMEM; >> Double iput() here... >> >> Anyway, bedtime for me - it's 5am here. I'll post more detailed one after >> I get some sleep... -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html