Al, you've given me a lot more to look at than I can comment on intelligently in short order. Some of the things you've said are clear to me, and some I'll have to work to figure out... So, a couple of things: I have failed so far today to figure out what you mean by GrepTFS or GTFS... I know what you mean by STFU, so I hope its not something like that <g>... I have looked at all the usages of GFP_KERNEL, and locks, focusing on request_mutex... along with reading stuff like https://lwn.net/Articles/274695/ and https://lwn.net/lwn/kernel/LDD2/ch07.html... I don't yet see where we're doing GFP_KERNEL allocations within held locks... could you describe in more detail one of the ones you saw? -Mike On Mon, Sep 7, 2015 at 2:37 AM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote: > * symlink handling - why the hell do we set non-NULL cookie for > symlinks that do not have any ->put_link()? What's worse, it looks like > a bogus server could trick us into overwriting ->link_target of a live > inode (e.g. via ->d_revalidate() -> pvfs2_inode_getattr() -> > copy_attributes_to_inode()). Am I misreading it? > * in copy_attributes_to_inode() we have > case PVFS_TYPE_SYMLINK: > if (symname != NULL) { > inode->i_size = (loff_t) strlen(symname); > break; > } > /*FALLTHRU*/ > and the only caller passes is > if (copy_attributes_to_inode(inode, > &new_op->downcall.resp.getattr.attributes, > new_op->downcall.resp.getattr.link_target)) { > How can symname possibly be NULL? .resp.getattr.link_target is an array, > for crying out loud! > * who sets (or uses) struct PVFS_sys_attr_s .target_link? Or > .dist_name, or .dist_params, while we are at it... > * can a symlink be longer than 256 bytes? And why are > fs/orangefs/downcall.h:40: char link_target[PVFS2_NAME_LEN]; > and > fs/orangefs/pvfs2-kernel.h:317: char link_target[PVFS_NAME_MAX]; > spelled out differently? _Both_ constants are 256; what's the point of > obfuscating here? > * what's to guarantee that this .resp.getattr.link_target is > NUL-terminated? Passing it to strlen() in copy_attributes_to_inode() > is a bad idea unless we _have_ NUL in there... > * in the same copy_attributes_to_inode() you have > /* copy link target to inode private data */ > if (pvfs2_inode && symname) { > strncpy(pvfs2_inode->link_target, > symname, > PVFS_NAME_MAX); > gossip_debug(GOSSIP_UTILS_DEBUG, > "Copied attr link target %s\n", > pvfs2_inode->link_target); > } > Again, what's to guarantee that ->link_target in pvfs2_inode will be > NUL-terminated? > * handling of ->i_mode in the same function is completely broken - > you are building it in place *ON* *LIVE* *INODE*: > inode->i_mode = perm_mode; > ... > inode->i_mode |= S_IFDIR; > (and similar for regulars and symlinks). Again, that sucker is called from > your ->d_revalidate(). With no exclusion whatsoever, not that it would be > easy to provide one for all ->i_mode readers (it is possible, but you'd > have to replace a lot of generic helper functions with ones of your own; > not done and almost certainly not worth attempting). > * pvfs2_symlink() can't get NULL symname; what it *can* get is > symname up to 4Kb long. Doing > strncpy(new_op->upcall.req.sym.target, symname, PVFS2_NAME_LEN); > will not only quietly truncate it, it might leave the copy without NUL... > * what's up with compulsive casts? E.g. > attrs->mtime = > pvfs2_convert_time_field((void *)&iattr->ia_mtime); > with > __u64 pvfs2_convert_time_field(void *time_ptr) > { > __u64 pvfs2_time; > struct timespec *tspec = (struct timespec *)time_ptr; > > pvfs2_time = (__u64) ((time_t) tspec->tv_sec); > return pvfs2_time; > } > Leaving aside the casts in (__u64) ((time_t) tspec->tv_sec), ->ia_mtime is > struct timespec; what's the point of taking its address, casting to void *, > passing to that sucker, only to cast to back to struct timespec * (and > only reading from it, so const struct timespec * would do just fine)? > * wrappers must die. It's not IOCCC, damn it. While having > pvfs2_inode_lock and pvfs2_lock_inode (both bogus) in the same header has > a certain charm, please don't do it. > * in dir.c: > for (i = 0; i < rhandle.readdir_response.pvfs_dirent_outcount; i++) { > len = rhandle.readdir_response.dirent_array[i].d_length; > current_entry = rhandle.readdir_response.dirent_array[i].d_name; > current_ino = pvfs2_khandle_to_ino( > &(rhandle.readdir_response.dirent_array[i].khandle)); > > gossip_debug(GOSSIP_DIR_DEBUG, > "calling dir_emit for %s with len %d, pos %ld\n", > current_entry, > len, > (unsigned long)pos); > ret = > dir_emit(ctx, current_entry, len, current_ino, DT_UNKNOWN); > ctx->pos++; > gossip_ldebug(GOSSIP_DIR_DEBUG, > "%s: ctx->pos:%lld\n", > __func__, > lld(ctx->pos)); > > pos++; > } > > /* this means that all of the dir_emit calls succeeded */ > if (i == rhandle.readdir_response.pvfs_dirent_outcount) { > > No, it doesn't. You ignore the return value of dir_emit()... > > * same file, decode_dirents(): > readdir->dirent_array = kmalloc(readdir->pvfs_dirent_outcount * > sizeof(*readdir->dirent_array), > GFP_KERNEL); > What's to prevent a wraparound in multiplication here? > * same file, same function: > for (i = 0; i < readdir->pvfs_dirent_outcount; i++) { > dec_string(pptr, &readdir->dirent_array[i].d_name, > &readdir->dirent_array[i].d_length); > where > #define dec_string(pptr, pbuf, plen) do { \ > __u32 len = (*(__u32 *) *(pptr)); \ > *pbuf = *(pptr) + 4; \ > *(pptr) += roundup8(4 + len + 1); \ > if (plen) \ > *plen = len;\ > } while (0) > Just what will happen if this 32bit length will be well beyond > the size of response we are parsing here? -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html