On Tue, Jul 12, 2022 at 1:04 PM Josef Bacik <josef@xxxxxxxxxxxxxx> wrote: > > Yeah I'm fine with removing the inode_permission(), I just want to make sure > we're consistent across all of our IO related operations. It looks like at the > very least we're getting security_*_permission on things like > read/write/fallocate, so perhaps switch to that here and call it good enough? remap_verify_area() already does that, afaik. The more I look at the remap_range stuff, the more I feel it is very ad-hoc and nobody really thought deeply about it. What about an append-only destination? Is that kind of write supposed to be ok because it's just REMAP_FILE_DEDUP? The open side should already have checked for IS_IMMUTABLE, but O_APPEND is a thing? I'm getting the feeling that somebody really needs to think about what the semantics should be. In the meantime, I think that requiring the block size alignment is the important part here. The "check read permissions" is kind of a non-issue, since we already have that mmap() case. Strangely, it *does* check that the position is aligned for remapping in .generic_remap_checks(). And not at all for dedupe. And even for remapping, the size alignment is a bit strange. It takes the "source EOF" into account, but what if the destination file is big enough that that's not the end? And the inode_permission() check is wrong, but at least it does have the important check there (ie that FMODE_WRITE one). So doing the inode_permissions() check at worst just makes it fail too often, but since it's all a "optimistically dedupe" anyway, that kind of "fail in odd situations" doesn't really matter. So for allow_file_dedupe(), I'd suggest: (a) remove the inode_permission() check in allow_file_dedupe() (b) remove the uid_eq() check for the same reason (if you didn't open the destination for write, you have no business deduping anything, even if you're the owner) (c) add a "can't do it for APPEND_ONLY" (but let the CAP_SYS_ADMIN override it) AND I'd add a "make sure it's all block-aligned" check for both the source and each destination chunk. Something like the attached, IOW. Entirely untested, this is not meant to be applied as-is, this is meant to be the basis of discussion. Btw, the generic_remap_file_range_prep() IS_IMMUTABLE check seems bogus too. How could it possibly be immutable if we've opened the target for writing? So all of this seems a bit confused. It really smells like "filesystem people wrote this with low-level filesystem rules in mind, rather than any kind of high-level understanding or conceptual rules" Hmm? Linus
fs/remap_range.c | 27 +++++++++++++++++++-------- 1 file changed, 19 insertions(+), 8 deletions(-) diff --git a/fs/remap_range.c b/fs/remap_range.c index e112b5424cdb..ba71ceb8dde3 100644 --- a/fs/remap_range.c +++ b/fs/remap_range.c @@ -409,17 +409,12 @@ EXPORT_SYMBOL(vfs_clone_file_range); /* Check whether we are allowed to dedupe the destination file */ static bool allow_file_dedupe(struct file *file) { - struct user_namespace *mnt_userns = file_mnt_user_ns(file); - struct inode *inode = file_inode(file); - if (capable(CAP_SYS_ADMIN)) return true; + if (file->f_flags & O_APPEND) + return false; if (file->f_mode & FMODE_WRITE) return true; - if (uid_eq(current_fsuid(), i_uid_into_mnt(mnt_userns, inode))) - return true; - if (!inode_permission(mnt_userns, inode, MAY_WRITE)) - return true; return false; } @@ -428,6 +423,8 @@ loff_t vfs_dedupe_file_range_one(struct file *src_file, loff_t src_pos, loff_t len, unsigned int remap_flags) { loff_t ret; + struct inode *dst; + unsigned long blocksize; WARN_ON_ONCE(remap_flags & ~(REMAP_FILE_DEDUP | REMAP_FILE_CAN_SHORTEN)); @@ -457,10 +454,17 @@ loff_t vfs_dedupe_file_range_one(struct file *src_file, loff_t src_pos, goto out_drop_write; ret = -EISDIR; - if (S_ISDIR(file_inode(dst_file)->i_mode)) + dst = file_inode(dst_file); + if (S_ISDIR(dst->i_mode)) goto out_drop_write; ret = -EINVAL; + blocksize = 1ul << dst->i_blkbits; + if (dst_pos & (blocksize-1)) + goto out_drop_write; + if (len & (blocksize-1)) + goto out_drop_write; + if (!dst_file->f_op->remap_file_range) goto out_drop_write; @@ -488,6 +492,7 @@ int vfs_dedupe_file_range(struct file *file, struct file_dedupe_range *same) int ret; u16 count = same->dest_count; loff_t deduped; + unsigned long blocksize; if (!(file->f_mode & FMODE_READ)) return -EINVAL; @@ -507,6 +512,12 @@ int vfs_dedupe_file_range(struct file *file, struct file_dedupe_range *same) if (!file->f_op->remap_file_range) return -EOPNOTSUPP; + blocksize = 1ul << src->i_blkbits; + if (off & (blocksize-1)) + return -EINVAL; + if (len & (blocksize-1)) + return -EINVAL; + ret = remap_verify_area(file, off, len, false); if (ret < 0) return ret;