Re: [PATCH] btrfs,vfs: allow FILE_EXTENT_SAME on a file opened ro

Zygo Blaxell <ce3g8jdj@xxxxxxxxxxxxxxxxxxxxx> · Fri, 27 May 2016 21:59:22 -0400

On Thu, May 26, 2016 at 05:04:01PM -0700, Mark Fasheh wrote:
> On Fri, May 20, 2016 at 05:45:12AM +0200, Adam Borowski wrote:
> > (Only btrfs currently implements dedupe_file_range.)
> > 
> > Instead of checking the mode of the file descriptor, let's check whether
> > it could have been opened rw.  This allows fixing failures when deduping
> > a live system: anyone trying to exec a file currently being deduped gets
> > ETXTBSY.

Isn't there another copy of this check in fs/btrfs/ioctl.c?

> > Issuing this ioctl on a ro file was already allowed for root/cap.
> > 
> > Signed-off-by: Adam Borowski <kilobyte@xxxxxxxxxx>
> 
> Hi Adam, this patch seems reasonable to me but I have to admit to being
> worried about 'unintended consequences'. I poked around the code in fs/ for
> a bit and saw mostly checks against file open mode. It might be that dedupe
> is a special case due to the potential for longer running operations, 

The length of the operation is irrelevant.

If dedup is run over frequently executed files (e.g. /bin/sh), opening
the files in write mode will randomly disrupt some normal execution no
matter how quickly the dedup agent opens and closes the files.

The interference works both ways, too:  if a file is already being
executed, it can't be opened with write access.   Executables often
run for extended periods of time, effectively preventing them from ever
being deduped by non-root users.

Executables are often really good candidates for dedup, so we don't want
to avoid them.  Executable files on build servers are often duplicated
several times over in various .o and .a files and install/packaging trees.

> theoretically you'd see the same problem if trying to exec against a file
> being cloned too, correct? If that's the case then I wonder how this issue
> gets solved for other ioctls.

Clone could be used to change the content of the destination file, so
it is right to insist on having the file opened for writing in that case.

Dedupe is an unusual case because it will never change the content of
the destination file even if it is opened *with* write access.  The
defrag ioctl is similar in this respect.

Opening either dedup or defrag to non-root O_RDONLY access does have
some concerns.  It could be used to create a lot of extra I/O load,
particularly write load, that might otherwise be denied to an unprivileged
user...but then, so can repeatedly calling sync(), so maybe this isn't
a real problem.

There is a potential risk of exposing bugs in other parts of the system
(e.g. the various recently fixed dedup races) but whatever those bugs are,
we had them already without changing the permission checks.

I wonder if there is a risk of damaging files like this:

	open A O_RDWR

	open B O_RDONLY

	copy B to A

	do _not_ call fsync() here

	dedup(A, B)

	do _not_ call fsync() here either

	some time passes

	power fail

If A's extent isn't flushed to disk, what happens to B?  Does dedup imply
fsync or data ordering such that A is written to disk before the extent
ref in B is updated, or can the content of B change?  If root is running
dedup, we can assume that whoever authorized root code execution also made
sure that this case never arises (e.g. by ensuring the dedup agent always
calls fsync, or otherwise ensuring that the extent at A is stable on disk
before calling dedup).  If we allow non-root to do this then we have no
such assurance--but on the other hand maybe the assurance is weak and
bad things can happen here already even if we require root privilege.

I also wonder what happens if an executable that has called
mlockall(MCL_FUTURE) gets its pages replaced by a deduped extent while
it's running...do the replaced pages become un-mlockall-ed?  Is the VFS
smart enough to swap in the new pages?  Am I just silly and insane for
expecting shared mmap() to have meaningful real-time properties on btrfs?

Attachment:
signature.asc

Description: Digital signature