On Sun, May 29, 2016 at 02:21:03AM +0200, Adam Borowski wrote: > On Fri, May 27, 2016 at 09:59:22PM -0400, Zygo Blaxell wrote: > > I wonder if there is a risk of damaging files like this: > > > > open A O_RDWR > > > > open B O_RDONLY > > > > copy B to A > > > > do _not_ call fsync() here > > > > dedup(A, B) > > > > do _not_ call fsync() here either > > > > some time passes > > > > power fail > > > > If A's extent isn't flushed to disk, what happens to B? Does dedup imply > > fsync or data ordering such that A is written to disk before the extent > > ref in B is updated, or can the content of B change? If root is running > > dedup, we can assume that whoever authorized root code execution also made > > sure that this case never arises (e.g. by ensuring the dedup agent always > > calls fsync, or otherwise ensuring that the extent at A is stable on disk > > before calling dedup). If we allow non-root to do this then we have no > > such assurance--but on the other hand maybe the assurance is weak and > > bad things can happen here already even if we require root privilege. > > I don't think this can happen on btrfs: the superblock is updated only after > a barrier when both the data and extent refs are already on the disk. If and only if the filesystem is mounted with the flushoncommit option, that's true. This is not the default, though, and I lost a fair amount of time and data before I discovered this. > > I also wonder what happens if an executable that has called > > mlockall(MCL_FUTURE) gets its pages replaced by a deduped extent while > > it's running...do the replaced pages become un-mlockall-ed? Is the VFS > > smart enough to swap in the new pages? Am I just silly and insane for > > expecting shared mmap() to have meaningful real-time properties on btrfs? > > mmap() has currently no relation to extents, even entirely identical files > don't share any pages: > > dd if=/dev/urandom of=a bs=1048576 count=1024 # 1GB of junk > cp --reflink a b > time cat a >/dev/null # all in page cache -- fast > time cat b >/dev/null # every extent is shared with a, but... That's precisely my point. If we've called mlockall(MCL_FUTURE), the executable pages are locked into RAM. If we then replace the underlying physical pages, this would normally invalidate the page cache for the replaced extent; however mlockall(MCL_FUTURE) and the exec() DENYWRITE lock combined mean that such an invalidation should not be possible. So what happens? (now that I think of it, whatever happens would be the same thing that happens during device delete or balance, so it's probably OK, maybe) > In any case, this patch doesn't introduce any cases not already triggerable > by root. It allows non-root to trigger cases that previously could only be triggered by root.
Attachment:
signature.asc
Description: Digital signature