Re: Same mountpoint restriction in FICLONE ioctls

On Mon, Apr 13, 2020 at 1:28 AM Keno Fischer <keno@xxxxxxxxxxxxxxxxxx> wrote:
>
> > You did not specify your use case.
>
> My use case is recording (https://rr-project.org/) executions

Cool! I should try that ;-)

> of containers (which often make heavy use of bind mounts on
> the same file system, thus me running into this restriction).
> In essence, at relevant read or mmap operations,
> rr needs to checkpoint the file that was opened,
> in case it later gets deleted or modified.
> It always tries to FICLONE the file first,
> before deciding heuristically whether to
> instead create a copy (if it decides there is a low
> likelihood the file will get changed - e.g. because
> it's a system file - it may decide to take the chance and
> not copy it at the risk of creating a broken recording).
> That's often a decent trade-off, but of course it's not
> 100% perfect.
>
> > The question is: do you *really* need cross mount clone?
> > Can you use copy_file_range() instead?
>
> Good question. copy_file_range doesn't quite work
> for that initial clone, because we do want it to fail if
> cloning doesn't work (so that we can apply the
> heuristics). However, you make a good point that
> the copy fallback should probably use copy_file_range.
> At least that way, if it does decide to copy, the
> performance will be better.
>
> It would still be nice for FICLONE to ease this restriction,
> since it reduces the chance of the heuristics getting
> it wrong and preventing the copy, even if such
> a copy would have been cheap.
>
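
For concreteness, the clone-first flow described above, using copy_file_range() as the copy fallback, might look roughly like the following. This is only a sketch and not rr's actual code. worth_copying() is a hypothetical placeholder for rr's heuristic. The read()/write() path exists only for kernels or file pairs where copy_file_range() is unavailable.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>    /* FICLONE */
#include <stdbool.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>

enum ckpt_result { CKPT_ERROR = -1, CKPT_CLONED, CKPT_COPIED, CKPT_SKIPPED };

/* Hypothetical stand-in for rr's real heuristic: assume files under
 * /usr/ are system files that are unlikely to change. */
static bool worth_copying(const char *path)
{
    return strncmp(path, "/usr/", 5) != 0;
}

/* Copy src to dst from both files' current offsets. Prefers
 * copy_file_range(), which lets the kernel/filesystem avoid bouncing
 * data through userspace; drops to read()/write() only when
 * copy_file_range() is unavailable for this pair of files. */
static int copy_fallback(int src_fd, int dst_fd, off_t size)
{
    off_t left = size;
    while (left > 0) {
        ssize_t n = copy_file_range(src_fd, NULL, dst_fd, NULL, (size_t)left, 0);
        if (n > 0) {
            left -= n;
            continue;
        }
        if (n == 0)
            break;                              /* source shrank under us */
        if (errno != ENOSYS && errno != EXDEV && errno != EOPNOTSUPP)
            return -1;
        char buf[65536];
        ssize_t r;
        while ((r = read(src_fd, buf, sizeof buf)) > 0)
            if (write(dst_fd, buf, (size_t)r) != r)
                return -1;
        return r < 0 ? -1 : 0;
    }
    return 0;
}

static enum ckpt_result checkpoint_file(const char *src_path, int src_fd, int dst_fd)
{
    assert(src_fd >= 0 && dst_fd >= 0);

    /* 1. Try a reflink first. This is where bind mounts hurt: FICLONE
     *    fails with EXDEV when the two fds are on different mounts, even
     *    if both mounts point into the same btrfs filesystem. Any failure
     *    here just means "no cheap clone". */
    if (ioctl(dst_fd, FICLONE, src_fd) == 0)
        return CKPT_CLONED;

    /* 2. Clone failed: let the heuristic decide whether to copy at all. */
    if (!worth_copying(src_path))
        return CKPT_SKIPPED;

    /* 3. Copy the whole file from the beginning. */
    struct stat st;
    if (fstat(src_fd, &st) < 0 ||
        lseek(src_fd, 0, SEEK_SET) < 0 || lseek(dst_fd, 0, SEEK_SET) < 0)
        return CKPT_ERROR;
    return copy_fallback(src_fd, dst_fd, st.st_size) == 0 ? CKPT_COPIED : CKPT_ERROR;
}
```

On a filesystem without reflink support (or across bind mounts) this returns CKPT_COPIED or CKPT_SKIPPED. On btrfs within a single mount it returns CKPT_CLONED.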

You make it sound like the heuristic decision must be made
*after* trying to clone, but it can be made before, and the result
passed to the kernel as a flag saying whether or not to fall back to a copy.

copy_file_range(2) has an unused flags argument.
Adding support for flags like:
COPY_FILE_RANGE_BY_FS
COPY_FILE_RANGE_BY_KERNEL

(or any other names chosen after bikeshedding) could be used
to control whether the user intends to use the filesystem's internal
clone/copy methods and/or to fall back to a kernel copy.

I think this functionality will be useful to many.

> > Across which filesystems mounts are you trying to clone?
>
> This functionality was written with btrfs in mind, so that's
> what I was testing with. The mounts themselves are just
> different bind mounts into the same filesystem.
>

I can also suggest a workaround for you.
If your only problem is bind mounts, and the recorder is a privileged
process (CAP_DAC_READ_SEARCH), then you can use a "master"
bind mount to perform all clone operations on.
Use name_to_handle_at(2) to get a file handle of the source file;
the handle identifies the file within the filesystem (superblock),
not within a particular mount.
Use open_by_handle_at(2) to get an open file descriptor of the source
file under the "master" bind mount.
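
A minimal sketch of that workaround follows. It assumes the recorder has CAP_DAC_READ_SEARCH and that master_mount_fd is an fd for a directory on the "master" bind mount. reopen_via_master() and master_mount_fd are illustrative names, not an existing API.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>      /* name_to_handle_at, open_by_handle_at, MAX_HANDLE_SZ */
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: reopen `path` through the "master" bind mount.
 * master_mount_fd is any fd on that mount (e.g. its root directory).
 * The file handle identifies the inode within the filesystem, so the
 * mount it was looked up through does not matter; open_by_handle_at()
 * attaches the new fd to master_mount_fd's mount instead.
 * Needs CAP_DAC_READ_SEARCH. Returns an fd, or -1 with errno set. */
static int reopen_via_master(int master_mount_fd, const char *path, int flags)
{
    assert(path != NULL);

    struct file_handle *fh = malloc(sizeof(*fh) + MAX_HANDLE_SZ);
    if (fh == NULL)
        return -1;
    fh->handle_bytes = MAX_HANDLE_SZ;

    int mount_id;   /* mount the path resolved through; unused here */
    if (name_to_handle_at(AT_FDCWD, path, fh, &mount_id, 0) < 0) {
        int saved = errno;
        free(fh);
        errno = saved;
        return -1;
    }

    int fd = open_by_handle_at(master_mount_fd, fh, flags);
    int saved = errno;
    free(fh);
    errno = saved;
    return fd;
}
```

The returned fd lives on the master mount. A clone destination also created under that mount (e.g. with openat(master_mount_fd, ...)) can then be the target of FICLONE without tripping the same-mount check.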

Thanks,
Amir.


