Re: Copy tools on Linux


 



On Sat, Jun 30, 2018 at 8:13 AM Goldwyn Rodrigues <rgoldwyn@xxxxxxx> wrote:
>
> Hi Steve,
>
> On 06-29 21:37, Steve French wrote:
> > I have been looking at i/o patterns from various copy tools on Linux,
> > and it is pretty discouraging - I am hoping that I am forgetting an
> > important one that someone can point me to ...
> >
> > Some general problems:
> > 1) if source and target on the same file system it would be nice to
> > call the copy_file_range syscall (AFAIK only test tools call that),
> > although in some cases at least cp can do it for --reflink
>
> I have submitted a patch set for copy_file_range() across filesystems
> which can at least use splice() [1], as part of enabling holes in
> copy_file_range(), but it has not been incorporated so far.

Do you have a link to the patch?
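
For reference, this is roughly what calling copy_file_range from a
userspace copy tool could look like. It is a minimal sketch, assuming
Python 3.8+ on Linux; the helper name copy_file, the 8M chunk size and
the set of errno values that trigger the fallback are my own choices
for illustration, not taken from cp or any other existing tool.

```python
import errno
import os

# errno values treated here as "copy_file_range not usable for this pair of
# files"; this set is an assumption made for the sketch.
FALLBACK_ERRNOS = {errno.EXDEV, errno.ENOSYS, errno.EINVAL, errno.EOPNOTSUPP}


def copy_file(src_path, dst_path, chunk=8 * 1024 * 1024):
    """Copy src to dst, preferring copy_file_range(), else read()/write()."""
    src = os.open(src_path, os.O_RDONLY)
    dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    use_cfr = hasattr(os, "copy_file_range")
    total = 0
    try:
        while True:
            if use_cfr:
                try:
                    # Offsets are left as None, so the kernel advances the
                    # file offsets of both descriptors.
                    n = os.copy_file_range(src, dst, chunk)
                except OSError as e:
                    if e.errno not in FALLBACK_ERRNOS:
                        raise
                    use_cfr = False  # retry this chunk with read()/write()
                    continue
            else:
                buf = os.read(src, chunk)
                view = memoryview(buf)
                while view:  # handle short writes
                    view = view[os.write(dst, view):]
                n = len(buf)
            if n == 0:  # end of file
                break
            total += n
    finally:
        os.close(src)
        os.close(dst)
    return total
```

The fallback keeps the tool working where the syscall is unavailable or
refuses the file pair, at the cost of pulling the data through userspace.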

> > 1) options for large i/o sizes (network latencies in network/cluster
> > fs can be large, so larger I/Os of 1M or 8M are preferable in some cases)
>
> Unfortunately tools derive I/O size from stat.st_blksize, which may be
> too small for "efficient" I/O. However, tools such as cp also scan for
> runs of zeros to convert into holes, and a small block size works well
> for that. OTOH, that is not the most common case for these tools,
> which I agree could be made faster.

dd is nice in that you can set the i/o size (as can rsync), but it seems
sane to allow rsize/wsize to be configurable in the other copy tools as well
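
As an illustration of what such an option amounts to, here is a minimal
Python sketch of a copy loop whose i/o size defaults to st_blksize, as
described above, but can be overridden by the caller. The function name
copy_with_io_size and the io_size parameter are hypothetical, not an
option of any existing tool.

```python
import os


def copy_with_io_size(src_path, dst_path, io_size=None):
    """Copy src to dst using read()/write() calls of io_size bytes."""
    src = os.open(src_path, os.O_RDONLY)
    dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        if io_size is None:
            # Default: the size the filesystem reports as preferred.
            # The 64K fallback for a zero value is an arbitrary choice.
            io_size = os.fstat(src).st_blksize or 65536
        total = 0
        while True:
            buf = os.read(src, io_size)
            if not buf:  # end of file
                break
            view = memoryview(buf)
            while view:  # handle short writes
                view = view[os.write(dst, view):]
            total += len(buf)
        return total
    finally:
        os.close(src)
        os.close(dst)
```

For example, copy_with_io_size(src, dst, io_size=8 * 1024 * 1024) would
issue 8M reads and writes, regardless of what st_blksize reports.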

> > 2) parallelizing writes so not just one write in flight at a time
>
> What would the resultant file be in case of errors? Should the
> destination file be considered partially copied? man cp does not cover
> the error case, but currently it is assumed the file is partially copied
> and correct up to the point of the error.

Parallel i/o, whether on one file or across multiple files, would be
a huge help.  I just did a quick Google search on the topic and it
pointed to a sysadmin article discussing one of the more
common copy tools on Windows, robocopy:

"Perhaps the most important switch to pay attention to is /MT, which is a
feature that enables Robocopy to copy files in multi-threaded mode...
with multi-threaded enabled, you can copy multiple files at the same
time better utilizing the bandwidth and significantly speeding up the
process. If you don’t set a number when using the /MT switch, then the
default number will be 8, which means that Robocopy will try to copy
eight files at the same time. However, Robocopy supports 1 to 128
threads."

This seems sane - even if cp can't do it, having a tool that can
reasonably get at least four i/o in flight (perhaps for different
files, with only one i/o per file) would be a huge help.
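
A minimal Python sketch of that "different files, one i/o per file"
variant, assuming a plain thread pool is acceptable. The name copy_many
and the default of four copies in flight are illustrative only. Errors
are reported per file, so one failed copy does not hide the state of
the others.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor


def copy_many(pairs, max_in_flight=4):
    """Copy (src, dst) pairs with up to max_in_flight copies running at once.

    Returns a dict mapping each dst to None on success, or to the
    exception raised while copying that file.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        futures = {pool.submit(shutil.copyfile, s, d): d for s, d in pairs}
        for fut, dst in futures.items():
            # exception() waits for this copy to finish and returns
            # None if it succeeded.
            results[dst] = fut.exception()
    return results
```

Each file is still copied sequentially by a single thread, so the
question of what a partially written destination looks like after an
error is no different from today's cp for any individual file.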

<snip>
> > 4) option to set the file size first, and then fill in writes (so
> > non-extending writes)
>
> File size or file allocation? How would you determine what file
> size to set? Consider the case where the source file is sparse. It can
> be calculated, but needs more thought.

The goal here is to allow a copy option (as rsync does)
for target file systems where metadata sync is expensive,
or where setting end-of-file needs expensive locking: set the file size
early so it doesn't get reset 100s of times by extending writes
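
A minimal Python sketch of the idea, assuming the logical file size
(not the allocation) is what gets set: ftruncate the target to the
source's st_size once, then fill it in with pwrite at explicit offsets
so no write extends the file. The name copy_presized and the 1M i/o
size are illustrative.

```python
import os


def copy_presized(src_path, dst_path, io_size=1024 * 1024):
    """Set the target's size once, then fill it with non-extending writes."""
    src = os.open(src_path, os.O_RDONLY)
    dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        size = os.fstat(src).st_size
        # Set end-of-file once, up front. This sets the logical size only;
        # it does not allocate blocks.
        os.ftruncate(dst, size)
        offset = 0
        while offset < size:
            buf = os.pread(src, min(io_size, size - offset), offset)
            if not buf:
                break  # source shrank while we were copying
            written = 0
            while written < len(buf):  # handle short writes
                written += os.pwrite(dst, buf[written:], offset + written)
            offset += len(buf)
        return offset
    finally:
        os.close(src)
        os.close(dst)
```

For a sparse source this sets the same logical size as the source but
still writes the zeros read from its holes, so it does not preserve
sparseness; that part would need the extra thought mentioned above.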

-- 
Thanks,

Steve



