Re: Copy tools on Linux

On 06-30 09:12, Steve French wrote:
> On Sat, Jun 30, 2018 at 8:13 AM Goldwyn Rodrigues <rgoldwyn@xxxxxxx> wrote:
> >
> > Hi Steve,
> >
> > On 06-29 21:37, Steve French wrote:
> > > I have been looking at i/o patterns from various copy tools on Linux,
> > > and it is pretty discouraging - I am hoping that I am forgetting an
> > > important one that someone can point me to ...
> > >
> > > Some general problems:
> > > 1) if source and target on the same file system it would be nice to
> > > call the copy_file_range syscall (AFAIK only test tools call that),
> > > although in some cases at least cp can do it for --reflink
> >
> > I have submitted a patch set for copy_file_range() across filesystems
> > which can at least use splice() [1] as part of enabling holes in
> > copy_file_range(), but it has not been incorporated so far.
> 
> Do you have a link to the patch?

I posted it in the last mail, as reference [1]:
https://www.spinics.net/lists/linux-fsdevel/msg128450.html
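
To make the fallback idea concrete from the userspace side, here is a rough
sketch in Python. It is not the patch set above, only an illustration of
"try copy_file_range(), else fall back to plain read/write". The function
name and the default chunk size are made up for this example, and
os.copy_file_range needs Python 3.8+ on Linux.

```python
import os


def copy_range_with_fallback(src_path, dst_path, chunk=8 * 1024 * 1024):
    """Copy src_path to dst_path, preferring copy_file_range().

    If the fast path fails for any reason (e.g. EXDEV across filesystems
    on older kernels, or ENOSYS), continue with a plain read/write loop
    from the current file offsets. A genuine error such as ENOSPC will
    then be raised by the slow path instead.
    Returns the number of bytes copied.
    """
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        in_fd, out_fd = src.fileno(), dst.fileno()
        use_cfr = hasattr(os, "copy_file_range")
        while True:
            if use_cfr:
                try:
                    # Offsets are None, so the kernel uses and advances
                    # the file offsets of both descriptors.
                    n = os.copy_file_range(in_fd, out_fd, chunk)
                except OSError:
                    use_cfr = False
                    continue
            else:
                buf = os.read(in_fd, chunk)
                n = len(buf)
                view = memoryview(buf)
                while view:
                    # os.write may write less than requested; loop.
                    view = view[os.write(out_fd, view):]
            if n == 0:  # end of file
                break
            copied += n
    return copied
```

Falling back on any OSError is a deliberate simplification: the read/write
loop reports the real error if there is one, at the cost of one wasted call.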

> 
> > > 1) options for large i/o sizes (network latencies in network/cluster
> > > fs can be large, so prefer larger I/Os, 1M or in some cases 8M)
> >
> > Unfortunately tools derive the I/O size from stat.st_blksize, which may be
> > too small for "efficient" I/O. However, tools such as cp also scan for
> > runs of zeros to convert into holes, and a small block size works well
> > for that. OTOH, that is not the most common use case, which I agree
> > could be made faster.
> 
> dd is nice in that you can set the i/o size (as can rsync), but it seems
> sane to allow rsize/wsize to be configurable
> 
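
A small sketch of what a configurable I/O size could look like in a copy
tool: honor an explicit request, otherwise take st_blksize but never go
below a floor. The function name, the "requested" parameter and the 1M
floor are assumptions for illustration, not what cp or dd actually do.

```python
import os


def pick_io_size(path, requested=None, floor=1024 * 1024):
    """Return the buffer size to use when copying `path`.

    requested: explicit size from the user (hypothetical option), or None.
    floor:     minimum size used when deriving from st_blksize.
    """
    if requested is not None:
        if requested <= 0:
            raise ValueError("I/O size must be positive")
        return requested
    # st_blksize is the filesystem's preferred block size; it can be as
    # small as a page, so treat it as a lower bound hint only.
    blksize = getattr(os.stat(path), "st_blksize", 0) or 0
    return max(blksize, floor)
```
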
> > > 2) parallelizing writes so not just one write in flight at a time
> >
> > What would the resultant file be in case of errors? Should the
> > destination file be considered partially copied? man cp does not cover
> > the error case, but currently it is assumed the file is partially copied
> > and correct up to the point of error.
> 
> Whether parallel i/o on one file, or multiple files, either will be
> a huge help.  Just did a quick google search on the topic and it
> pointed to a sysadmin article discussing one of the more
> common copy tools on Windows, robocopy:
> 
> "Perhaps the most important switch to pay attention is /MT, which is a
> feature that enables Robocopy to copy files in multi-threaded mode...
> with multi-threaded enabled, you can copy multiple files at the same
> time better utilizing the bandwidth and significantly speeding up the
> process. If you don’t set a number when using the /MT switch, then the
> default number will be 8, which means that Robocopy will try to copy
> eight files at the same time. However, Robocopy supports 1 to 128
> threads."
> 
> This seems sane - even if cp can't do it, having a tool that can
> reasonably get at least four i/o in flight (perhaps for different
> files, with only one i/o per file) would be a huge help.

I see. Perhaps posting patches to cp would benefit the general folk.
I will look at the coreutils code as well.
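
As a strawman for the "several files in flight, one i/o stream per file"
variant, something like the following Python sketch is what I have in mind.
It is not based on the coreutils code; copy_many and its "workers" parameter
are hypothetical names, and shutil.copyfile stands in for the real per-file
copy routine.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor


def copy_many(pairs, workers=4):
    """Copy (src, dst) pairs with up to `workers` files in flight.

    Each file is copied sequentially by one thread, so there is only one
    I/O stream per file. Returns a dict mapping each dst to None on
    success or to the exception raised while copying it.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(shutil.copyfile, s, d): d for s, d in pairs}
        for fut, dst in futures.items():
            # exception() waits for the copy to finish and returns None
            # if it succeeded.
            results[dst] = fut.exception()
    return results
```

Reporting errors per destination keeps one failed file from hiding the
status of the others, which matters once copies no longer finish in order.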

> 
> <snip>
> > > 4) option to set the file size first, and then fill in writes (so
> > > non-extending writes)
> >
> > File size or file allocation? How would you determine what file
> > size to set? Consider the case the source file is sparse. It can be
> > calculated, but needs more thought.
> 
> The goal here is to allow a copy option (as rsync does) for target
> file systems where metadata sync is expensive or where setting
> end-of-file needs expensive locking: set the file size early so it
> doesn't get reset 100s of times by extending writes
> 

This assumes the copy succeeds all the way. What happens in case of errors?
A common (though stupid) way to check that a file was copied successfully
is to compare file sizes, and a file whose size was set up front would pass
that check even after a failed copy.
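
One possible way to get non-extending writes without leaving a full-size
but incomplete file behind is to pre-size a temporary file and rename it
only on success. A rough Python sketch follows. The ".partial" suffix and
the function name are made up, it assumes the source does not change during
the copy, and it ignores sparse sources entirely (holes are written out as
zeros).

```python
import os


def copy_presized(src_path, dst_path, chunk=1024 * 1024):
    """Copy with the destination size set once, before any data is written.

    Data goes to dst_path + ".partial" (hypothetical naming); the final
    name only appears after every byte has been written and synced.
    Returns the number of bytes copied.
    """
    tmp_path = dst_path + ".partial"
    src_fd = os.open(src_path, os.O_RDONLY)
    try:
        size = os.fstat(src_fd).st_size
        dst_fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        try:
            os.ftruncate(dst_fd, size)  # set end-of-file once, up front
            offset = 0
            while offset < size:
                buf = os.pread(src_fd, min(chunk, size - offset), offset)
                if not buf:
                    raise OSError("source shrank during copy")
                view = memoryview(buf)
                while view:
                    # Writes land inside the preset size, so none of
                    # them extends the file.
                    n = os.pwrite(dst_fd, view, offset)
                    view = view[n:]
                    offset += n
            os.fsync(dst_fd)
        except BaseException:
            # On any failure, remove the partial file so that nothing
            # with the right size but wrong contents is left behind.
            os.close(dst_fd)
            os.unlink(tmp_path)
            raise
        os.close(dst_fd)
    finally:
        os.close(src_fd)
    os.replace(tmp_path, dst_path)  # atomic within one filesystem
    return size
```

With this shape a size comparison against dst_path is at least not
misleading, because dst_path is only created by a copy that completed.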

-- 
Goldwyn


