On 06-30 09:12, Steve French wrote:
> On Sat, Jun 30, 2018 at 8:13 AM Goldwyn Rodrigues <rgoldwyn@xxxxxxx> wrote:
> >
> > Hi Steve,
> >
> > On 06-29 21:37, Steve French wrote:
> > > I have been looking at i/o patterns from various copy tools on Linux,
> > > and it is pretty discouraging - I am hoping that I am forgetting an
> > > important one that someone can point me to ...
> > >
> > > Some general problems:
> > > 1) if source and target on the same file system it would be nice to
> > > call the copy_file_range syscall (AFAIK only test tools call that),
> > > although in some cases at least cp can do it for --reflink
> >
> > I have submitted a patch set for copy_file_range() across filesystems
> > which can at least use splice() [1] as a part of enabling holes in
> > copy_file_range(), but it has not been incorporated so far.
>
> Do you have a link to the patch?

I posted it in my last mail as reference [1]:
https://www.spinics.net/lists/linux-fsdevel/msg128450.html

> >
> > > 1) options for large i/o sizes (network latencies in network/cluster
> > > fs can be large, so prefer larger 1M or 8M in some cases I/Os)
> >
> > Unfortunately tools derive I/O size from stat.st_blksize which may be
> > pretty small for performing "efficient" I/O. However, the tools such as
> > cp also determine series of zeros to convert into holes. So for that
> > reason it works well. OTOH, that is not the most common case of tools,
> > which I agree could be made faster.
>
> dd is nice in that you can set i/o size (as can rsync) but seems sane to
> allow rsize/wsize to be configurable
>
> > > 2) parallelizing writes so not just one write in flight at a time
> >
> > What would the resultant file be in case of errors? Should the
> > destination file be considered partially copied? man cp does not cover
> > the case of errors but currently it is assumed the file is partially
> > copied and correct until the point of error.
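A minimal sketch of the first two points (copy_file_range() and larger I/O sizes), assuming Linux and glibc 2.27 or later for the copy_file_range() wrapper: try copy_file_range() first and fall back to a plain read()/write() loop when the kernel or filesystem refuses. The I/O size starts from st_blksize but is raised to a floor. `copy_fd()`, `copy_rw()` and `MIN_IO_SIZE` are hypothetical names, and the 1 MiB floor only mirrors the sizes mentioned above; it is not what cp, dd or rsync actually do. Kernel behaviour for cross-filesystem copy_file_range() has changed over releases, so the set of errors that trigger the fallback is deliberately broad.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical tunable: smallest I/O size we are willing to issue, since
 * st_blksize (often 4 KiB) can be too small for high-latency network fs. */
#define MIN_IO_SIZE (1024 * 1024)

/* Fallback path: read()/write() loop from the current file offsets. */
static int copy_rw(int in, int out, size_t bufsize)
{
    char *buf = malloc(bufsize);
    if (!buf)
        return -1;
    for (;;) {
        ssize_t n = read(in, buf, bufsize);
        if (n == 0)
            break;                      /* EOF */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            free(buf);
            return -1;
        }
        for (ssize_t off = 0; off < n;) {   /* handle short writes */
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                free(buf);
                return -1;
            }
            off += w;
        }
    }
    free(buf);
    return 0;
}

/* Let the kernel/filesystem do the copy when it can (possibly server-side
 * or via reflink); otherwise fall back to copying through user space. */
int copy_fd(int in, int out)
{
    struct stat st;
    if (fstat(in, &st) < 0)
        return -1;

    size_t bufsize = (size_t)st.st_blksize;
    if (bufsize < MIN_IO_SIZE)
        bufsize = MIN_IO_SIZE;

    for (;;) {
        /* NULL offsets: use and advance the file offsets of in and out */
        ssize_t n = copy_file_range(in, NULL, out, NULL, bufsize, 0);
        if (n == 0)
            return 0;                   /* EOF: done */
        if (n < 0) {
            /* EPERM is included because some seccomp sandboxes reject
             * the syscall outright. */
            if (errno == EXDEV || errno == ENOSYS || errno == EINVAL ||
                errno == EOPNOTSUPP || errno == EPERM)
                return copy_rw(in, out, bufsize);
            return -1;
        }
    }
}
```

Because both paths use the shared file offsets, a fallback after a partial copy_file_range() simply continues from where the kernel stopped. The kernel may copy less than `bufsize` per call, so the loop repeats until copy_file_range() returns 0 at end of file.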
>
> Whether parallel i/o on one file, or multiple files, either will be
> a huge help. Just did a quick google search on the topic and it
> pointed to a sysadmin article discussing one of the more
> common copy tools on Windows, robocopy:
>
> "Perhaps the most important switch to pay attention is /MT, which is a
> feature that enables Robocopy to copy files in multi-threaded mode...
> with multi-threaded enabled, you can copy multiple files at the same
> time better utilizing the bandwidth and significantly speeding up the
> process. If you don’t set a number when using the /MT switch, then the
> default number will be 8, which means that Robocopy will try to copy
> eight files at the same time. However, Robocopy supports 1 to 128
> threads."
>
> This seems sane - even if cp can't do it, having a tool that can
> reasonably get at least four i/o in flight (perhaps for different
> files, with only one i/o per file) would be a huge help.

I see. Perhaps posting patches to cp would benefit everyone. I will
look at the coreutils code as well.

>
> <snip>
> > > 4) option to set the file size first, and then fill in writes (so
> > > non-extending writes)
> >
> > File size or file allocation? How would you determine what file
> > size to set? Consider the case where the source file is sparse. It
> > can be calculated, but needs more thought.
>
> The goal here is to allow a copy option (as rsync does)
> for target file systems where metadata sync is expensive
> or expensive locking needed for setting end-of-file, set the filesize
> early so it doesn't get reset 100s of times on extending writes
>

This assumes the copy succeeds all the way. What happens in case of
errors? A common (though crude) way to check that a file was copied
successfully is to compare file sizes.

--
Goldwyn
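A minimal sketch tying together points 2) and 4) with the error question raised above, assuming POSIX threads and a regular source file. The destination is sized with ftruncate() before any data is written, so every pwrite() is non-extending. `NTHREADS` workers each copy a disjoint byte range of one file, and if any of them fails the destination is truncated back to zero so a size comparison cannot mistake a partial copy for a complete one. `parallel_copy()`, `copy_range()`, `NTHREADS` and `CHUNK` are hypothetical; this is not how cp, rsync or robocopy are implemented (robocopy's /MT parallelizes across files rather than within one). It also does not preserve holes: zero ranges in a sparse source are read as zeros and written out as data.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical tunables: I/Os in flight and size of each I/O. */
#define NTHREADS 4
#define CHUNK (1024 * 1024)

struct range {
    int in, out;
    off_t start, end;   /* this worker copies bytes [start, end) */
    int err;            /* 0 on success, otherwise an errno value */
};

/* Worker: copy one byte range with positional I/O, so workers never
 * share (or race on) a file offset. */
static void *copy_range(void *arg)
{
    struct range *r = arg;
    char *buf = malloc(CHUNK);
    if (!buf) {
        r->err = ENOMEM;
        return NULL;
    }
    for (off_t off = r->start; off < r->end && !r->err;) {
        size_t want = (size_t)(r->end - off < CHUNK ? r->end - off : CHUNK);
        ssize_t n = pread(r->in, buf, want, off);
        if (n <= 0) {
            /* n == 0 means the source shrank underneath us */
            r->err = n < 0 ? errno : EIO;
            break;
        }
        ssize_t done = 0;
        while (done < n) {
            ssize_t w = pwrite(r->out, buf + done, (size_t)(n - done), off + done);
            if (w < 0) {
                r->err = errno;
                break;
            }
            done += w;
        }
        off += done;
    }
    free(buf);
    return NULL;
}

/* Copy in -> out with up to NTHREADS writes in flight. The destination
 * size is set first, so every pwrite() lands inside EOF (non-extending).
 * If a worker fails or cannot be started, the destination is truncated
 * to 0 so a size check cannot mistake a partial copy for a complete one. */
int parallel_copy(int in, int out)
{
    struct stat st;
    if (fstat(in, &st) < 0 || ftruncate(out, st.st_size) < 0)
        return -1;

    struct range r[NTHREADS];
    off_t per = (st.st_size + NTHREADS - 1) / NTHREADS;
    for (int i = 0; i < NTHREADS; i++) {
        off_t s = (off_t)i * per, e = s + per;
        if (s > st.st_size) s = st.st_size;
        if (e > st.st_size) e = st.st_size;
        r[i] = (struct range){ .in = in, .out = out, .start = s, .end = e, .err = 0 };
    }

    pthread_t tid[NTHREADS];
    int started = 0, failed = 0;
    for (; started < NTHREADS; started++) {
        if (pthread_create(&tid[started], NULL, copy_range, &r[started]) != 0) {
            failed = 1;
            break;
        }
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);
        if (r[i].err)
            failed = 1;
    }
    if (failed) {
        if (ftruncate(out, 0) < 0) {
            /* best effort; the copy has already failed */
        }
        return -1;
    }
    return 0;
}
```

Truncating to zero on failure is only one possible policy. With several ranges in flight there is no single "point of error", so the cp convention of a file that is correct up to the failure no longer holds unless the tool also tracks which ranges completed.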