On Sat, Jun 30, 2018 at 8:13 AM Goldwyn Rodrigues <rgoldwyn@xxxxxxx> wrote:
>
> Hi Steve,
>
> On 06-29 21:37, Steve French wrote:
> > I have been looking at i/o patterns from various copy tools on Linux,
> > and it is pretty discouraging - I am hoping that I am forgetting an
> > important one that someone can point me to ...
> >
> > Some general problems:
> > 1) if source and target on the same file system it would be nice to
> > call the copy_file_range syscall (AFAIK only test tools call that),
> > although in some cases at least cp can do it for --reflink
>
> I have submitted a patch set for copy_file_range() across filesystems
> which can atleast use splice() [1] as a part of enabling holes in
> copy_file_range(), but it has not been incorporated so far.

Do you have a link to the patch?

> > 1) options for large i/o sizes (network latencies in network/cluster
> > fs can be large, so prefer larger 1M or 8M in some cases I/Os)
>
> Unfortunately tools derive I/O size from stat.st_blksize which may be
> pretty small for performing "efficient" I/O. However, the tools such as
> cp also determine series of zeros to convert into holes. So for that
> reason it works well. OTOH, that is not the most common case of tools,
> which I agree could be made faster.

dd is nice in that you can set i/o size (as can rsync) but seems sane
to allow rsize/wsize to be configurable

> > 2) parallelizing writes so not just one write in flight at a time
>
> What would the resultant file be in case of errors? Should the
> destination file be considered partially copied? man cp does not cover
> the case errors but currently it is assumed the file is partially copied
> and correct until the point of error.

Whether parallel i/o on one file, or multiple files, either will be a
huge help.
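
As a rough illustration of the multi-file case, here is a minimal Python
sketch that keeps several copies in flight at once, with one i/o per file
and a configurable i/o size. The names copy_one, copy_many and IO_SIZE,
the 1M default and the choice of four workers are all assumptions made
for the example, not taken from cp, rsync or any other existing tool.

```python
import os
from concurrent.futures import ThreadPoolExecutor

IO_SIZE = 1024 * 1024  # assumed default of 1M per read/write; tune per file system


def copy_one(src, dst, io_size=IO_SIZE):
    """Copy src to dst sequentially, requesting io_size bytes per read."""
    fd_in = os.open(src, os.O_RDONLY)
    try:
        fd_out = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            while True:
                buf = os.read(fd_in, io_size)
                if not buf:
                    break  # end of file
                view = memoryview(buf)
                while view:
                    # os.write may write fewer bytes than asked; retry the rest
                    view = view[os.write(fd_out, view):]
        finally:
            os.close(fd_out)
    finally:
        os.close(fd_in)
    return dst


def copy_many(pairs, workers=4, io_size=IO_SIZE):
    """Copy (src, dst) pairs with up to `workers` files in flight at once."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(copy_one, s, d, io_size) for s, d in pairs]
        # result() re-raises any exception raised in the worker thread
        return [f.result() for f in futures]
```

On error, each destination file here is simply left partially written up
to the point of failure, which matches the current assumption for cp
mentioned above; a real tool would have to decide how to report that.
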
Just did a quick google search on the topic and it pointed to a sysadmin
article discussing one of the more common copy tools on Windows, robocopy:

"Perhaps the most important switch to pay attention is /MT, which is a
feature that enables Robocopy to copy files in multi-threaded mode...
with multi-threaded enabled, you can copy multiple files at the same time
better utilizing the bandwidth and significantly speeding up the process.
If you don’t set a number when using the /MT switch, then the default
number will be 8, which means that Robocopy will try to copy eight files
at the same time. However, Robocopy supports 1 to 128 threads."

This seems sane - even if cp can't do it, having a tool that can
reasonably get at least four i/o in flight (perhaps for different files,
with only one i/o per file) would be a huge help.

<snip>

> > 4) option to set the file size first, and then fill in writes (so
> > non-extending writes)
>
> File size or file allocation? How would you determine what file
> size to set? Consider the case the source file is sparse. It can be
> calculated, but needs more thought.

The goal here is to allow a copy option (as rsync does) for target file
systems where metadata sync is expensive, or where expensive locking is
needed for setting end-of-file: set the file size early so it doesn't
get reset 100s of times on extending writes.

--
Thanks,

Steve
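
A minimal Python sketch of the "set the file size first" option discussed
above, assuming "size" means end-of-file rather than allocation: the
target is truncated up to the source's size once, then filled with
positional writes that never extend the file. The name copy_presized and
the 1M default are hypothetical. The sketch ignores sparse sources, so
holes are read as zeros and written out in full.

```python
import os


def copy_presized(src, dst, io_size=1024 * 1024):
    """Set dst's size up front, then fill it with non-extending writes."""
    fd_in = os.open(src, os.O_RDONLY)
    try:
        size = os.fstat(fd_in).st_size
        fd_out = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            # Set end-of-file once. This sets the size only; preallocating
            # blocks (e.g. os.posix_fallocate) would be a separate choice.
            os.ftruncate(fd_out, size)
            offset = 0
            while offset < size:
                buf = os.pread(fd_in, min(io_size, size - offset), offset)
                if not buf:
                    break  # source shrank while we were copying
                view = memoryview(buf)
                while view:
                    # Positional write inside the existing size, so the
                    # file system never has to move end-of-file.
                    n = os.pwrite(fd_out, view, offset)
                    view = view[n:]
                    offset += n
        finally:
            os.close(fd_out)
    finally:
        os.close(fd_in)
    return size
```

Because every write carries its own offset, the chunks could in principle
be issued out of order or in parallel, which is one reason the two
options are related.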