On Feb 8, 2019, at 8:19 AM, Steve French <smfrench@xxxxxxxxx> wrote:
>
> Current Linux copy tools have various problems compared to other
> platforms - small I/O sizes (and not even configurable for most),

Hmm, this comment puzzles me, since "cp" already uses the st_blksize
returned for the file as the I/O size? I'm not sure whether tar/rsync
do the same, but if they don't already use st_blksize they should.

> lack of parallel I/O for multi-file copies, inability to reduce metadata
> updates by setting file size first, lack of cross mount (to the same
> file system) copy optimizations, limited ability to handle the wide
> variety of server side copy (and copy offload) mechanisms and various
> error handling problems. And copy tools rely less on the kernel file
> system (vs. code in the user space tool) in Linux than would be
> expected, in order to determine which optimizations to use.

The rest of these issues are definitely a concern. It is worthwhile to
point out MPIFileUtils (https://github.com/hpc/mpifileutils), which
already solves a lot of these problems. As the name suggests, it
currently uses MPI to run in parallel across multiple nodes, but it
should be possible to add a wrapper for the MPI calls in the library
(with fork()+exec() or so) and run multi-threaded on one node for
parallel copy/find/sync/etc.

IMHO, it makes sense to try to optimize a single set of tools, rather
than adding yet another set of tools that are not widely used. There is
also "mutil" (https://github.com/pkolano/mutil), a set of patches for
GNU cp and md5sum, but it is less widely used than MPIFileUtils.

That said, most users are going to have GNU fileutils installed, so the
best option is to add improvements directly into those tools if
possible, with the caveats that you will get a headache reading that
code, and the maintainers may object to including parallel extensions
due to portability concerns.
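For illustration, the st_blksize point above can be sketched as follows.
This is a minimal, hypothetical copy loop (not cp's actual code) showing
how a tool can size its I/O buffer from the st_blksize that fstat(2)
reports, i.e. the filesystem's preferred I/O size:

```c
/* Minimal sketch (not cp's implementation): size the copy buffer
 * from st_blksize, the preferred I/O size reported by fstat(2). */
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

static ssize_t copy_fd(int in_fd, int out_fd)
{
	struct stat st;
	if (fstat(in_fd, &st) < 0)
		return -1;

	/* Use the filesystem's preferred block size; fall back to
	 * 128 KiB if the reported value looks bogus. */
	size_t bufsize = st.st_blksize > 0 ? (size_t)st.st_blksize : 131072;

	char *buf = malloc(bufsize);
	if (!buf)
		return -1;

	ssize_t total = 0, n;
	while ((n = read(in_fd, buf, bufsize)) > 0) {
		if (write(out_fd, buf, n) != n) {
			free(buf);
			return -1;
		}
		total += n;
	}
	free(buf);
	return n < 0 ? -1 : total;
}
```

On a network filesystem st_blksize can be much larger than the local
4 KiB default, which is exactly why honoring it matters for throughput.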
> Would like to discuss some of the observations about copy tools and
> how we can move forward on improving the performance of common copy
> operations.

Unfortunately, I'm unable to attend LSF/MM this year, or this would
definitely be a topic of interest to me.

Cheers, Andreas