On Mon, Feb 11, 2019 at 2:32 AM Andreas Dilger <adilger@xxxxxxxxx> wrote:
>
> On Feb 8, 2019, at 4:56 PM, Steve French <smfrench@xxxxxxxxx> wrote:
> >
> > On Fri, Feb 8, 2019 at 5:03 PM Steve French <smfrench@xxxxxxxxx> wrote:
> >>
> >> On Fri, Feb 8, 2019 at 4:37 PM Andreas Dilger <adilger@xxxxxxxxx> wrote:
> >>>
> >>> On Feb 8, 2019, at 8:19 AM, Steve French <smfrench@xxxxxxxxx> wrote:
<snip>
> > I did some experiments changing the block size returned from 1K to 64K to 1MB
> > and see no difference in the copy size used by cp (it was always 128K in all
> > the cases when caching is disabled)

I figured out the problem - I read your note as meaning s_blocksize
(which is not st_blksize), i.e. the block size in the superblock, not
the one reported for the file. Changing st_blksize (stat->blksize) to
4MB did lead to better performance (and large I/O matching the block
size) for uncached cp.

> Strange.  I just re-tested this on Lustre, in case something had changed in
> GNU fileutils that I didn't notice, and it worked fine for me, using both
> "cp --version = 8.4" on RHEL and "cp --version = 8.26" on Ubuntu:
>
> $ dd if=/dev/urandom of=/tmp/foo bs=1M count=12
> $ strace -v cp /tmp/foo /testfs/tmp
> :
> open("/tmp/foo", O_RDONLY) = 3
> fstat(3, {... st_blksize=4096, st_blocks=24576, st_size=12582912, ...}) = 0
> open("/testfs/tmp/foo", O_WRONLY|O_CREAT|O_EXCL, 0664) = 4
> fstat(4, { ... st_blksize=4194304, st_blocks=0, st_size=0, ...}) = 0
> read(3, "h\230#`\2\223\273\3423W\24\222:\2113w\327"..., 4194304) = 4194304
> write(4, "h\230#`\2\223\273\3423W\24\222:\2113w\327"..., 4194304) = 4194304
> :
>
> Note the "st_blksize=4194304" for the target file returned by Lustre matches
> the read and write buffer size used by "cp".  The same is true if Lustre is
> the source file and not the target, so it probably picks the maximum of both:
>
> open("/testfs/tmp/foo", O_RDONLY) = 3
> fstat(3, {... st_blksize=4194304, st_blocks=24576, st_size=12582912 ...}) = 0
> open("/tmp/bar", O_WRONLY|O_TRUNC) = 4
> fstat(4, {... st_blksize=4096, st_blocks=0, st_size=0 ...}) = 0
> read(3, "h\230#`\2\223\273\3423W\24\222:\2113w\327"..., 4194304) = 4194304
> write(4, "h\230#`\2\223\273\3423W\24\222:\2113w\327"..., 4194304) = 4194304
> :
>
> Running the same command with /tmp as the target uses a smaller buffer size
> matching the "st_blocks=32768" and correspondingly more read/write calls:
>
> $ strace -v cp /tmp/foo /tmp/baz
> :
> open("/tmp/baz", O_WRONLY|O_CREAT|O_EXCL, 0664) = 4
> fstat(4, {... st_blksize=4096, st_blocks=0, st_size=0, ...}) = 0
> read(3, "h\230#`\2\223\273\3423W\24\222:\2113w\327"..., 32768) = 32768
> write(4, "h\230#`\2\223\273\3423W\24\222:\2113w\327"..., 32768) = 32768
> :
>
> In this case, cp probably has some minimum buffer size it uses to avoid the
> poor performance of using 4KB blocks.

Yes - although the code is a little hard to follow, the minimum looks
like 128K in my system's version of cp (Ubuntu).

--
Thanks,

Steve
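
P.S. To make the hypothesis in this thread concrete, here is a rough
sketch of the buffer-size choice as we have been guessing at it: take
the larger of the source and target st_blksize, but never go below some
floor. This is not lifted from the coreutils source, which may well
combine the two values differently. The helper names and the 128K
default are my own assumptions for illustration (128K is only what I
observed on Ubuntu; the trace above suggests an older build used 32K).

```python
import os

# Floor observed in this thread for Ubuntu's cp. This is an assumption,
# not a constant taken from the coreutils source.
MIN_BUF = 128 * 1024


def guess_copy_buf_size(src_blksize, dst_blksize, floor=MIN_BUF):
    """Hypothetical helper: the thread's guess at how cp sizes its buffer.

    Returns the largest of the two st_blksize hints and the floor.
    """
    return max(src_blksize, dst_blksize, floor)


def blksize_of(path):
    """Return the st_blksize hint that stat(2) reports for path (Unix only)."""
    return os.stat(path).st_blksize


if __name__ == "__main__":
    # Values taken from the strace output quoted above.
    print(guess_copy_buf_size(4096, 4194304))               # /tmp -> Lustre
    print(guess_copy_buf_size(4194304, 4096))               # Lustre -> /tmp
    print(guess_copy_buf_size(4096, 4096, floor=32 * 1024)) # /tmp -> /tmp, 32K floor
```

With the numbers from the traces, this gives 4194304 for either
direction involving Lustre and falls back to the floor when both sides
report 4096, which is consistent with the read/write sizes shown.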