Re: scp bug due to progress indicator when copying from remote to local on Linux

Steve French <smfrench@xxxxxxxxx> · Fri, 11 Jan 2019 16:02:14 -0600

On Fri, Jan 11, 2019 at 3:48 PM Andreas Dilger <adilger@xxxxxxxxx> wrote:
>
> On Jan 11, 2019, at 2:13 PM, Steve French <smfrench@xxxxxxxxx> wrote:
> >
> > On Fri, Jan 11, 2019 at 7:28 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> >>
> >> On Fri, Jan 11, 2019 at 12:11:00AM -0600, Steve French wrote:
> >>> In discussing an interesting problem with scp recently with Pavel, we
> >>> found a fairly obvious bug in scp when it is run with a progress
> >>> indicator (which is the default when the source file is remote).
> >>>
> >>> scp triggers "SIGALRM" probably from update_progress_meter() in
> >>> progressmeter.c when executed with "scp localhost:somelargefile /mnt"
> >>> ie with an ssh source path but a local path for the target, as long as
<snip>
> I'm guessing this is writing to a CIFS filesystem as the target, or some
> other filesystem that doesn't handle sparse files?  AFAIK, at one point
> there was a heuristic like "truncate-to-non-zero triggers file
> preallocation" before fallocate() was available?

cifs can handle sparse files (it has had protocol support for various
fallocate equivalents for many, many years) , and I think this would
fail to other file systems depending on how long flushing cached
data takes (presumably a function of disk or network speed,
size of cached data etc.)

> Otherwise, why would the truncate() take a long time even if the file
> is large, since it shouldn't actually be doing any work to change the
> size of the file (I'm assuming that the target file is either empty or
> about the same size during an overwrite, you didn't mention anything
> unusual as a requirement).

Note that scp (unlike rsync, robocopy and some other tools doesn't
appear to offer an option to do a 'set the file size first'
then do the writes ... (which helps with some types of file systems),
but setting the file size is not the issue in this case (perhaps it would
be in some cloud file systems, or cluster file systems but not in this
case).

> Other than the duration of the truncate() call being excessive, it seems
> to me that everything is working as expected, or at least what the code
> asked for?  It registered SIGALRM, so when a syscall returns -ERESTARTSYS
> it should actually restart that call.
>
>
> My typical response for issues like this is to find the github repo for
> the code and push a patch to fix the code myself.  If I can't find the
> upstream project/source repo easily, download the .src.rpm and make a
> patch and submit it to the major distro package maintainers (RHEL, SLES,
> Ubuntu or Debian), so that at least it is fixed for the vast majority
> of users, and those maintainers can handle pushing the patch further
> upstream (and smaller distros will copy patches/sources from them).

I have built the scp code as an experiment and by the way there are many,
many things that could be changed (other copy tools e.g. robocopy even rsync
and cp) look much more mature (with lots more useful performance options).
My earlier experiments e.g. changing the scp i/o size to something larger
did help as expected for various cases (16K is suboptimal in many cases).

But in this case it seems much simpler - the code path is basically
read 16K over ssh
write 16K to local/network file system (cached)
(repeated 1000s of times)
then ftruncate

ftruncate if it were simply setting the file size would be very fast, except
that setting the file size needs to have the data written out (writes
otherwise could risk extending the file size) so triggers a call to writepages
(filemap_write_and_wait) as a sideeffect
so all the cached data is finally written out which can take a few
seconds and cause the scp progress indicator to malfunction
and trigger the SIGALRM.  This is normally not a problem
fsync etc. can take longer than a single i/o operation.

-- 
Thanks,

Steve