Re: scp bug due to progress indicator when copying from remote to local on Linux

Matthew Wilcox <willy@xxxxxxxxxxxxx> · Fri, 11 Jan 2019 13:22:40 -0800

On Fri, Jan 11, 2019 at 03:13:05PM -0600, Steve French wrote:
> On Fri, Jan 11, 2019 at 7:28 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> > Are you saying the SIGALRM interrupts ftruncate() and causes the ftruncate
> > to fail?
> 
> So ftruncate does not really fail (the file contents and size match on
> source and target after the copy) but the scp 'fails' and the user
> would be quite confused (and presumably the network stack doesn't like
> this signal, which can cause disconnects etc. which in theory could
> cause reconnect/data loss issues in some corner cases).

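You've run into the problem that userspace simply doesn't handle EINTR
from syscalls by retrying them.  It's not just scp, it's nearly every
program.  Looking through cifs, you seem to do a lot of
wait_event_interruptible() where you maybe should be doing
wait_event_killable()?  The killable variant only abandons the wait when
a fatal signal is pending, so a caught SIGALRM would not interrupt it.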
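For reference, "handling it" on the userspace side means treating EINTR as "try again". Below is a minimal sketch, assuming a POSIX system. `ftruncate_retry` is a hypothetical helper name, not something scp or libc provides.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: call ftruncate() and, if a caught signal
 * interrupted it (EINTR), call it again.  Any other error is returned
 * to the caller unchanged, with errno preserved.
 */
static int ftruncate_retry(int fd, off_t length)
{
    int ret;

    do {
        ret = ftruncate(fd, length);
    } while (ret == -1 && errno == EINTR);

    return ret;
}
```

glibc packages the same loop as the `TEMP_FAILURE_RETRY` macro in `<unistd.h>`, a GNU extension that requires `_GNU_SOURCE`. The catch is that every call site in every program has to use it.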

> ftruncate(3, 262144000)                 = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
> --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
> --- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
> rt_sigreturn({mask=[ALRM]})             = 0
> ioctl(1, TIOCGWINSZ, {ws_row=51, ws_col=156, ws_xpixel=0, ws_ypixel=0}) = 0
> getpgrp()                               = 82563

Right ... so the code never calls ftruncate() again.  Changing all of
userspace is just not going to happen; maybe you could get stuff fixed in
libc, but really ftruncate() should only be interrupted by a fatal signal
and not by SIGALRM.
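The strace annotation "ERESTARTSYS (To be restarted if SA_RESTART is set)" is the key detail. When a caught signal interrupts a syscall that returns -ERESTARTSYS, the kernel restarts it transparently only if the handler was installed with SA_RESTART. Otherwise userspace sees EINTR. The trace suggests scp's SIGALRM handler lacks that flag. The sketch below demonstrates the mechanism on Linux, under two assumptions:

* A blocking read() on an empty pipe stands in for the interrupted ftruncate(). Pipe reads wait interruptibly, so they can be interrupted on demand.
* `on_alarm` and `read_across_sigalrm` are hypothetical names chosen for illustration.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

static int pipe_fds[2];

/* SIGALRM handler: put one byte in the pipe so a restarted read() can finish. */
static void on_alarm(int sig)
{
    int saved_errno = errno;   /* don't clobber errno seen by interrupted code */
    char c = 'x';

    (void)sig;
    if (write(pipe_fds[1], &c, 1) < 0) {
        /* nothing useful to do inside a signal handler */
    }
    errno = saved_errno;
}

/*
 * Block in read() on an empty pipe while a one-shot 200 ms ITIMER_REAL
 * timer delivers SIGALRM.  sa_flags is 0 or SA_RESTART.  Returns what
 * read() returned; *err receives errno on failure, 0 on success.
 */
static ssize_t read_across_sigalrm(int sa_flags, int *err)
{
    struct sigaction sa;
    struct itimerval it;
    char buf;
    ssize_t n;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = sa_flags;
    if (sigaction(SIGALRM, &sa, NULL) != 0 || pipe(pipe_fds) != 0) {
        *err = errno;
        return -1;
    }

    memset(&it, 0, sizeof it);
    it.it_value.tv_usec = 200000;            /* fire once, after 200 ms */
    setitimer(ITIMER_REAL, &it, NULL);

    n = read(pipe_fds[0], &buf, 1);          /* blocks until data or signal */
    *err = (n < 0) ? errno : 0;

    close(pipe_fds[0]);
    close(pipe_fds[1]);
    return n;
}
```

With flags set to 0, read() fails with EINTR, which mirrors what scp saw from ftruncate(). With SA_RESTART, the kernel reissues the call after the handler runs, and read() returns the byte the handler wrote.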