Re: scp bug due to progress indicator when copying from remote to local on Linux

Pavel Shilovsky <piastryyy@xxxxxxxxx> · Mon, 14 Jan 2019 11:48:41 -0800

On Fri, Jan 11, 2019 at 13:22, Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> On Fri, Jan 11, 2019 at 03:13:05PM -0600, Steve French wrote:
> > On Fri, Jan 11, 2019 at 7:28 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> > > Are you saying the SIGALRM interrupts ftruncate() and causes the ftruncate
> > > to fail?
> >
> > So ftruncate does not really fail (the file contents and size match on
> > source and target after the copy) but the scp 'fails' and the user
> > would be quite confused (and presumably the network stack doesn't like
> > this signal, which can cause disconnects etc. which in theory could
> > cause reconnect/data loss issues in some corner cases).
>
> You've run into the problem that userspace simply doesn't check the
> return value from syscalls.  It's not just scp, it's every program.
> Looking through cifs, you seem to do a lot of wait_event_interruptible()
> where you maybe should be doing wait_event_killable()?

We are doing wait_event_interruptible() mostly in places where we are
waiting on a blocking byte-range lock or when we are waiting for a TCP
connection to be established (e.g. after a reconnect).

>
> > ftruncate(3, 262144000)                 = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
> > --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
> > --- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
> > rt_sigreturn({mask=[ALRM]})             = 0
> > ioctl(1, TIOCGWINSZ, {ws_row=51, ws_col=156, ws_xpixel=0, ws_ypixel=0}) = 0
> > getpgrp()                               = 82563
>
> Right ... so the code never calls ftruncate() again.  Changing all of
> userspace is just not going to happen; maybe you could get stuff fixed in
> libc, but really ftruncate() should only be interrupted by a fatal signal
> and not by SIGALRM.

It seems that scp simply does not set SA_RESTART on its SIGALRM
handler. What do you think about returning ERESTARTNOINTR instead for
this specific case, i.e. when filemap_write_and_wait is interrupted
during ftruncate? That should force the syscall to be restarted
regardless of the userspace program's signal settings.

--
Best regards,
Pavel Shilovsky