Fri, Jan 11, 2019 at 13:22, Matthew Wilcox <willy@xxxxxxxxxxxxx>:
>
> On Fri, Jan 11, 2019 at 03:13:05PM -0600, Steve French wrote:
> > On Fri, Jan 11, 2019 at 7:28 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> > > Are you saying the SIGALRM interrupts ftruncate() and causes the ftruncate
> > > to fail?
> >
> > So ftruncate does not really fail (the file contents and size match on
> > source and target after the copy) but the scp 'fails' and the user
> > would be quite confused (and presumably the network stack doesn't like
> > this signal, which can cause disconnects etc. which in theory could
> > cause reconnect/data loss issues in some corner cases).
>
> You've run into the problem that userspace simply doesn't check the
> return value from syscalls. It's not just scp, it's every program.
> Looking through cifs, you seem to do a lot of wait_event_interruptible()
> where you maybe should be doing wait_event_killable()?

We are doing wait_event_interruptible() mostly in places where we are
waiting on a blocking byte-range lock or when we are waiting for a TCP
connection to be established (e.g. after a reconnect).

> > ftruncate(3, 262144000) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
> > --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
> > --- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
> > rt_sigreturn({mask=[ALRM]}) = 0
> > ioctl(1, TIOCGWINSZ, {ws_row=51, ws_col=156, ws_xpixel=0, ws_ypixel=0}) = 0
> > getpgrp() = 82563
>
> Right ... so the code never calls ftruncate() again. Changing all of
> userspace is just not going to happen; maybe you could get stuff fixed in
> libc, but really ftruncate() should only be interrupted by a fatal signal
> and not by SIGALRM.

It seems that SA_RESTART is just not set for SCP. What do you think
about returning ERESTARTNOINTR instead for this specific case -
filemap_write_and_wait during ftruncate? It should force the syscall
to be restarted regardless of the userspace program settings.
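
For reference, here is a minimal userspace sketch of what a program
would have to do on its own if the kernel returns ERESTARTSYS and the
handler was installed without SA_RESTART (the syscall then fails with
EINTR from the program's point of view). The helper names
ftruncate_retry(), install_alarm_handler() and demo_truncate() are
hypothetical and are not taken from scp or from cifs; this only
illustrates the userspace side and is not the kernel change proposed
above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: call ftruncate() again whenever it fails with
 * EINTR, i.e. when a signal handler without SA_RESTART interrupted it. */
static int ftruncate_retry(int fd, off_t length)
{
    int rc;

    do {
        rc = ftruncate(fd, length);
    } while (rc == -1 && errno == EINTR);

    return rc;
}

/* Empty handler: its only job is to make SIGALRM non-fatal. */
static void on_alarm(int signo)
{
    (void)signo;
}

/* Hypothetical helper: install a SIGALRM handler. A nonzero `restart`
 * sets SA_RESTART; zero leaves it unset, so interrupted syscalls that
 * return ERESTARTSYS in the kernel surface as EINTR in userspace. */
static int install_alarm_handler(int restart)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = restart ? SA_RESTART : 0;

    return sigaction(SIGALRM, &sa, NULL);
}

/* Demo: truncate a scratch file to `length` bytes with a non-restarting
 * SIGALRM handler installed. Returns the resulting size, or -1 on error. */
static long demo_truncate(off_t length)
{
    char path[] = "/tmp/ftruncate-demo-XXXXXX";
    struct stat st;
    long size = -1;
    int fd;

    assert(length >= 0);

    fd = mkstemp(path);
    if (fd < 0)
        return -1;

    if (install_alarm_handler(0) == 0 &&
        ftruncate_retry(fd, length) == 0 &&
        fstat(fd, &st) == 0)
        size = (long)st.st_size;

    close(fd);
    unlink(path);
    return size;
}
```

On a local filesystem the loop will rarely, if ever, iterate, since
ftruncate() there does not normally sleep interruptibly; the EINTR case
is the one seen in the strace output above. With ERESTARTNOINTR the
kernel would restart the syscall by itself and none of this would be
needed in userspace.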
--
Best regards,
Pavel Shilovsky