On Fri, Jan 11, 2019 at 3:22 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> On Fri, Jan 11, 2019 at 03:13:05PM -0600, Steve French wrote:
> > On Fri, Jan 11, 2019 at 7:28 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> > > Are you saying the SIGALRM interrupts ftruncate() and causes the ftruncate
> > > to fail?
> >
> > So ftruncate does not really fail (the file contents and size match on
> > source and target after the copy) but the scp 'fails' and the user
> > would be quite confused (and presumably the network stack doesn't like
> > this signal, which can cause disconnects etc. which in theory could
> > cause reconnect/data loss issues in some corner cases).
>
> You've run into the problem that userspace simply doesn't check the
> return value from syscalls. It's not just scp, it's every program.
> Looking through cifs, you seem to do a lot of wait_event_interruptible()
> where you maybe should be doing wait_event_killable()?
>
> > ftruncate(3, 262144000) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
> > --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
> > --- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
> > rt_sigreturn({mask=[ALRM]}) = 0
> > ioctl(1, TIOCGWINSZ, {ws_row=51, ws_col=156, ws_xpixel=0, ws_ypixel=0}) = 0
> > getpgrp() = 82563
>
> Right ... so the code never calls ftruncate() again. Changing all of
> userspace is just not going to happen; maybe you could get stuff fixed in
> libc, but really ftruncate() should only be interrupted by a fatal signal
> and not by SIGALRM.

Looking at the places where wait_event_interruptible is called, I didn't
see code in fs/cifs that would match the (presumed) code path; those
calls are mostly in the smbdirect (RDMA) code. For example, cifs_setattr
does call filemap_write_and_wait, but following it down into the mm layer
and then to cifs_writepages and the SMB3 write code, I didn't spot a
"wait_event_interruptible" in that path (I might have missed something in
the mm layer).
I do see one in the cifs reconnect path, but that is not what we are
typically hitting.

Any ideas how to match what we are blocked in when we get the annoying
SIGALRM?

Another vague thought - is it possible to block SIGALRM across all of
cifs_setattr? If it is, why do so few (only 3!) file systems (ceph,
jffs2, ocfs2) ever call sigprocmask?

--
Thanks,

Steve
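
For illustration only, here is a userspace analogue of the "block
SIGALRM across the whole operation" idea. It is a sketch, not kernel
code and not anything from fs/cifs: the kernel has its own in-kernel
sigprocmask(), while this uses the POSIX call of the same name. The
helper names (call_with_sigalrm_blocked, truncate_step, run_demo), the
/tmp path and the 4096-byte length are all made up for the example, and
raise() merely stands in for a timer firing in the middle of the call.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Counts how many times the SIGALRM handler has run. */
static volatile sig_atomic_t alarms_seen;

static void on_alarm(int signo)
{
    (void)signo;
    alarms_seen++;
}

typedef int (*blocked_fn)(void *arg);

/* Hypothetical helper: run fn(arg) with SIGALRM blocked in this thread. */
static int call_with_sigalrm_blocked(blocked_fn fn, void *arg)
{
    sigset_t block, saved;
    int ret, saved_errno;

    assert(fn != NULL);
    sigemptyset(&block);
    sigaddset(&block, SIGALRM);
    if (sigprocmask(SIG_BLOCK, &block, &saved) != 0)
        return -1;

    ret = fn(arg);
    saved_errno = errno;

    /* Restoring the old mask lets a pending SIGALRM be delivered,
     * provided the caller did not already have it blocked. */
    sigprocmask(SIG_SETMASK, &saved, NULL);
    errno = saved_errno;
    return ret;
}

struct truncate_args {
    int fd;
    off_t length;
    int alarms_during;  /* handler count observed inside the blocked region */
};

static int truncate_step(void *arg)
{
    struct truncate_args *a = arg;

    raise(SIGALRM);                 /* simulated timer: stays pending */
    a->alarms_during = alarms_seen;
    return ftruncate(a->fd, a->length);
}

/* Returns 0 if the signal was deferred until after the truncate. */
static int run_demo(void)
{
    char path[] = "/tmp/sigalrm-demo-XXXXXX";  /* assumed scratch location */
    struct truncate_args args = { .fd = -1, .length = 4096, .alarms_during = -1 };
    struct sigaction sa;
    struct stat st;
    int ret;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    args.fd = mkstemp(path);
    if (args.fd < 0)
        return -1;
    unlink(path);

    alarms_seen = 0;
    ret = call_with_sigalrm_blocked(truncate_step, &args);
    if (ret == 0)
        ret = fstat(args.fd, &st);
    close(args.fd);
    if (ret != 0)
        return -1;

    return (args.alarms_during == 0 && alarms_seen == 1 &&
            st.st_size == 4096) ? 0 : -1;
}
```

Calling run_demo() from a single-threaded main() should return 0: the
handler does not run while the mask is in place, and the pending
SIGALRM is delivered once the old mask is restored. Whether the same
approach is appropriate inside cifs_setattr is exactly the open
question above; this only shows the mechanism.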