On Fri, Jan 11, 2019 at 5:05 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> On Fri, Jan 11, 2019 at 03:50:02PM -0600, Steve French wrote:
> > On Fri, Jan 11, 2019 at 3:22 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> > > Right ... so the code never calls ftruncate() again.  Changing all of
> > > userspace is just not going to happen; maybe you could get stuff fixed in
> > > libc, but really ftruncate() should only be interrupted by a fatal signal
> > > and not by SIGALRM.
> >
> > Looking at the places wait_event_interruptible is done I didn't see code
> > in fs/cifs that would match the (presumably) code path, mostly those
> > calls are in smbdirect (RDMA) code - for example cifs_setattr does call
> > filemap_write_and_wait but as it goes down into the mm layer and then to
> > cifs_writepages and the SMB3 write code, I didn't spot a
> > "wait_event_interruptible" in that path (I might have missed something
> > in the mm layer).  I do see one in the cifs reconnect path, but that is
> > not what we are typically hitting.  Any ideas how to match what we are
> > blocked in when we get the annoying SIGALRM?  Another vague thought - is
> > it possible to block SIGALRM across all of cifs_setattr?  If it is - why
> > do so few (only 3!) file systems (ceph, jffs2, ocfs2) ever call
> > sigprocmask?
>
> You can see where a task is currently sleeping with 'cat /proc/$pid/stack'.
> If you can provoke a long duration ftruncate, that'd be a good place to
> start looking.

Not surprisingly it is waiting in mm code:

root@smf-copy-test3:~# cat /proc/92189/stack
[<0>] io_schedule+0x16/0x40
[<0>] wait_on_page_bit_common+0x14f/0x350
[<0>] __filemap_fdatawait_range+0x104/0x160
[<0>] filemap_write_and_wait+0x4d/0x90
[<0>] cifs_setattr+0xc9/0xe80 [cifs]
[<0>] notify_change+0x2d2/0x460
[<0>] do_truncate+0x78/0xc0
[<0>] do_sys_ftruncate+0x14c/0x1c0
[<0>] __x64_sys_ftruncate+0x1b/0x20
[<0>] do_syscall_64+0x5a/0x110
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[<0>] 0xffffffffffffffff

So that brings me back to thinking about whether it is practical to mask
signals (non killable signals) in a few places in cifs.ko as apparently
at least a few file systems (ceph and jffs2 and ocfs2) do.  In particular
mask SIGALRM across calls to filemap_write_and_wait.

-- 
Thanks,

Steve