Re: [PATCHv4] sctp: Don't add the shutdown timer if its already been added

Neil Horman <nhorman@xxxxxxxxxxxxx> · Tue, 28 Apr 2020 06:55:53 -0400



On Mon, Apr 27, 2020 at 02:34:17PM +0300, Jere Leppänen wrote:
> On Mon, 21 Mar 2016, Neil Horman wrote:
> 
> > This BUG halt was recently reported:
> > 
> > PID: 2879   TASK: c16adaa0  CPU: 1   COMMAND: "sctpn"
> >  #0 [f418dd28] crash_kexec at c04a7d8c
> >  #1 [f418dd7c] oops_end at c0863e02
> >  #2 [f418dd90] do_invalid_op at c040aaca
> >  #3 [f418de28] error_code (via invalid_op) at c08631a5
> >     EAX: f34baac0  EBX: 00000090  ECX: f418deb0  EDX: f5542950  EBP: 00000000
> >     DS:  007b      ESI: f34ba800  ES:  007b      EDI: f418dea0  GS:  00e0
> >     CS:  0060      EIP: c046fa5e  ERR: ffffffff  EFLAGS: 00010286
> >  #4 [f418de5c] add_timer at c046fa5e
> >  #5 [f418de68] sctp_do_sm at f8db8c77 [sctp]
> >  #6 [f418df30] sctp_primitive_SHUTDOWN at f8dcc1b5 [sctp]
> >  #7 [f418df48] inet_shutdown at c080baf9
> >  #8 [f418df5c] sys_shutdown at c079eedf
> >  #9 [f418df70] sys_socketcall at c079fe88
> >     EAX: ffffffda  EBX: 0000000d  ECX: bfceea90  EDX: 0937af98
> >     DS:  007b      ESI: 0000000c  ES:  007b      EDI: b7150ae4
> >     SS:  007b      ESP: bfceea7c  EBP: bfceeaa8  GS:  0033
> >     CS:  0073      EIP: b775c424  ERR: 00000066  EFLAGS: 00000282
> > 
> > It appears that the side effect that starts the shutdown timer was processed
> > multiple times, which can happen as multiple paths can trigger it.  This of
> > course leads to the BUG halt in add_timer getting called.
> > 
> > Fix seems pretty straightforward, just check before the timer is added if its
> > already been started.  If it has mod the timer instead to min(current
> > expiration, new expiration)
> > 
> > Its been tested but not confirmed to fix the problem, as the issue has only
> > occured in production environments where test kernels are enjoined from being
> > installed.  It appears to be a sane fix to me though.
> 
> Hi Neil,
> 
> This patch was acked by Marcelo four years ago, but was then apparently 
> forgotten about and never applied. I have a comment to the patch itself, 
> below, so would it possible for you to reroll another version?
> 
> Here's one way - not necessarily the only way - to hit this kernel BUG:
> 
Yeah, I'll take a look at it today

Best
Neil

> --- clip ---
> 
> # cat sctp_t2_double_start.c
> #include <stdlib.h>
> #include <unistd.h>
> #include <sys/socket.h>
> #include <sys/types.h>
> #include <netinet/in.h>
> #include <netinet/sctp.h>
> #include <arpa/inet.h>
> 
> int main()
> {
> 	int listen_fd, server_fd, client_fd;
> 
> 	struct sockaddr_in addr = {
> 		.sin_family = AF_INET,
> 		.sin_addr.s_addr = inet_addr("127.0.0.1"),
> 		.sin_port = htons(65000),
> 	};
> 
> 	struct sockaddr_in caddr = addr;
> 	caddr.sin_port = htons(65001);
> 
> 	listen_fd = socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP);
> 	bind(listen_fd, (struct sockaddr *)&addr, sizeof(addr));
> 	listen(listen_fd, 10);
> 	client_fd = socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP);
> 	bind(client_fd, (struct sockaddr *)&caddr, sizeof(caddr));
> 	connect(client_fd, (struct sockaddr *)&addr, sizeof(addr));
> 	struct linger lin = { .l_onoff = 1, .l_linger = 0, };
> 	setsockopt(client_fd, SOL_SOCKET, SO_LINGER, &lin, sizeof(lin));
> 	server_fd = accept(listen_fd, NULL, NULL);
> 	system("iptables -A INPUT -p sctp -j DROP");
> 	shutdown(server_fd, SHUT_WR);
> 	close(client_fd);
> 	system("iptables -D INPUT -p sctp -j DROP");
> 	client_fd = socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP);
> 	bind(client_fd, (struct sockaddr *)&caddr, sizeof(caddr));
> 	connect(client_fd, (struct sockaddr *)&addr, sizeof(addr));
> 	close(server_fd); // segfault and add_timer() kernel BUG
> 
> 	return 0;
> }
> # gcc sctp_t2_double_start.c
> # ./a.out 
> [  105.117695] ------------[ cut here ]------------
> [  105.120047] kernel BUG at kernel/time/timer.c:1135!
> [  105.122536] invalid opcode: 0000 [#1] SMP PTI
> [  105.124646] CPU: 8 PID: 1126 Comm: a.out Not tainted 5.5.15-200.fc31.x86_64 #1
> [  105.128029] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
> [  105.131380] RIP: 0010:add_timer+0x14a/0x200
> [  105.132382] Code: 5b 5d 41 5c 41 5d 41 5e c3 25 00 00 08 00 41 89 c6 e9 47 ff ff ff 48 c7 c5 00 d1 01 00 65 48 03 2d 4b 2b ea 5b e9 4b ff ff ff <0f> 0b 0f 0b 49 89 ec eb 8a e8 18 3f fa ff 48 c7 c2 80 be 01 00 89
> [  105.136724] RSP: 0018:ffff9c5440defba8 EFLAGS: 00010286
> [  105.137972] RAX: 0000000000000000 RBX: ffff8fc5ab0c6000 RCX: 00000000fffd0462
> [  105.139658] RDX: 00000000fffd101a RSI: 0000000000000012 RDI: ffff8fc5ab0c6370
> [  105.141338] RBP: ffff8fc5ab0c6000 R08: ffff8fc5ab0c6018 R09: ffff8fc5a90a6204
> [  105.143026] R10: ffff8fc5c8c06d80 R11: ffff8fc5c16e7410 R12: 0000000000000000
> [  105.144725] R13: 0000000000000001 R14: ffff8fc5ab0c6370 R15: ffff9c5440defd50
> [  105.146391] FS:  00007f7b87b02540(0000) GS:ffff8fc5cf400000(0000) knlGS:0000000000000000
> [  105.148330] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  105.149686] CR2: 00007f7b87937ff8 CR3: 0000000bf1fb8005 CR4: 0000000000160ee0
> [  105.151362] Call Trace:
> [  105.152007]  sctp_do_sm+0xc10/0x1e20 [sctp]
> [  105.153024]  ? pick_next_task_fair+0x19a/0x380
> [  105.154086]  ? __schedule+0x669/0x740
> [  105.154971]  ? __switch_to_asm+0x40/0x70
> [  105.155914]  ? __switch_to_asm+0x34/0x70
> [  105.156850]  ? __switch_to_asm+0x40/0x70
> [  105.157787]  ? __switch_to_asm+0x34/0x70
> [  105.158729]  ? __switch_to_asm+0x40/0x70
> [  105.159667]  ? __switch_to_asm+0x34/0x70
> [  105.160608]  ? __switch_to_asm+0x40/0x70
> [  105.161545]  ? __switch_to_asm+0x34/0x70
> [  105.162480]  ? __switch_to_asm+0x40/0x70
> [  105.163412]  ? __switch_to_asm+0x34/0x70
> [  105.164379]  ? __switch_to_asm+0x40/0x70
> [  105.165326]  sctp_primitive_SHUTDOWN+0x2f/0x40 [sctp]
> [  105.166537]  sctp_close+0x200/0x320 [sctp]
> [  105.167528]  inet_release+0x34/0x60
> [  105.168373]  __sock_release+0x3d/0xa0
> [  105.169265]  sock_close+0x11/0x20
> [  105.170072]  __fput+0xc1/0x250
> [  105.170822]  task_work_run+0x8a/0xb0
> [  105.171694]  exit_to_usermode_loop+0x102/0x130
> [  105.172760]  do_syscall_64+0x1a6/0x1c0
> [  105.173663]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [  105.175557] RIP: 0033:0x7f7b87a2ac07
> [  105.177069] Code: ff ff e8 fc e8 01 00 66 2e 0f 1f 84 00 00 00 00 00 66 90 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 41 c3 48 83 ec 18 89 7c 24 0c e8 f3 41 f9 ff
> [  105.182798] RSP: 002b:00007ffe10833458 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
> [  105.185271] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007f7b87a2ac07
> [  105.187653] RDX: 0000000000000000 RSI: 00007ffe10833420 RDI: 0000000000000005
> [  105.190029] RBP: 00007ffe108334a0 R08: 0000000000000000 R09: 0000000000000000
> [  105.192398] R10: 0000000000000008 R11: 0000000000000246 R12: 00000000004010f0
> [  105.194779] R13: 00007ffe10833580 R14: 0000000000000000 R15: 0000000000000000
> [  105.197128] Modules linked in: iptable_filter sctp libcrc32c rfkill intel_rapl_msr intel_rapl_common sb_edac kvm_intel snd_pcsp bochs_drm kvm drm_vram_helper snd_pcm irqbypass drm_ttm_helper ttm snd_timer ppdev drm_kms_helper snd virg
> [  105.210029] ---[ end trace 8399eda7e3a18429 ]---
> [  105.211899] RIP: 0010:add_timer+0x14a/0x200
> [  105.213611] Code: 5b 5d 41 5c 41 5d 41 5e c3 25 00 00 08 00 41 89 c6 e9 47 ff ff ff 48 c7 c5 00 d1 01 00 65 48 03 2d 4b 2b ea 5b e9 4b ff ff ff <0f> 0b 0f 0b 49 89 ec eb 8a e8 18 3f fa ff 48 c7 c2 80 be 01 00 89
> [  105.220063] RSP: 0018:ffff9c5440defba8 EFLAGS: 00010286
> [  105.222102] RAX: 0000000000000000 RBX: ffff8fc5ab0c6000 RCX: 00000000fffd0462
> [  105.224566] RDX: 00000000fffd101a RSI: 0000000000000012 RDI: ffff8fc5ab0c6370
> [  105.227022] RBP: ffff8fc5ab0c6000 R08: ffff8fc5ab0c6018 R09: ffff8fc5a90a6204
> [  105.229452] R10: ffff8fc5c8c06d80 R11: ffff8fc5c16e7410 R12: 0000000000000000
> [  105.231882] R13: 0000000000000001 R14: ffff8fc5ab0c6370 R15: ffff9c5440defd50
> [  105.234299] FS:  00007f7b87b02540(0000) GS:ffff8fc5cf400000(0000) knlGS:0000000000000000
> [  105.236935] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  105.239040] CR2: 00007f7b87937ff8 CR3: 0000000bf1fb8005 CR4: 0000000000160ee0
> Segmentation fault
> 
> --- clip ---
> 
> The reason why this reproduces the BUG is because on the server side, the 
> reconnect triggers an association restart, which moves the association to 
> ESTABLISHED state, but leaves the T2 timer running. After that, close() 
> triggers a new shutdown and an attempt to start the T2 timer again.
> 
> With your patch, the BUG is not reproduced.
> 
> Moving the association to ESTABLISHED state even though the socket is 
> already closed is in itself a problem. I have a simple patch for that 
> ready to be posted, but your patch should be handled first.
> 
> > 
> > Signed-off-by: Neil Horman <nhorman@xxxxxxxxxxxxx>
> > CC: Vlad Yasevich <vyasevich@xxxxxxxxx>
> > CC: "David S. Miller" <davem@xxxxxxxxxxxxx>
> > 
> > Change notes:
> > 
> > v2) Removed WARN_ON, as its not useful
> > 
> > v3) Actually commit my changes
> > 
> > v4) Reduce code size by rolling expiration update into mod_timer line
> > ---
> >  net/sctp/sm_sideeffect.c | 18 +++++++++++++++---
> >  1 file changed, 15 insertions(+), 3 deletions(-)
> > 
> > diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
> > index b5327bb..192ec99 100644
> > --- a/net/sctp/sm_sideeffect.c
> > +++ b/net/sctp/sm_sideeffect.c
> > @@ -1480,9 +1480,21 @@ static int sctp_cmd_interpreter(sctp_event_t event_type,
> >  			timeout = asoc->timeouts[cmd->obj.to];
> >  			BUG_ON(!timeout);
> >  
> > -			timer->expires = jiffies + timeout;
> > -			sctp_association_hold(asoc);
> > -			add_timer(timer);
> > +			/*
> > +			 * SCTP has a hard time with timer starts.  Because we process
> > +			 * timer starts as side effects, it can be hard to tell if we
> > +			 * have already started a timer or not, which leads to BUG
> > +			 * halts when we call add_timer. So here, instead of just starting
> > +			 * a timer, just determine the shorter of the two
> > +			 * timeouts and mod the timer to that.
> > +			 */
> > +			if (timer_pending(timer)) {
> > +				if (time_after(timer->expires, (jiffies + timeout)))
> > +					mod_timer(timer, jiffies + timeout);
> > +			} else {
> > +				sctp_association_hold(asoc);
> > +				mod_timer(timer, jiffies + timeout);
> > +			}
> 
> Meanwhile, timer_reduce() has been added. Just call that instead?
> 
> >  			break;
> >  
> >  		case SCTP_CMD_TIMER_RESTART:
> > -- 
> > 2.5.5
> > 
>