Re: [PATCHv4] sctp: Don't add the shutdown timer if its already been added

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 21 Mar 2016, Neil Horman wrote:

> This BUG halt was recently reported:
> 
> PID: 2879   TASK: c16adaa0  CPU: 1   COMMAND: "sctpn"
>  #0 [f418dd28] crash_kexec at c04a7d8c
>  #1 [f418dd7c] oops_end at c0863e02
>  #2 [f418dd90] do_invalid_op at c040aaca
>  #3 [f418de28] error_code (via invalid_op) at c08631a5
>     EAX: f34baac0  EBX: 00000090  ECX: f418deb0  EDX: f5542950  EBP: 00000000
>     DS:  007b      ESI: f34ba800  ES:  007b      EDI: f418dea0  GS:  00e0
>     CS:  0060      EIP: c046fa5e  ERR: ffffffff  EFLAGS: 00010286
>  #4 [f418de5c] add_timer at c046fa5e
>  #5 [f418de68] sctp_do_sm at f8db8c77 [sctp]
>  #6 [f418df30] sctp_primitive_SHUTDOWN at f8dcc1b5 [sctp]
>  #7 [f418df48] inet_shutdown at c080baf9
>  #8 [f418df5c] sys_shutdown at c079eedf
>  #9 [f418df70] sys_socketcall at c079fe88
>     EAX: ffffffda  EBX: 0000000d  ECX: bfceea90  EDX: 0937af98
>     DS:  007b      ESI: 0000000c  ES:  007b      EDI: b7150ae4
>     SS:  007b      ESP: bfceea7c  EBP: bfceeaa8  GS:  0033
>     CS:  0073      EIP: b775c424  ERR: 00000066  EFLAGS: 00000282
> 
> It appears that the side effect that starts the shutdown timer was processed
> multiple times, which can happen as multiple paths can trigger it.  This of
> course leads to the BUG halt in add_timer getting called.
> 
> Fix seems pretty straightforward, just check before the timer is added if its
> already been started.  If it has mod the timer instead to min(current
> expiration, new expiration)
> 
> Its been tested but not confirmed to fix the problem, as the issue has only
> occured in production environments where test kernels are enjoined from being
> installed.  It appears to be a sane fix to me though.

Hi Neil,

This patch was acked by Marcelo four years ago, but was then apparently 
forgotten about and never applied. I have a comment to the patch itself, 
below, so would it possible for you to reroll another version?

Here's one way - not necessarily the only way - to hit this kernel BUG:

--- clip ---

# cat sctp_t2_double_start.c
#include <stdlib.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netinet/sctp.h>
#include <arpa/inet.h>

int main()
{
	int listen_fd, server_fd, client_fd;

	struct sockaddr_in addr = {
		.sin_family = AF_INET,
		.sin_addr.s_addr = inet_addr("127.0.0.1"),
		.sin_port = htons(65000),
	};

	struct sockaddr_in caddr = addr;
	caddr.sin_port = htons(65001);

	listen_fd = socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP);
	bind(listen_fd, (struct sockaddr *)&addr, sizeof(addr));
	listen(listen_fd, 10);
	client_fd = socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP);
	bind(client_fd, (struct sockaddr *)&caddr, sizeof(caddr));
	connect(client_fd, (struct sockaddr *)&addr, sizeof(addr));
	struct linger lin = { .l_onoff = 1, .l_linger = 0, };
	setsockopt(client_fd, SOL_SOCKET, SO_LINGER, &lin, sizeof(lin));
	server_fd = accept(listen_fd, NULL, NULL);
	system("iptables -A INPUT -p sctp -j DROP");
	shutdown(server_fd, SHUT_WR);
	close(client_fd);
	system("iptables -D INPUT -p sctp -j DROP");
	client_fd = socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP);
	bind(client_fd, (struct sockaddr *)&caddr, sizeof(caddr));
	connect(client_fd, (struct sockaddr *)&addr, sizeof(addr));
	close(server_fd); // segfault and add_timer() kernel BUG

	return 0;
}
# gcc sctp_t2_double_start.c
# ./a.out 
[  105.117695] ------------[ cut here ]------------
[  105.120047] kernel BUG at kernel/time/timer.c:1135!
[  105.122536] invalid opcode: 0000 [#1] SMP PTI
[  105.124646] CPU: 8 PID: 1126 Comm: a.out Not tainted 5.5.15-200.fc31.x86_64 #1
[  105.128029] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
[  105.131380] RIP: 0010:add_timer+0x14a/0x200
[  105.132382] Code: 5b 5d 41 5c 41 5d 41 5e c3 25 00 00 08 00 41 89 c6 e9 47 ff ff ff 48 c7 c5 00 d1 01 00 65 48 03 2d 4b 2b ea 5b e9 4b ff ff ff <0f> 0b 0f 0b 49 89 ec eb 8a e8 18 3f fa ff 48 c7 c2 80 be 01 00 89
[  105.136724] RSP: 0018:ffff9c5440defba8 EFLAGS: 00010286
[  105.137972] RAX: 0000000000000000 RBX: ffff8fc5ab0c6000 RCX: 00000000fffd0462
[  105.139658] RDX: 00000000fffd101a RSI: 0000000000000012 RDI: ffff8fc5ab0c6370
[  105.141338] RBP: ffff8fc5ab0c6000 R08: ffff8fc5ab0c6018 R09: ffff8fc5a90a6204
[  105.143026] R10: ffff8fc5c8c06d80 R11: ffff8fc5c16e7410 R12: 0000000000000000
[  105.144725] R13: 0000000000000001 R14: ffff8fc5ab0c6370 R15: ffff9c5440defd50
[  105.146391] FS:  00007f7b87b02540(0000) GS:ffff8fc5cf400000(0000) knlGS:0000000000000000
[  105.148330] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  105.149686] CR2: 00007f7b87937ff8 CR3: 0000000bf1fb8005 CR4: 0000000000160ee0
[  105.151362] Call Trace:
[  105.152007]  sctp_do_sm+0xc10/0x1e20 [sctp]
[  105.153024]  ? pick_next_task_fair+0x19a/0x380
[  105.154086]  ? __schedule+0x669/0x740
[  105.154971]  ? __switch_to_asm+0x40/0x70
[  105.155914]  ? __switch_to_asm+0x34/0x70
[  105.156850]  ? __switch_to_asm+0x40/0x70
[  105.157787]  ? __switch_to_asm+0x34/0x70
[  105.158729]  ? __switch_to_asm+0x40/0x70
[  105.159667]  ? __switch_to_asm+0x34/0x70
[  105.160608]  ? __switch_to_asm+0x40/0x70
[  105.161545]  ? __switch_to_asm+0x34/0x70
[  105.162480]  ? __switch_to_asm+0x40/0x70
[  105.163412]  ? __switch_to_asm+0x34/0x70
[  105.164379]  ? __switch_to_asm+0x40/0x70
[  105.165326]  sctp_primitive_SHUTDOWN+0x2f/0x40 [sctp]
[  105.166537]  sctp_close+0x200/0x320 [sctp]
[  105.167528]  inet_release+0x34/0x60
[  105.168373]  __sock_release+0x3d/0xa0
[  105.169265]  sock_close+0x11/0x20
[  105.170072]  __fput+0xc1/0x250
[  105.170822]  task_work_run+0x8a/0xb0
[  105.171694]  exit_to_usermode_loop+0x102/0x130
[  105.172760]  do_syscall_64+0x1a6/0x1c0
[  105.173663]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  105.175557] RIP: 0033:0x7f7b87a2ac07
[  105.177069] Code: ff ff e8 fc e8 01 00 66 2e 0f 1f 84 00 00 00 00 00 66 90 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 41 c3 48 83 ec 18 89 7c 24 0c e8 f3 41 f9 ff
[  105.182798] RSP: 002b:00007ffe10833458 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
[  105.185271] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007f7b87a2ac07
[  105.187653] RDX: 0000000000000000 RSI: 00007ffe10833420 RDI: 0000000000000005
[  105.190029] RBP: 00007ffe108334a0 R08: 0000000000000000 R09: 0000000000000000
[  105.192398] R10: 0000000000000008 R11: 0000000000000246 R12: 00000000004010f0
[  105.194779] R13: 00007ffe10833580 R14: 0000000000000000 R15: 0000000000000000
[  105.197128] Modules linked in: iptable_filter sctp libcrc32c rfkill intel_rapl_msr intel_rapl_common sb_edac kvm_intel snd_pcsp bochs_drm kvm drm_vram_helper snd_pcm irqbypass drm_ttm_helper ttm snd_timer ppdev drm_kms_helper snd virg
[  105.210029] ---[ end trace 8399eda7e3a18429 ]---
[  105.211899] RIP: 0010:add_timer+0x14a/0x200
[  105.213611] Code: 5b 5d 41 5c 41 5d 41 5e c3 25 00 00 08 00 41 89 c6 e9 47 ff ff ff 48 c7 c5 00 d1 01 00 65 48 03 2d 4b 2b ea 5b e9 4b ff ff ff <0f> 0b 0f 0b 49 89 ec eb 8a e8 18 3f fa ff 48 c7 c2 80 be 01 00 89
[  105.220063] RSP: 0018:ffff9c5440defba8 EFLAGS: 00010286
[  105.222102] RAX: 0000000000000000 RBX: ffff8fc5ab0c6000 RCX: 00000000fffd0462
[  105.224566] RDX: 00000000fffd101a RSI: 0000000000000012 RDI: ffff8fc5ab0c6370
[  105.227022] RBP: ffff8fc5ab0c6000 R08: ffff8fc5ab0c6018 R09: ffff8fc5a90a6204
[  105.229452] R10: ffff8fc5c8c06d80 R11: ffff8fc5c16e7410 R12: 0000000000000000
[  105.231882] R13: 0000000000000001 R14: ffff8fc5ab0c6370 R15: ffff9c5440defd50
[  105.234299] FS:  00007f7b87b02540(0000) GS:ffff8fc5cf400000(0000) knlGS:0000000000000000
[  105.236935] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  105.239040] CR2: 00007f7b87937ff8 CR3: 0000000bf1fb8005 CR4: 0000000000160ee0
Segmentation fault

--- clip ---

The reason why this reproduces the BUG is because on the server side, the 
reconnect triggers an association restart, which moves the association to 
ESTABLISHED state, but leaves the T2 timer running. After that, close() 
triggers a new shutdown and an attempt to start the T2 timer again.

With your patch, the BUG is not reproduced.

Moving the association to ESTABLISHED state even though the socket is 
already closed is in itself a problem. I have a simple patch for that 
ready to be posted, but your patch should be handled first.

> 
> Signed-off-by: Neil Horman <nhorman@xxxxxxxxxxxxx>
> CC: Vlad Yasevich <vyasevich@xxxxxxxxx>
> CC: "David S. Miller" <davem@xxxxxxxxxxxxx>
> 
> Change notes:
> 
> v2) Removed WARN_ON, as its not useful
> 
> v3) Actually commit my changes
> 
> v4) Reduce code size by rolling expiration update into mod_timer line
> ---
>  net/sctp/sm_sideeffect.c | 18 +++++++++++++++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
> index b5327bb..192ec99 100644
> --- a/net/sctp/sm_sideeffect.c
> +++ b/net/sctp/sm_sideeffect.c
> @@ -1480,9 +1480,21 @@ static int sctp_cmd_interpreter(sctp_event_t event_type,
>  			timeout = asoc->timeouts[cmd->obj.to];
>  			BUG_ON(!timeout);
>  
> -			timer->expires = jiffies + timeout;
> -			sctp_association_hold(asoc);
> -			add_timer(timer);
> +			/*
> +			 * SCTP has a hard time with timer starts.  Because we process
> +			 * timer starts as side effects, it can be hard to tell if we
> +			 * have already started a timer or not, which leads to BUG
> +			 * halts when we call add_timer. So here, instead of just starting
> +			 * a timer, just determine the shorter of the two
> +			 * timeouts and mod the timer to that.
> +			 */
> +			if (timer_pending(timer)) {
> +				if (time_after(timer->expires, (jiffies + timeout)))
> +					mod_timer(timer, jiffies + timeout);
> +			} else {
> +				sctp_association_hold(asoc);
> +				mod_timer(timer, jiffies + timeout);
> +			}

Meanwhile, timer_reduce() has been added. Just call that instead?

>  			break;
>  
>  		case SCTP_CMD_TIMER_RESTART:
> -- 
> 2.5.5
> 



[Index of Archives]     [Linux Networking Development]     [Linux OMAP]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux