OK, under heavy load the patch https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/patch/?id=94802151894d482e82c324edf2c658f8e6b96508 causes the following error once every few minutes; it doesn't happen instantly:

Nov 27 09:00:06 [kernel] [ 5530.333416] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
Nov 27 09:00:06 [kernel] [ 5530.333669] IP: xfrm_output_resume+0x211/0x460
Nov 27 09:00:06 [kernel] [ 5530.333817] PGD 0 P4D 0
Nov 27 09:00:06 [kernel] [ 5530.333959] Oops: 0000 [#3] SMP
Nov 27 09:00:06 [kernel] [ 5530.334104] CPU: 0 PID: 10893 Comm: stunnel Tainted: G D 4.14.2-gentoo #2
Nov 27 09:00:06 [kernel] [ 5530.334343] Hardware name: Supermicro H8DGU/H8DGU, BIOS 3.5c 03/18/2016
Nov 27 09:00:06 [kernel] [ 5530.334498] task: ffff918aa11ce580 task.stack: ffffad9ca678c000
Nov 27 09:00:06 [kernel] [ 5530.334667] RIP: 0010:xfrm_output_resume+0x211/0x460
Nov 27 09:00:06 [kernel] [ 5530.334815] RSP: 0018:ffffad9ca678f8d8 EFLAGS: 00010246
Nov 27 09:00:06 [kernel] [ 5530.334966] RAX: 0000000000000000 RBX: ffff918a786e32e0 RCX: 0000000000000000
Nov 27 09:00:06 [kernel] [ 5530.335122] RDX: 0000000000000020 RSI: 0000000000000000 RDI: 0000000000000000
Nov 27 09:00:06 [kernel] [ 5530.335279] RBP: ffffad9ca678f938 R08: ffffad9ca678f750 R09: ffff918acd9ccc50
Nov 27 09:00:06 [kernel] [ 5530.335436] R10: ffff918a711018d4 R11: 0000000000000000 R12: ffffffff8ab10200
Nov 27 09:00:06 [kernel] [ 5530.335591] R13: ffff918acd188c00 R14: ffff918acd188c3c R15: ffffad9ca678fb10
Nov 27 09:00:06 [kernel] [ 5530.335747] FS: 00007e2611a0b700(0000) GS:ffff918acfc00000(0000) knlGS:0000000000000000
Nov 27 09:00:06 [kernel] [ 5530.336009] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 27 09:00:06 [kernel] [ 5530.336160] CR2: 0000000000000018 CR3: 000000080b90e000 CR4: 00000000000406f0
Nov 27 09:00:06 [kernel] [ 5530.336315] Call Trace:
Nov 27 09:00:06 [kernel] [ 5530.336461] ? skb_checksum+0x3f/0x60
Nov 27 09:00:06 [kernel] [ 5530.336607] ? reqsk_fastopen_remove+0x160/0x160
Nov 27 09:00:06 [kernel] [ 5530.336754] ? skb_panic+0x70/0x70
Nov 27 09:00:06 [kernel] [ 5530.336898] xfrm_output+0x91/0x210
Nov 27 09:00:06 [kernel] [ 5530.337070] ? ipv6_confirm+0xcb/0x140
Nov 27 09:00:06 [kernel] [ 5530.337215] xfrm6_output_finish+0x38/0x40
Nov 27 09:00:06 [kernel] [ 5530.337361] __xfrm6_output+0x57/0x1e0
Nov 27 09:00:06 [kernel] [ 5530.337507] xfrm6_output+0x9e/0x110
Nov 27 09:00:06 [kernel] [ 5530.337651] ? xfrm6_local_rxpmtu+0x90/0x90
Nov 27 09:00:06 [kernel] [ 5530.337798] ip6_xmit+0x29a/0x560
Nov 27 09:00:06 [kernel] [ 5530.337943] ? __ip6_append_data.isra.42+0xc10/0xc10
Nov 27 09:00:06 [kernel] [ 5530.338092] inet6_csk_xmit+0xaa/0x100
Nov 27 09:00:06 [kernel] [ 5530.338263] tcp_transmit_skb+0x57f/0xa10
Nov 27 09:00:06 [kernel] [ 5530.338411] tcp_write_xmit+0x1cd/0xf90
Nov 27 09:00:06 [kernel] [ 5530.338557] __tcp_push_pending_frames+0x3f/0xb0
Nov 27 09:00:06 [kernel] [ 5530.338704] tcp_push+0xf9/0x120
Nov 27 09:00:06 [kernel] [ 5530.338848] tcp_sendmsg_locked+0x66f/0xe30
Nov 27 09:00:06 [kernel] [ 5530.338995] tcp_sendmsg+0x3a/0x60
Nov 27 09:00:06 [kernel] [ 5530.339140] inet_sendmsg+0x3e/0xc0
Nov 27 09:00:06 [kernel] [ 5530.339285] sock_sendmsg+0x48/0x60
Nov 27 09:00:06 [kernel] [ 5530.339429] sock_write_iter+0x8f/0x100
Nov 27 09:00:06 [kernel] [ 5530.339602] new_sync_write+0x18b/0x1d0
Nov 27 09:00:06 [kernel] [ 5530.339747] __vfs_write+0x37/0x50
Nov 27 09:00:06 [kernel] [ 5530.339890] vfs_write+0xc7/0x1c0
Nov 27 09:00:06 [kernel] [ 5530.340034] SyS_write+0x5f/0xd0
Nov 27 09:00:06 [kernel] [ 5530.340178] entry_SYSCALL_64_fastpath+0x13/0x94
Nov 27 09:00:06 [kernel] [ 5530.340327] RIP: 0033:0x7e2610b6b36d
Nov 27 09:00:06 [kernel] [ 5530.340480] RSP: 002b:00007e2611a0ad40 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
Nov 27 09:00:06 [kernel] [ 5530.340741] RAX: ffffffffffffffda RBX: 00007e260000efd0 RCX: 00007e2610b6b36d
Nov 27 09:00:06 [kernel] [ 5530.340896] RDX: 0000000000000026 RSI: 00007e2611a46a30 RDI: 0000000000000013
Nov 27 09:00:06 [kernel] [ 5530.341051] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
Nov 27 09:00:06 [kernel] [ 5530.341207] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000003
Nov 27 09:00:06 [kernel] [ 5530.341361] R13: 000055e303aaadf8 R14: 00007e2611a0accc R15: 00007e2611a42060
Nov 27 09:00:06 [kernel] [ 5530.341515] Code: f6 0f 8f b9 fe ff ff 31 f6 85 d2 0f 8f af fe ff ff e9 bb fe ff ff 85 f6 89 f0 0f 85 d6 fe ff ff 48 8b 7b 58 48 89 f8 48 83 e0 fe <4c> 8b 70 18 4d 85 f6 0f 84 a3 01 00 00 41 8b 86 80 00 00 00 85
Nov 27 09:00:06 [kernel] [ 5530.341929] RIP: xfrm_output_resume+0x211/0x460 RSP: ffffad9ca678f8d8
Nov 27 09:00:06 [kernel] [ 5530.342083] CR2: 0000000000000018
Nov 27 09:00:06 [kernel] [ 5530.342532] ---[ end trace a5ae59251632552d ]---

Tomas Charvat
EXCELLO | Virusfree
w: www.virusfree.cz
e: tc@xxxxxxxxxx

On 11/27/2017 10:02 AM, Steffen Klassert wrote:
> On Sat, Nov 25, 2017 at 04:50:31AM +0900, David Miller wrote:
>> From: Florian Westphal <fw@xxxxxxxxx>
>> Date: Fri, 24 Nov 2017 20:32:12 +0100
>>
>>> Tomas Charvat <tc@xxxxxxxxxx> wrote:
>>>
>>> [ CC stable, Steffen ]
>>>
>>>> Hi Florian and David, I'm running several servers that use XFRM IPsec.
>>>> It works well on all kernels below 4.14.0.
>>>>
>>>> It doesn't work on 4.14.0-2. There are no errors in dmesg or in
>>>> userspace when I configure policies.
>>>>
>>>> Since there is not much info about XFRM in dmesg, I have no clue where
>>>> to start when I want to debug this issue.
>>> David, please consider picking up
>>> 94802151894d482e82c324edf2c658f8e6b96508
>>> ("Revert "xfrm: Fix stack-out-of-bounds read in xfrm_state_find."")
>>>
>>> for the 4.14.y stable queue.
>>>
>>> I think it's a pretty safe bet that this fixes the problem; it broke
>>> transport mode wildcard policy lookup.
>> OK, once we have confirmation that this fixes it, I also need to pair
>> it up with Steffen's alternative fix for the bug that commit was
>> trying to fix.
> We need this revert in the 4.14.y stable tree anyway, as it broke
> transport mode IPsec.
>
> I thought quite a lot about the original problem that I tried
> to fix. It is a rather subtle thing, like almost all bugs
> reported by syzkaller that I have seen.
>
> By now I think our template validation is not strict enough.
> It is possible to configure policies with a transport mode template
> where the selector's address family does not match the template's
> address family. The address family cannot change on a transport
> mode transformation, so this configuration does not make much
> sense, but it leads to problems because later on we rely on the
> assumption that the address family cannot change in transport mode.
>
> Unfortunately the reproducer provided by syzkaller does not
> trigger anything on my test setup, so I don't even know if
> this fixes this exact problem.
>
> Florian, could you please give the patch below a try?
>
>
> Subject: [PATCH] xfrm: Fix stack-out-of-bounds with misconfigured transport
> mode policies.
>
> On policies with a transport mode template, we pass the addresses
> from the flowi to xfrm_state_find(), assuming that the IP addresses
> (and address family) don't change during transformation.
>
> Unfortunately our policy template validation is not strict enough.
> It is possible to configure policies with a transport mode template
> where the address family of the template does not match the selector's
> address family. This leads to stack-out-of-bounds reads because
> we compare addresses of the wrong family. Fix this by refusing
> such a configuration; the address family cannot change in transport
> mode.
>
> We use the assumption that, in transport mode, the first template's
> address family must match the address family of the policy selector.
> Subsequent transport mode templates must match the address family of
> the previous template.
>
> Signed-off-by: Steffen Klassert <steffen.klassert@xxxxxxxxxxx>
> ---
>  net/xfrm/xfrm_user.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
> index 983b0233767b..57ad016ae675 100644
> --- a/net/xfrm/xfrm_user.c
> +++ b/net/xfrm/xfrm_user.c
> @@ -1419,11 +1419,14 @@ static void copy_templates(struct xfrm_policy *xp, struct xfrm_user_tmpl *ut,
>  
>  static int validate_tmpl(int nr, struct xfrm_user_tmpl *ut, u16 family)
>  {
> +	u16 prev_family;
>  	int i;
>  
>  	if (nr > XFRM_MAX_DEPTH)
>  		return -EINVAL;
>  
> +	prev_family = family;
> +
>  	for (i = 0; i < nr; i++) {
>  		/* We never validated the ut->family value, so many
>  		 * applications simply leave it at zero. The check was
> @@ -1435,6 +1438,12 @@ static int validate_tmpl(int nr, struct xfrm_user_tmpl *ut, u16 family)
>  		if (!ut[i].family)
>  			ut[i].family = family;
>  
> +		if ((ut[i].mode == XFRM_MODE_TRANSPORT) &&
> +		    (ut[i].family != prev_family))
> +			return -EINVAL;
> +
> +		prev_family = ut[i].family;
> +
>  		switch (ut[i].family) {
>  		case AF_INET:
>  			break;