On Wed, Dec 23, 2020 at 12:13 PM Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote: > > On Tue, Dec 22, 2020 at 11:32 PM John Fastabend > <john.fastabend@xxxxxxxxx> wrote: > > > > Andrii Nakryiko wrote: > > > On Tue, Dec 22, 2020 at 11:44 AM Andrii Nakryiko > > > <andrii.nakryiko@xxxxxxxxx> wrote: > > > > > > > > Hi John and Martin, > > > > > > > > I've noticed that all our CIs (libbpf and kernel-patches) started to > > > > emit kernel warning when running test_maps test. I've narrowed it down > > > > to test_sockmap() part of it. If I disable it, the warning goes away. > > > > The warning looks like this: > > > > > > > > Failed sockmap unexpected timeout So this is still happening, quite often. It would be good to take care of it, it makes test_maps quite unstable and just reduces the signal-to-noise ratio of kernel-patches CI :( > > > > [ 23.316720] ------------[ cut here ]------------ > > > > [ 23.317615] WARNING: CPU: 0 PID: 93 at net/core/stream.c:208 > > > > sk_stream_kill_queues+0x111/0x120 > > > > [ 23.319194] Modules linked in: > > > > [ 23.319698] CPU: 0 PID: 93 Comm: test_maps Not tainted > > > > 5.10.0-rc7-02263-g0e12c0271887-dirty #51 > > > > [ 23.321297] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), > > > > BIOS 1.9.3-1.el7.centos 04/01/2014 > > > > [ 23.322756] RIP: 0010:sk_stream_kill_queues+0x111/0x120 > > > > [ 23.323668] Code: 00 85 c0 75 1f 85 f6 75 21 5b 5d c3 48 89 df e8 > > > > d5 fd fe ff 8b 83 68 02 00 00 8b b3 20 02 00 00 85 c0 74 e1 0f 0b 85 > > > > f6 74 df <0f> 0b 5b 5d c3 0f 0b eb ac 66 0f 1f 44 00 00 0f 1f 44 00 00 > > > > 41 57 > > > > [ 23.326251] RSP: 0018:ffffc900001d7d80 EFLAGS: 00010202 > > > > [ 23.326863] RAX: 0000000000000000 RBX: ffff88813a44a280 RCX: 0000000000000000 > > > > [ 23.327597] RDX: 0000000000000001 RSI: 0000000000000aec RDI: ffff88813a44a3d0 > > > > [ 23.328375] RBP: ffff88813a44a3d0 R08: 0000000000000000 R09: 0000000000000001 > > > > [ 23.329300] R10: 0000000000000050 R11: 0000000000000000 R12: ffff888100362200 > > > > [ 23.330281] R13: ffff88802564b8e0 R14: ffff8881006175f0 R15: 0000000000000000 > > > > [ 23.331106] FS: 00007f56532ad740(0000) GS:ffff88813bc00000(0000) > > > > knlGS:0000000000000000 > > > > [ 23.332162] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > > [ 23.333011] CR2: 000000000064d448 CR3: 0000000101fd2000 CR4: 00000000000006f0 > > > > [ 23.334051] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > > > [ 23.334876] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > > > > [ 23.335777] Call Trace: > > > > [ 23.336027] inet_csk_destroy_sock+0x4f/0x120 > > > > [ 23.336542] tcp_rcv_state_process+0xcee/0x1240 > > > > [ 23.337010] tcp_v4_do_rcv+0xb2/0x1e0 > > > > [ 23.337541] __release_sock+0x5c/0xf0 > > > > [ 23.337956] __tcp_close+0x1cb/0x4c0 > > > > [ 23.338351] tcp_close+0x20/0x70 > > > > [ 23.338875] inet_release+0x3c/0x70 > > > > [ 23.339466] __sock_release+0x37/0xa0 > > > > [ 23.340003] sock_close+0x14/0x20 > > > > [ 23.340580] __fput+0xb1/0x260 > > > > [ 23.341091] task_work_run+0x59/0xa0 > > > > [ 23.341685] exit_to_user_mode_prepare+0x152/0x160 > > > > [ 23.342213] syscall_exit_to_user_mode+0x38/0x240 > > > > [ 23.342758] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > > > [ 23.343250] RIP: 0033:0x7f565348b797 > > > > [ 23.343702] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 > > > > 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 03 00 00 > > > > 00 0f 05 <48> 3d 00 f0 ff ff 77 41 c3 48 83 ec 18 89 7c 24 0c e8 f3 fb > > > > ff ff > > > > [ 23.345817] RSP: 002b:00007ffd178c1b68 EFLAGS: 00000246 ORIG_RAX: > > > > 0000000000000003 > > > > [ 23.346972] RAX: 0000000000000000 RBX: 00007ffd178c1be0 RCX: 00007f565348b797 > > > > [ 23.348069] RDX: 0000000000000078 RSI: 00007ffd178c1ae0 RDI: 0000000000000008 > > > > [ 23.349139] RBP: 00007ffd178c1cd0 R08: 0000000000000017 R09: 0000000000000016 > > > > [ 23.350210] R10: 0000000000000016 R11: 0000000000000246 R12: 00007ffd178c1c20 > > > > [ 23.351442] R13: 0000000000000016 R14: 000000000000000f R15: 0000000000000017 > > > > [ 23.352500] irq event stamp: 1775071 > > > > [ 23.352999] hardirqs last enabled at (1775079): > > > > [<ffffffff810d0dbf>] console_unlock+0x48f/0x590 > > > > [ 23.354360] hardirqs last disabled at (1775088): > > > > [<ffffffff810d0d27>] console_unlock+0x3f7/0x590 > > > > [ 23.355748] softirqs last enabled at (1775026): > > > > [<ffffffff81a00ebf>] asm_call_irq_on_stack+0xf/0x20 > > > > [ 23.357071] softirqs last disabled at (1775021): > > > > [<ffffffff81a00ebf>] asm_call_irq_on_stack+0xf/0x20 > > > > [ 23.358497] ---[ end trace 1fe2d145c0b7718d ]--- > > > > > > > > When trying to bisect, it turns out we get this warning even with > > > > older versions of bpf-next tree, which didn't seem to trigger this > > > > warning before. One thing that seems to have changed in our CI setup > > > > is that we went from 4 CPU VM to 2 CPU VM (due to Travis CI > > > > limitations). So that might have changed some timings and > > > > interactions. > > > > > > > > I'm dealing with other fallouts of this 4 to 2 CPU reduction and this > > > > warning itself doesn't cause the failure in libbpf CI (but it will in > > > > > > I take this back, it does return error from test_maps due to "Failed > > > sockmap unexpected timeout", so both CIs are broken, I've just > > > disabled test_maps in libbpf CI for now. And I'm sending a work-around > > > for EBUSY errors from hashmap on update/delete in a separate patch. > > > > OK thanks Andrii, I'll take a look. If its not obvious to me what the problem > > is though it might slip into the new year as I'm on PTO shortly. > > > > Yeah, no worries. I just wanted to give a heads up and get it off my > head :) Don't lose your sleep over it. > > > Also FWIW I'm sitting on some patches to start pulling the tests from > > test_sockmap into the more normal test progs framework. Then we can just > > keep test_sockmap around as a tool to test different setups or move it > > out of tree. As far as I understand it CI doesn't run test_sockmap at the > > moment. > > CI that tests all incoming patches is currently red due to this and > the EBUSY problem I worked around in [0]. Libbpf CI doesn't run > test_maps currently, so it's good. I hesitated to disable test_maps > for patches CI, though, but we can do that temporarily as well, if > necessary. > > [0] https://patchwork.kernel.org/project/netdevbpf/patch/20201223200652.3417075-1-andrii@xxxxxxxxxx/ > > > > > Thanks, > > John > > > > > > > > > kernel-patches CI), but it would be great if you can take a look at > > > > this and see if we can get it fixed soon-ish to get CI to a green > > > > state. > > > > > > > > Thanks! > > > > > > > > > > > > -- Andrii