On Fri, Mar 04, 2016 at 03:06:31PM -0500, Bob Copeland wrote:
> Hmm, none that I can think of, other than mesh support. I wrote this to
> test some code refactoring I was doing in this area, but it should have
> also worked before the changes.
>
> I guess it could fail if the 'iw' command didn't work as expected, or if
> there's a timing issue and the gate announcements aren't received.

At least iw is not printing out any errors when it gets executed during
the test.

> With today's wireless-testing, it's passing for me (the "+" is just for
> some whitespace fixes that aren't upstream yet):
>
> [ 0.000000] Linux version 4.5.0-rc6-wt+ (bob@glass) (gcc version 5.3.1 20160101 (Debian 5.3.1-5) ) #14 SMP PREEMPT Fri Mar 4 14:49:21 EST 2016
>
> Got a different kernel version or config I should try?
>
> My config is here: http://bobcopeland.com/srcs/vmconfig.2016-03-04.txt

I tried with the current wireless-testing.git snapshot and with the test
case modified to send multiple test frames:

    # wait for gate announcement frames
    time.sleep(1)

    # data frame from dev2 -> external sta should be sent to both gates
    dev[2].request("DATA_TEST_CONFIG 1")
    dev[2].request("DATA_TEST_TX {} {} 0".format(external_sta, addr2))
    dev[2].request("DATA_TEST_CONFIG 0")
    time.sleep(1)
    dev[2].request("DATA_TEST_CONFIG 1")
    dev[2].request("DATA_TEST_TX {} {} 0".format(external_sta, addr2))
    dev[2].request("DATA_TEST_CONFIG 0")
    time.sleep(1)
    dev[2].request("DATA_TEST_CONFIG 1")
    dev[2].request("DATA_TEST_TX {} {} 0".format(external_sta, addr2))
    dev[2].request("DATA_TEST_CONFIG 0")
    time.sleep(1)
    dev[2].request("DATA_TEST_CONFIG 1")
    dev[2].request("DATA_TEST_TX {} {} 0".format(external_sta, addr2))
    dev[2].request("DATA_TEST_CONFIG 0")
    time.sleep(0.1)

I do see Path Request messages getting sent for 02:11:22:33:44:55.

On the first attempt with this, I hit a kernel crash:

[ 11.200012] Call Trace:
[ 11.200012] <IRQ>
[ 11.200012] [<ffffffff810687a9>] ? ttwu_do_wakeup+0x19/0xf0
[ 11.200012] [<ffffffff810691d2>] ? try_to_wake_up+0x192/0x3d0
[ 11.200012] [<ffffffff81445980>] ? mesh_nexthop_resolve+0x140/0x140
[ 11.200012] [<ffffffff81445a19>] mesh_path_timer+0x99/0x110
[ 11.200012] [<ffffffff81094705>] call_timer_fn+0x35/0x160
[ 11.200012] [<ffffffff81094a39>] run_timer_softirq+0x209/0x2a0
[ 11.200012] [<ffffffff81445980>] ? mesh_nexthop_resolve+0x140/0x140
[ 11.200012] [<ffffffff8104a9a2>] __do_softirq+0xd2/0x2b0
[ 11.200012] [<ffffffff8104ad9b>] irq_exit+0x7b/0xa0
[ 11.200012] [<ffffffff81461175>] smp_apic_timer_interrupt+0x45/0x60
[ 11.200012] [<ffffffff8145fc02>] apic_timer_interrupt+0x82/0x90
[ 11.200012] <EOI>
[ 11.200012] [<ffffffff810371f6>] ? native_safe_halt+0x6/0x10
[ 11.200012] [<ffffffff8100c9ce>] default_idle+0x1e/0x100
[ 11.200012] [<ffffffff8100d22f>] arch_cpu_idle+0xf/0x20
[ 11.200012] [<ffffffff8107ac8a>] default_idle_call+0x2a/0x40
[ 11.200012] [<ffffffff8107aef3>] cpu_startup_entry+0x253/0x330
[ 11.200012] [<ffffffff8102ce23>] start_secondary+0x103/0x110
[ 11.200012] Code: 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 53 48 83 ec 68 48 89 7d 80 48 8b 47 28 48 89 85 70 ff ff ff 48 8b 80 e8 09 00 00 <48> 8b 40 08 48 85 c0 48 89 85 78 ff ff ff 0f 84 d2 03 00 00 e8
[ 11.200012] RIP [<ffffffff8144191c>] mesh_path_send_to_gates+0x2c/0x480
[ 11.200012] RSP <ffff88001fd83dd0>
[ 11.200012] CR2: 0000000000000008
[ 11.200012] ---[ end trace 6fdda66d273fb377 ]---
[ 11.200012] Kernel panic - not syncing: Fatal exception in interrupt

This was with my work branch for the kernel, with a mesh compilation
warning silenced. When I tried again with the unmodified master branch,
I did get the test to pass, but only with that extra time added at the
end. The first Data frame with the mesh extended addresses showed up at
a 5.6 second offset from the beginning of the test case, i.e., much
later than the original 1 second wait could cover.

Could you please share the wpas_mesh_gate_forwarding.hwsim0.pcapng file
from a test case run that shows the expected behavior?
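As a cross-check on the oops above: the bytes at RIP, 48 8b 40 08, decode to mov 0x8(%rax),%rax, and CR2 is 0000000000000008, so %rax must have been NULL when that load executed. The C sketch below only illustrates that arithmetic under stated assumptions; it is not kernel code. struct fake_mesh_table, known_gates_load_address() and lookup_known_gates() are hypothetical names, and the only detail borrowed from the crash is that the field being loaded sits 8 bytes into the structure. It also shows one possible shape for a NULL check in the lookup path; whether such a check is the right fix, rather than something else tearing down mesh_paths too early, is exactly the open question here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the mesh path table. Assumption: the field
 * read by the faulting instruction (mov 0x8(%rax),%rax) lives 8 bytes
 * into the structure. The real kernel layout is NOT reproduced here.
 */
struct fake_mesh_table {
	unsigned char other_fields[8];	/* whatever precedes known_gates */
	void *known_gates;		/* offset 8 */
};

/*
 * A load of tbl->known_gates touches tbl + offsetof(..., known_gates).
 * With tbl == 0 this is the fault address that ends up in CR2.
 */
static uintptr_t known_gates_load_address(uintptr_t tbl)
{
	return tbl + offsetof(struct fake_mesh_table, known_gates);
}

/*
 * Hypothetical defensive lookup: report "no table" instead of
 * dereferencing NULL. Return values are illustrative only.
 */
static int lookup_known_gates(const struct fake_mesh_table *tbl,
			      void **gates)
{
	if (tbl == NULL)
		return -1;	/* table already gone: bail out */
	*gates = tbl->known_gates;
	return 0;
}
```

For example, known_gates_load_address(0) evaluates to 8, matching CR2, while lookup_known_gates(NULL, &gates) returns -1 instead of faulting.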
I don't see how the change I had in net/mac80211/mesh_hwmp.c could have
caused the panic. All it does is initialize a variable:

hwmp_preq_frame_process()
-	u32 orig_sn, target_sn, lifetime, target_metric;
+	u32 orig_sn, target_sn, lifetime, target_metric = 0;

This kernel panic does not happen every time, i.e., I can pass the test
case with my work branch as well.

The kernel panic hit here:

int mesh_path_send_to_gates(struct mesh_path *mpath)
	tbl = sdata->u.mesh.mesh_paths;
	known_gates = tbl->known_gates;

The crash case looks like this:

[ 8.770098] JKM:mesh_path_send_to_gates:tbl=ffff88001eb48e00
[ 11.916288] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
[ 11.931385] IPv6: ADDRCONF(NETDEV_UP): wlan1: link is not ready
[ 11.946013] IPv6: ADDRCONF(NETDEV_UP): wlan2: link is not ready
[ 11.970031] JKM:mesh_path_send_to_gates:tbl= (null)
[ 11.971126] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008

In other words, on the second call to mesh_path_send_to_gates(),
sdata->u.mesh.mesh_paths is NULL. Is something broken elsewhere, or
should this function check for the NULL case to avoid the crash?

When the test case passes, it does so well before that 11.9 second
offset, but I'm not completely sure what causes the difference between
test runs.

-- 
Jouni Malinen                                            PGP id EFC895FA
_______________________________________________
Hostap mailing list
Hostap@xxxxxxxxxxxxxxxxxxx
http://lists.infradead.org/mailman/listinfo/hostap