On Tue, 9 Feb 2021 11:35:07 -0800
Brian Norris <briannorris@xxxxxxxxxxxx> wrote:

> + Steven Rostedt

Thanks.

>
> Hi Wen,
>
> (Trimming down the description a bit:)
>
> On Mon, Feb 8, 2021 at 6:59 PM Wen Gong <wgong@xxxxxxxxxxxxxx> wrote:
> >
> > Kernel panic every time in kernel when running below case:
> > steps:
> > 1. connect to an AP with good signal strength
> > 2. echo 0x7f > /sys/kernel/debug/ieee80211/phy0/ath10k/pktlog_filter
> > 3. echo 0xffff 0 > /sys/kernel/debug/ieee80211/phy0/ath10k/fw_dbglog
> > 4. echo 0 > /sys/module/ath10k_core/parameters/debug_mask
> > 5. sudo trace-cmd record -e ath10k
> > 6. run "iperf -c 192.168.1.1 -u -b 100m -i 1 -t 30"
> > 7. kernel panic immediately
> >
> > It always crash at trace_event_raw_event_ath10k_xxx, below is 2 sample:
> ...
> > The value of prog in filter_match_preds of kernel/trace/trace_events_filter.c
> > is overwrite to the content of the UDP packets' content like this
> > 0x0039383736353433, it is a invalid address, so crash.
> ...
> > ath10k_htc_send_bundle_skbs allocate skb with size 49792(1556*32), it
> > is bigger than PAGE_SIZE which is normally 4096, then ath10k_sdio_write
> > call ath10k_dbg_dump to trace the large size skb and corrupt the trace
> > data of tracing and lead crash. When disable the TX bundle of SDIO, it
> > does not crash, but TX bundle is for improve throughput, so it is enabled
> > by default. It is useless to call ath10k_dbg_dump to trace the large
> > bundled skb, so this patch trace the top part of large bundled skb.
> ...
> > trace_event_raw_event_ath10k_log_dbg_dump is generated by compiler, it
> > call trace_event_buffer_reserve to get a struct pointer *entry, its
> > type is trace_event_raw_ath10k_log_dbg_dump which is also generated by
> > compiler, trace_event_buffer_reserve of kernel/trace/trace_events.c
> > call trace_event_buffer_lock_reserve to get ring_buffer_event.
> >
> > In function trace_event_buffer_lock_reserve of kernel/trace/trace.c,
> > the ring_buffer_time_stamp_abs is false and trace_file->flags is 0x40b
> > which is set bit of EVENT_FILE_FL_FILTERED by debugging, so it use the
> > temp buffer this_cpu_read(trace_buffered_event), and the buffer size
> > is 1 page size which is allocated in trace_buffered_event_enable by
> > alloc_pages_node, and then ath10k pass the buffer size > 1 page trigger
> > overflow and crash.
> >
> > Based on upper test, try and debugging, pass large buff size to function
> > trace_ath10k_log_dbg_dump cause crash, and it has ath10k_dbg in
> > ath10k_sdio_write to print the length of skb/buffer, it is not necessary
> > to trace all content of the large skb.
>
> Is this the same issue noted in this thread?
>
> [for-next][PATCH 2/2] tracing: Use temp buffer when filtering events
> https://lore.kernel.org/lkml/f16b14066317f6a926b6636df6974966@xxxxxxxxxxxxxx/

Note, that is only used when filtering happens, which doesn't appear to
be the case here.

>
> It seems like we should still try to get that fixed somehow, even if
> the below change is fine on its own (it probably doesn't make sense to
> dump such a large amount of data via tracepoints). It would be unfortunate
> for the next poor soul to hit the same issues, just because they wanted to
> dump a few KB.

Yeah, it was a design decision to cap the max size of events to just
under PAGE_SIZE. The ring buffer is broken up into pages (for zero copy
transfers to file systems and the network). Thus, no event is allowed to
be bigger than a page (and actually a bit smaller).

That said, it shouldn't have crashed, it should have just failed to
record. I'll test it out and see why it crashed.

-- Steve
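
For anyone following along, here is a rough user-space sketch of the size
cap discussed above: a dump length is clamped so one event stays under a
page. This is not the kernel or ath10k code. The names
(sketch_max_payload, sketch_clamp_dump_len) and the 128-byte overhead are
hypothetical, and the 4096-byte page is just the "normal" value quoted in
the report.

```c
#include <stddef.h>

/* Assumption: 4096-byte pages, the "normal" PAGE_SIZE quoted in the report. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Assumption: a made-up allowance for per-event headers. The real overhead
 * is kernel-specific; this value only makes "a bit smaller than a page"
 * concrete.
 */
#define SKETCH_EVENT_OVERHEAD 128u

/* Largest payload a single event may carry in this sketch. */
static size_t sketch_max_payload(void)
{
	return SKETCH_PAGE_SIZE - SKETCH_EVENT_OVERHEAD;
}

/*
 * Return how many bytes of a len-byte buffer would be handed to the
 * tracepoint: the whole buffer if it fits, otherwise only its leading part.
 */
static size_t sketch_clamp_dump_len(size_t len)
{
	size_t max = sketch_max_payload();

	return len > max ? max : len;
}
```

With these assumed constants, the 49792-byte (1556*32) bundle from the
report is clamped to 3968 bytes, while a 100-byte buffer passes through
unchanged.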