Re: [PATCH BlueZ] mesh: Offload loopback packets to l_idle_onshot()


Hi Michał,

> On Jan 17, 2020, at 5:34 AM, Michał Lowas-Rzechonek <michal.lowas-rzechonek@xxxxxxxxxxx> wrote:
> 
> Hi Brian,
> 
>> On 01/16, Brian Gix wrote:
>> Any packet that may be handled internally by the daemon must be sent in
>> its own idle_oneshot context, to prevent multiple nodes from handling
>> and responding in the same context, eventually corrupting memory.
>> 
>> This addresses the following crash:
>> Program terminated with signal SIGSEGV, Segmentation fault.
>> 0  tcache_get (tc_idx=0) at malloc.c:2951
>>     2951   tcache->entries[tc_idx] = e->next;
>> (gdb) bt
>> 0  tcache_get (tc_idx=0) at malloc.c:2951
>> 1  __GI___libc_malloc (bytes=bytes@entry=16) at malloc.c:3058
>> 2  0x0000564cff9bc1de in l_malloc (size=size@entry=16) at ell/util.c:62
>> 3  0x0000564cff9bd46b in l_queue_push_tail (queue=0x564d000c9710, data=data@entry=0x564d000d0d60) at ell/queue.c:136
>> 4  0x0000564cff9beabd in idle_add (callback=callback@entry=0x564cff9be4e0 <oneshot_callback>, user_data=user_data@entry=0x564d000d4700,
>>    flags=flags@entry=268435456, destroy=destroy@entry=0x564cff9be4c0 <idle_destroy>) at ell/main.c:292
>> 5  0x0000564cff9be5f7 in l_idle_oneshot (callback=callback@entry=0x564cff998bc0 <tx_worker>, user_data=user_data@entry=0x564d000d83f0,
>>    destroy=destroy@entry=0x0) at ell/idle.c:144
>> 6  0x0000564cff998326 in send_tx (io=<optimized out>, info=0x7ffd035503f4, data=<optimized out>, len=<optimized out>)
>>    at mesh/mesh-io-generic.c:637
>> 7  0x0000564cff99675a in send_network_beacon (key=0x564d000cfee0) at mesh/net-keys.c:355
>> 8  snb_timeout (timeout=0x564d000dd730, user_data=0x564d000cfee0) at mesh/net-keys.c:364
>> 9  0x0000564cff9bdca2 in timeout_callback (fd=<optimized out>, events=<optimized out>, user_data=0x564d000dd730) at ell/timeout.c:81
>> 10 timeout_callback (fd=<optimized out>, events=<optimized out>, user_data=0x564d000dd730) at ell/timeout.c:70
>> 11 0x0000564cff9bedcd in l_main_iterate (timeout=<optimized out>) at ell/main.c:473
>> 12 0x0000564cff9bee7c in l_main_run () at ell/main.c:520
>> 13 l_main_run () at ell/main.c:502
>> 14 0x0000564cff9bf08c in l_main_run_with_signal (callback=<optimized out>, user_data=0x0) at ell/main.c:642
>> 15 0x0000564cff994b64 in main (argc=<optimized out>, argv=0x7ffd03550668) at mesh/main.c:268
> 
> Hm. I can't seem to wrap my head around this backtrace. Do you maybe
> have a reproduction path?

The backtrace doesn’t show what has gone wrong very well, because the underlying problem is heap corruption: the segfault only occurs later, during a subsequent memory allocation.

The physics of the problem is best shown by a local config client requesting segmented composition data from a local config server. The one request, all of the response segments, and the returning segment ACKs all happen on the same C call stack, which gets *very* deep and eventually steps off the end, because nothing goes OTA. It does *not* happen during OTA operations, because each discrete packet then starts from a fresh C call stack rooted in main().
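To make the stack-depth argument concrete, here is a minimal sketch of the synchronous loopback pattern, assuming a simplified exchange with one ACK per segment. It is a hypothetical model, not BlueZ code: send_pkt(), recv_pkt() and sync_exchange_depth() are illustrative names, and "depth" counts nested packet handlers rather than real stack bytes.

```c
#include <assert.h>

/*
 * Hypothetical model, not BlueZ code: a loopback "send" that calls the
 * receiving handler directly, the way a local config client and server
 * talk without going over the air. Simplification: one ACK per segment.
 */

static int depth;      /* handlers currently nested on the C stack */
static int max_depth;  /* deepest nesting seen during the exchange */
static int segs_total; /* number of response segments to exchange */

static void send_pkt(int is_ack, int seg);

/* Server answers the ACK for segment n with segment n + 1;
 * client answers every segment with an ACK. */
static void recv_pkt(int is_ack, int seg)
{
	depth++;
	if (depth > max_depth)
		max_depth = depth;

	if (is_ack) {
		if (seg + 1 < segs_total)
			send_pkt(0, seg + 1);
	} else {
		send_pkt(1, seg);
	}

	depth--;
}

/* Synchronous loopback: delivery happens inside the sender's frame. */
static void send_pkt(int is_ack, int seg)
{
	recv_pkt(is_ack, seg);
}

/* Runs an exchange of n segments and returns the deepest handler nesting. */
int sync_exchange_depth(int n)
{
	assert(n > 0);
	depth = 0;
	max_depth = 0;
	segs_total = n;
	send_pkt(0, 0); /* server sends the first response segment */
	return max_depth;
}
```

In this model sync_exchange_depth(8) returns 16: every segment and its ACK add another handler frame on top of the previous ones, so the nesting grows linearly with the size of the segmented response.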

Offloading the send packet requests to l_idle_oneshot() ensures that each discrete looped-back packet also starts from a known low point on the C call stack.

Does that make sense?