On Wed, 02 Aug 2023 19:06:05 +0200, Curtis Malainey wrote: > > On Tue, Aug 1, 2023 at 11:42 PM Takashi Iwai <tiwai@xxxxxxx> wrote: > > > > On Tue, 01 Aug 2023 19:18:41 +0200, > > cujomalainey@xxxxxxxxxxxx wrote: > > > > > > From: Curtis Malainey <cujomalainey@xxxxxxxxxxxx> > > > > > > The current implementation of how devices are released is valid for > > > production use cases (root control of memory is handled by card_dev, all > > > other devices are no-ops). > > > > > > This model does not work though in a kernel hacking environment where > > > KASAN and delayed release on kobj is enabled. If the card_dev device is > > > released before any of the child objects a use-after-free bug is caught > > > by KASAN as the delayed release still has a reference to the devices > > > that were freed by the card_dev release. Also both snd_card and snd_pcm > > > both own two devices internally, so even if they released independently, > > > the shared struct would result in another use after free. > > > > > > Solution is to move the child devices into their own memory so they can > > > be handled independently and released on their own schedule. > > > > > > Signed-off-by: Curtis Malainey <cujomalainey@xxxxxxxxxxxx> > > > Cc: Doug Anderson <dianders@xxxxxxxxxxxx> > > > > Thanks, it's an interesting bug. > > > > I'm not much against the proposed solution, but still this has to be > > carefully evaluated. So, could you give more details about the bug > > itself? It's related with CONFIG_DEBUG_KOBJECT_RELEASE, right? > > In which condition it's triggered and how the UAF looks like? > > > Hi Takashi > > Here is one of the stack traces we are seeing (for snd_pcm) > > [ 1098.876460] ================================================================== > [ 1098.884763] BUG: KASAN: use-after-free in enqueue_timer+0xa0/0x628 > [ 1098.884849] Write of size 8 at addr ffffff80cbdc6800 by task kworker/7:4/255 > [ 1098.884888] > [ 1098.884909] CPU: 7 PID: 255 Comm: kworker/7:4 Not tainted > 5.15.117-lockdep-19614-g05bc12fd18a9 #1 > 5a757fac993273e9fde5e8de9925ee0cebe8540f > [ 1098.884961] Hardware name: Google Pompom (rev3+) with LTE (DT) > [ 1098.884990] Workqueue: events kobject_delayed_cleanup > [ 1098.885038] Call trace: > [ 1098.885059] dump_backtrace+0x0/0x4e8 > [ 1098.885095] show_stack+0x34/0x44 > [ 1098.885129] dump_stack_lvl+0xdc/0x11c > [ 1098.885165] print_address_description+0x30/0x2d8 > [ 1098.885203] kasan_report+0x178/0x1e4 > [ 1098.885237] __asan_report_store8_noabort+0x44/0x50 > [ 1098.885276] enqueue_timer+0xa0/0x628 > [ 1098.885309] internal_add_timer+0xa0/0x254 > [ 1098.885346] __mod_timer+0x588/0xc4c > [ 1098.885378] add_timer+0x74/0x94 > [ 1098.885411] __queue_delayed_work+0x170/0x208 > [ 1098.885447] queue_delayed_work_on+0x128/0x160 > [ 1098.885483] kobject_put+0x278/0x2cc > [ 1098.885517] put_device+0x30/0x48 > [ 1098.885553] snd_ctl_dev_free+0xc8/0xe4 > [ 1098.885590] __snd_device_free+0x124/0x1b8 > [ 1098.885626] snd_device_free_all+0x148/0x184 > [ 1098.885663] release_card_device+0x5c/0x180 > [ 1098.885698] device_release+0xd4/0x1b4 > [ 1098.885732] kobject_delayed_cleanup+0x130/0x304 > [ 1098.885765] process_one_work+0x620/0x137c > [ 1098.885802] worker_thread+0x43c/0xa08 > [ 1098.885837] kthread+0x2e4/0x3a0 > [ 1098.885872] ret_from_fork+0x10/0x20 > [ 1098.885907] > [ 1098.885926] Allocated by task 11891: > [ 1098.885953] kasan_save_stack+0x38/0x68 > [ 1098.885992] __kasan_kmalloc+0x90/0xac > [ 1098.886026] kmem_cache_alloc_trace+0x258/0x40c > [ 1098.886063] _snd_pcm_new+0xc4/0x2c8 > [ 1098.886098] snd_pcm_new+0x5c/0x74 > [ 1098.886132] loopback_pcm_new+0xa0/0x20c [snd_aloop] > [ 1098.886194] loopback_probe+0x210/0x850 [snd_aloop] > [ 1098.886235] platform_probe+0x150/0x1c8 > [ 1098.886273] really_probe+0x274/0xa20 > [ 1098.886308] __driver_probe_device+0x1b4/0x3b4 > [ 1098.886344] driver_probe_device+0x78/0x1c0 > [ 1098.886379] __device_attach_driver+0x1ac/0x2c8 > [ 1098.886414] bus_for_each_drv+0x11c/0x190 > [ 1098.886448] __device_attach+0x278/0x3c8 > [ 1098.886482] device_initial_probe+0x2c/0x3c > [ 1098.886517] bus_probe_device+0xbc/0x1c8 > [ 1098.886550] device_add+0x1174/0x1630 > [ 1098.886581] platform_device_add+0x3f8/0x6f8 > [ 1098.886617] platform_device_register_full+0x36c/0x3f8 > [ 1098.886656] 0xffffffc0032e3174 > [ 1098.886691] do_one_initcall+0x1e8/0x91c > [ 1098.886727] do_init_module+0x16c/0x444 > [ 1098.886762] load_module+0x63e0/0x7f8c > [ 1098.886795] __arm64_sys_finit_module+0x1e4/0x214 > [ 1098.886830] invoke_syscall+0x98/0x278 > [ 1098.886862] el0_svc_common+0x214/0x274 > [ 1098.886894] do_el0_svc+0x9c/0x19c > [ 1098.886926] el0_svc+0x5c/0xc0 > [ 1098.886959] el0t_64_sync_handler+0x78/0x108 > [ 1098.886994] el0t_64_sync+0x1a4/0x1a8 > [ 1098.887027] > [ 1098.887046] Freed by task 255: > [ 1098.887071] kasan_save_stack+0x38/0x68 > [ 1098.887106] kasan_set_track+0x28/0x3c > [ 1098.887139] kasan_set_free_info+0x28/0x4c > [ 1098.887174] ____kasan_slab_free+0x110/0x164 > [ 1098.887209] __kasan_slab_free+0x18/0x28 > [ 1098.887242] kfree+0x200/0x868 > [ 1098.887275] snd_pcm_free+0x88/0x98 > [ 1098.887311] snd_pcm_dev_free+0x48/0x5c > [ 1098.887347] __snd_device_free+0x124/0x1b8 > [ 1098.887384] snd_device_free_all+0xc8/0x184 > [ 1098.887419] release_card_device+0x5c/0x180 > [ 1098.887453] device_release+0xd4/0x1b4 > [ 1098.887486] kobject_delayed_cleanup+0x130/0x304 > [ 1098.887519] process_one_work+0x620/0x137c > [ 1098.887555] worker_thread+0x43c/0xa08 > [ 1098.887590] kthread+0x2e4/0x3a0 > [ 1098.887623] ret_from_fork+0x10/0x20 > > A similar one is generated for snd_card if card_dev.release runs > before ctl_dev.release. This patch is solving the same bug in both > places at once. Hmm. It's still puzzling, and I'm not 100% sure whether your analysis is right. The above shows it's freed by task 255, while the task hitting UAF itself is 255. It might be rather the combination of devres and delayed releases. With CONFIG_DEBUG_KOBJECT_RELEASE, the kernel should show the info of each delayed release kobject. Could you give them together with the Oops, so that we can see the flow how it hits UAF? thanks, Takashi > > You should be able to easily reproduce this if you turn on the following > > CONFIG_DEBUG_KOBJECT=y > CONFIG_DEBUG_KOBJECT_RELEASE=y > CONFIG_DEBUG_OBJECTS=y > CONFIG_DEBUG_OBJECTS_TIMERS=y > CONFIG_KASAN=y > > Then just unload and reload snd-dummy or snd-aloop in a loop. E.g. > > while true; do modprobe snd-dummy; rmmod snd-dummy; done > > The issue is that kobj should be able to be released independently of > each other, but since all of them depend on card_dev for memory > release and sometimes they even share the same allocation, it breaks > this rule. > > There is still another latent bug hiding in the system as well that I > am working on today related to devm memory release of snd_card running > before card_dev.release > > It will reproduce with the same setup even with the above patch > applied, so just an FYI if you spot it. > > [ 4972.098280] kobject: 'snd_dummy' (000000006c6d3069): > kobject_release, parent 000000009bf07dfe (delayed 2000) > [ 4972.098278] CPU: 9 PID: 16058 Comm: kworker/9:0 Tainted: G U > 6.5.0-rc3-00236-gd54aad9920bd-dirty #18 > a69e57e648f1e29627a189036c9fd8083c170225 > [ 4972.098294] Hardware name: Google Anahera/Anahera, BIOS > Google_Anahera.14505.315.0 12/02/2022 > [ 4972.098302] Workqueue: events kobject_delayed_cleanup > [ 4972.098317] Call Trace: > [ 4972.098324] <TASK> > [ 4972.098330] dump_stack_lvl+0x91/0xd0 > [ 4972.098344] print_report+0x15b/0x4d0 > [ 4972.098356] ? __virt_addr_valid+0xbb/0x130 > [ 4972.098369] ? kasan_addr_to_slab+0x11/0x80 > [ 4972.098381] ? release_card_device+0x86/0xd0 > [ 4972.098392] kasan_report+0xb1/0xf0 > [ 4972.098403] ? release_card_device+0x86/0xd0 > [ 4972.098415] ? _raw_spin_unlock_irqrestore+0x1b/0x40 > [ 4972.098427] release_card_device+0x86/0xd0 > [ 4972.098438] device_release+0x66/0x110 > [ 4972.098449] kobject_delayed_cleanup+0x133/0x140 > [ 4972.098462] process_one_work+0x3e3/0x680 > [ 4972.098474] worker_thread+0x487/0x6d0 > [ 4972.098487] ? __pfx_worker_thread+0x10/0x10 > [ 4972.098497] kthread+0x199/0x1c0 > [ 4972.098508] ? __pfx_worker_thread+0x10/0x10 > [ 4972.098518] ? __pfx_kthread+0x10/0x10 > [ 4972.098528] ret_from_fork+0x3c/0x60 > [ 4972.098540] ? __pfx_kthread+0x10/0x10 > [ 4972.098552] ret_from_fork_asm+0x1b/0x30 > [ 4972.098563] RIP: 0000:0x0 > [ 4972.098577] Code: Unable to access opcode bytes at > 0xffffffffffffffd6. > [ 4972.098584] RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: > 0000000000000000 > [ 4972.098596] RAX: 0000000000000000 RBX: 0000000000000000 RCX: > 0000000000000000 > [ 4972.098604] RDX: 0000000000000000 RSI: 0000000000000000 RDI: > 0000000000000000 > [ 4972.098612] RBP: 0000000000000000 R08: 0000000000000000 R09: > 0000000000000000 > [ 4972.098620] R10: 0000000000000000 R11: 0000000000000000 R12: > 0000000000000000 > [ 4972.098622] kobject: 'drivers' (00000000d71d1f56): kobject_release, > parent 000000007062c760 (delayed 4000) > [ 4972.098631] R13: 0000000000000000 R14: 0000000000000000 R15: > 0000000000000000 > [ 4972.098641] kobject: 'holders' (0000000091718821): kobject_release, > parent 000000007062c760 (delayed 2000) > [ 4972.098641] </TASK> > > Curtis > > > > > > > > > Takashi >