On Tue, Jul 11, 2023 at 12:46 AM Christian König
<christian.koenig@xxxxxxx> wrote:
>
> Am 10.07.23 um 23:15 schrieb Luben Tuikov:
> > On 2023-07-10 16:56, Rob Clark wrote:
> >> From: Rob Clark <robdclark@xxxxxxxxxxxx>
> >>
> >> Fixes the KASAN splat:
> >>
> >> ==================================================================
> >> BUG: KASAN: use-after-free in msm_ioctl_wait_fence+0x31c/0x7b0
> >> Read of size 4 at addr ffffff808cb7c2f8 by task syz-executor/12236
> >> CPU: 6 PID: 12236 Comm: syz-executor Tainted: G W 5.15.119-lockdep-19932-g4a017c53fe63 #1 b15455e5b94c63032dd99eb0190c27e582b357ed
> >> Hardware name: Google Homestar (rev3) (DT)
> >> Call trace:
> >>  dump_backtrace+0x0/0x4e8
> >>  show_stack+0x34/0x50
> >>  dump_stack_lvl+0xdc/0x11c
> >>  print_address_description+0x30/0x2d8
> >>  kasan_report+0x178/0x1e4
> >>  kasan_check_range+0x1b0/0x1b8
> >>  __kasan_check_read+0x44/0x54
> >>  msm_ioctl_wait_fence+0x31c/0x7b0
> >>  drm_ioctl_kernel+0x214/0x418
> >>  drm_ioctl+0x524/0xbe8
> >>  __arm64_sys_ioctl+0x154/0x1d0
> >>  invoke_syscall+0x98/0x278
> >>  el0_svc_common+0x214/0x274
> >>  do_el0_svc+0x9c/0x19c
> >>  el0_svc+0x5c/0xc0
> >>  el0t_64_sync_handler+0x78/0x108
> >>  el0t_64_sync+0x1a4/0x1a8
> >> Allocated by task 12224:
> >>  kasan_save_stack+0x38/0x68
> >>  __kasan_slab_alloc+0x6c/0x88
> >>  kmem_cache_alloc+0x1b8/0x428
> >>  drm_sched_fence_alloc+0x30/0x94
> >>  drm_sched_job_init+0x7c/0x178
> >>  msm_ioctl_gem_submit+0x2b8/0x5ac4
> >>  drm_ioctl_kernel+0x214/0x418
> >>  drm_ioctl+0x524/0xbe8
> >>  __arm64_sys_ioctl+0x154/0x1d0
> >>  invoke_syscall+0x98/0x278
> >>  el0_svc_common+0x214/0x274
> >>  do_el0_svc+0x9c/0x19c
> >>  el0_svc+0x5c/0xc0
> >>  el0t_64_sync_handler+0x78/0x108
> >>  el0t_64_sync+0x1a4/0x1a8
> >> Freed by task 32:
> >>  kasan_save_stack+0x38/0x68
> >>  kasan_set_track+0x28/0x3c
> >>  kasan_set_free_info+0x28/0x4c
> >>  ____kasan_slab_free+0x110/0x164
> >>  __kasan_slab_free+0x18/0x28
> >>  kmem_cache_free+0x1e0/0x904
> >>  drm_sched_fence_free_rcu+0x80/0x9c
> >>  rcu_do_batch+0x318/0xcf0
> >>  rcu_nocb_cb_kthread+0x1a0/0xc4c
> >>  kthread+0x2e4/0x3a0
> >>  ret_from_fork+0x10/0x20
> >> Last potentially related work creation:
> >>  kasan_save_stack+0x38/0x68
> >>  kasan_record_aux_stack+0xd4/0x114
> >>  __call_rcu_common+0xd4/0x1478
> >>  call_rcu+0x1c/0x28
> >>  drm_sched_fence_release_scheduled+0x108/0x158
> >>  dma_fence_release+0x178/0x564
> >>  drm_sched_fence_release_finished+0xb4/0x124
> >>  dma_fence_release+0x178/0x564
> >>  __msm_gem_submit_destroy+0x150/0x488
> >>  msm_job_free+0x9c/0xdc
> >>  drm_sched_main+0xec/0x9ac
> >>  kthread+0x2e4/0x3a0
> >>  ret_from_fork+0x10/0x20
> >> Second to last potentially related work creation:
> >>  kasan_save_stack+0x38/0x68
> >>  kasan_record_aux_stack+0xd4/0x114
> >>  __call_rcu_common+0xd4/0x1478
> >>  call_rcu+0x1c/0x28
> >>  drm_sched_fence_release_scheduled+0x108/0x158
> >>  dma_fence_release+0x178/0x564
> >>  drm_sched_fence_release_finished+0xb4/0x124
> >>  dma_fence_release+0x178/0x564
> >>  drm_sched_entity_fini+0x170/0x238
> >>  drm_sched_entity_destroy+0x34/0x44
> >>  __msm_file_private_destroy+0x60/0x118
> >>  msm_submitqueue_destroy+0xd0/0x110
> >>  __msm_gem_submit_destroy+0x384/0x488
> >>  retire_submits+0x6a8/0xa14
> >>  recover_worker+0x764/0xa50
> >>  kthread_worker_fn+0x3f4/0x9ec
> >>  kthread+0x2e4/0x3a0
> >>  ret_from_fork+0x10/0x20
> >> The buggy address belongs to the object at ffffff808cb7c280
> >> The buggy address is located 120 bytes inside of
> >> The buggy address belongs to the page:
> >> page:000000008b01d27d refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x10cb7c
> >> head:000000008b01d27d order:1 compound_mapcount:0
> >> flags: 0x8000000000010200(slab|head|zone=2)
> >> raw: 8000000000010200 fffffffe06844d80 0000000300000003 ffffff80860dca00
> >> raw: 0000000000000000 0000000000190019 00000001ffffffff 0000000000000000
> >> page dumped because: kasan: bad access detected
> >> Memory state around the buggy address:
> >>  ffffff808cb7c180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >>  ffffff808cb7c200: 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc
> >> >ffffff808cb7c280: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> >>                                                                 ^
> >>  ffffff808cb7c300: fb fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc
> >>  ffffff808cb7c380: fc fc fc fc fc fc fc fc 00 00 00 00 00 00 00 00
> >> ==================================================================
> >>
> >> Suggested-by: Alexander Potapenko <glider@xxxxxxxxxx>
> >> Signed-off-by: Rob Clark <robdclark@xxxxxxxxxxxx>
> >> ---
> >>  drivers/gpu/drm/scheduler/sched_fence.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
> >> index ef120475e7c6..b624711c6e03 100644
> >> --- a/drivers/gpu/drm/scheduler/sched_fence.c
> >> +++ b/drivers/gpu/drm/scheduler/sched_fence.c
> >> @@ -35,7 +35,7 @@ static int __init drm_sched_fence_slab_init(void)
> >>  {
> >>  	sched_fence_slab = kmem_cache_create(
> >>  		"drm_sched_fence", sizeof(struct drm_sched_fence), 0,
> >> -		SLAB_HWCACHE_ALIGN, NULL);
> >> +		SLAB_HWCACHE_ALIGN | SLAB_TYPESAFE_BY_RCU, NULL);
> >>  	if (!sched_fence_slab)
> >>  		return -ENOMEM;
> >>
> > Reviewed-by: Luben Tuikov <luben.tuikov@xxxxxxx>
> >
> > But let it simmer for 24 hours so Christian can see it too (CC-ed).
>
> Well that won't work like this; using the SLAB_TYPESAFE_BY_RCU flag
> was suggested before and is not allowed for dma_fence objects.
>
> The flag SLAB_TYPESAFE_BY_RCU can only be used if the objects in the
> slab can't be reused within an RCU grace period, or if that reuse
> doesn't matter for the logic. And exactly that is not guaranteed for
> dma_fence objects.

I think that is only true because of the drm_sched_fence_free() path?
But that could also use call_rcu().  (It looks like it is only an
error path.)
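A minimal sketch of what that could look like (untested, and not an
actual patch; drm_sched_fence_free_cb is a hypothetical name, and the
sketch assumes the rcu_head embedded in the "finished" fence is free
to borrow here because this error path only runs before the fence is
initialized or published):

/* Hypothetical RCU callback for the error path: the fence was never
 * initialized, so we cannot go through to_drm_sched_fence() and just
 * compute the containing object directly.
 */
static void drm_sched_fence_free_cb(struct rcu_head *rcu)
{
	struct drm_sched_fence *fence =
		container_of(rcu, struct drm_sched_fence, finished.rcu);

	kmem_cache_free(sched_fence_slab, fence);
}

void drm_sched_fence_free(struct drm_sched_fence *fence)
{
	/* Defer the free so the object cannot be reused within an RCU
	 * grace period, matching drm_sched_fence_release_scheduled().
	 */
	call_rcu(&fence->finished.rcu, drm_sched_fence_free_cb);
}

With both release paths funneled through call_rcu(), no
drm_sched_fence would be returned to the slab inside an RCU grace
period.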
> It should also not be necessary because the scheduler fences are
> released using call_rcu:
>
> static void drm_sched_fence_release_scheduled(struct dma_fence *f)
> {
>          struct drm_sched_fence *fence = to_drm_sched_fence(f);
>
>          dma_fence_put(fence->parent);
>          call_rcu(&fence->finished.rcu, drm_sched_fence_free_rcu);
> }
>
> That looks more like you have a reference count problem in MSM.

Possibly I need to use rcu_read_lock()?  But I think the idr_lock
which protects dma_fence_get_rcu() (and is held until the fence is
removed from fence_idr, before its reference is dropped in
__msm_gem_submit_destroy()) makes that unnecessary.

BR,
-R

> Regards,
> Christian.
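For context, the wait-fence lookup Rob is describing follows this
pattern (a simplified sketch, not the exact upstream code;
lookup_fence is a hypothetical wrapper, while queue->idr_lock and
queue->fence_idr follow the msm naming used above):

static struct dma_fence *lookup_fence(struct msm_gpu_submitqueue *queue,
				      uint32_t fence_id)
{
	struct dma_fence *fence;

	/* The submit retire path removes the fence from fence_idr under
	 * idr_lock before dropping its reference, so while we hold the
	 * lock the refcount cannot reach zero and dma_fence_get_rcu()
	 * can take a reference without an explicit rcu_read_lock().
	 */
	spin_lock(&queue->idr_lock);
	fence = idr_find(&queue->fence_idr, fence_id);
	if (fence)
		fence = dma_fence_get_rcu(fence);
	spin_unlock(&queue->idr_lock);

	return fence;
}

The caller would then drop its reference with dma_fence_put() once the
wait completes.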