On Thu, Feb 18, 2021 at 6:16 PM Rob Herring <robh+dt@xxxxxxxxxx> wrote: > > On Thu, Feb 18, 2021 at 10:51 AM Steven Price <steven.price@xxxxxxx> wrote: > > > > On 18/02/2021 16:38, Rob Herring wrote: > > > On Thu, Feb 18, 2021 at 10:15 AM Steven Price <steven.price@xxxxxxx> wrote: > > >> > > >> On 18/02/2021 15:45, Alyssa Rosenzweig wrote: > > >>>> Yeah plus Cc: stable for backporting and I think an igt or similar for > > >>>> panfrost to check this works correctly would be pretty good too. Since > > >>>> if it took us over 1 year to notice this bug it's pretty clear that > > >>>> normal testing doesn't catch this. So very likely we'll break this > > >>>> again. > > >>> > > >>> Unfortunately there are a lot of kernel bugs which are noticed during actual > > >>> use (but not CI runs), some of which have never been fixed. I do know > > >>> the shrinker impl is buggy for us, if this is the fix I'm very happy. > > >> > > >> I doubt this will actually "fix" anything - if I understand correctly > > >> then the sequence which is broken is: > > >> > > >> * allocate BO, mmap to CPU > > >> * madvise(DONTNEED) > > >> * trigger purge > > >> * try to access the BO memory > > >> > > >> which is an invalid sequence for user space - the attempt to access > > >> memory should cause a SIGSEGV. However because drm_vma_node_unmap() is > > >> unable to find the mappings there may still be page table entries > > >> present which would provide access to memory the kernel has freed. Which > > >> is of course a big security hole and so this fix is needed. > > >> > > >> In what way do you find the shrinker impl buggy? I'm aware there's some > > >> dodgy locking (although I haven't worked out how to fix it) - but AFAICT > > >> it's more deadlock territory rather than lacking in locks. Are there > > >> correctness issues? > > > > > > What's there was largely a result of getting lockdep happy. > > > > > >>>> btw for testing shrinkers recommended way is to have a debugfs file > > >>>> that just force-shrinks everything. That way you avoid all the trouble > > >>>> that tend to happen when you drive a system close to OOM on linux, and > > >>>> it's also much faster. > > >>> > > >>> 2nding this as a good idea. > > >>> > > >> > > >> Sounds like a good idea to me too. But equally I'm wondering whether the > > >> best (short term) solution is to actually disable the shrinker. I'm > > >> somewhat surprised that nobody has got fed up with the "Purging xxx > > >> bytes" message spam - which makes me think that most people are not > > >> hitting memory pressure to trigger the shrinker. > > > > > > If the shrinker is dodgy, then it's probably good to have the messages > > > to know if it ran. > > > > > >> The shrinker on kbase caused a lot of grief - and the only way I managed > > >> to get that under control was by writing a static analysis tool for the > > >> locking, and by upsetting people by enforcing the (rather dumb) rules of > > >> the tool on the code base. I've been meaning to look at whether sparse > > >> can do a similar check of locks. > > > > > > Lockdep doesn't cover it? > > > > Short answer: no ;) It's pretty good actually, if you correctly annotate things up. > > The problem with lockdep is that you have to trigger the locking > > scenario to get a warning out of it. For example you obviously won't get > > any warnings about the shrinker without triggering the shrinker (which > > means memory pressure since we don't have the debugfs file to trigger it). > > Actually, you don't need debugfs. Writing to /proc/sys/vm/drop_caches > will do it. Though maybe there's other code path scenarios that > wouldn't cover. Huh didn't know, but it's a bit a shotgun, plus it doesn't use fs_reclaim shrinker annotations, which means you don't have lockdep checks. I think at least, would need some deadlock and testing. > > > > I have to admit I'm not 100% sure I've seen any lockdep warnings based > > on buffer objects recently. I can trigger them based on jobs: > > > > -----8<------ > > [ 265.764474] ====================================================== > > [ 265.771380] WARNING: possible circular locking dependency detected > > [ 265.778294] 5.11.0-rc2+ #22 Tainted: G W > > [ 265.784148] ------------------------------------------------------ > > [ 265.791050] kworker/0:3/90 is trying to acquire lock: > > [ 265.796694] c0d982b0 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x0/0x38 > > [ 265.804994] > > [ 265.804994] but task is already holding lock: > > [ 265.811513] c49a348c (&js->queue[j].lock){+.+.}-{3:3}, at: panfrost_reset+0x124/0x1cc [panfrost] > > [ 265.821375] > > [ 265.821375] which lock already depends on the new lock. > > [ 265.821375] > > [ 265.830524] > > [ 265.830524] the existing dependency chain (in reverse order) is: > > [ 265.838892] > > [ 265.838892] -> #2 (&js->queue[j].lock){+.+.}-{3:3}: > > [ 265.845996] mutex_lock_nested+0x18/0x20 > > [ 265.850961] panfrost_scheduler_stop+0x1c/0x94 [panfrost] > > [ 265.857590] panfrost_reset+0x54/0x1cc [panfrost] > > [ 265.863441] process_one_work+0x238/0x51c > > [ 265.868503] worker_thread+0x22c/0x2e0 > > [ 265.873270] kthread+0x128/0x138 > > [ 265.877455] ret_from_fork+0x14/0x38 > > [ 265.882028] 0x0 > > [ 265.884657] > > [ 265.884657] -> #1 (dma_fence_map){++++}-{0:0}: > > [ 265.891277] dma_resv_lockdep+0x1b4/0x290 > > [ 265.896339] do_one_initcall+0x5c/0x2e8 > > [ 265.901206] kernel_init_freeable+0x184/0x1d4 > > [ 265.906651] kernel_init+0x8/0x11c > > [ 265.911029] ret_from_fork+0x14/0x38 > > [ 265.915610] 0x0 > > [ 265.918247] > > [ 265.918247] -> #0 (fs_reclaim){+.+.}-{0:0}: > > [ 265.924579] lock_acquire+0x3a4/0x45c > > [ 265.929260] __fs_reclaim_acquire+0x28/0x38 > > [ 265.934523] slab_pre_alloc_hook.constprop.28+0x1c/0x64 > > [ 265.940948] kmem_cache_alloc_trace+0x38/0x114 > > [ 265.946493] panfrost_job_run+0x60/0x2b4 [panfrost] > > [ 265.952540] drm_sched_resubmit_jobs+0x88/0xc4 [gpu_sched] > > [ 265.959256] panfrost_reset+0x174/0x1cc [panfrost] > > [ 265.965196] process_one_work+0x238/0x51c > > [ 265.970259] worker_thread+0x22c/0x2e0 > > [ 265.975025] kthread+0x128/0x138 > > [ 265.979210] ret_from_fork+0x14/0x38 > > [ 265.983784] 0x0 > > [ 265.986412] > > [ 265.986412] other info that might help us debug this: > > [ 265.986412] > > [ 265.995353] Chain exists of: > > [ 265.995353] fs_reclaim --> dma_fence_map --> &js->queue[j].lock > > [ 265.995353] > > [ 266.007028] Possible unsafe locking scenario: > > [ 266.007028] > > [ 266.013638] CPU0 CPU1 > > [ 266.018694] ---- ---- > > [ 266.023750] lock(&js->queue[j].lock); > > [ 266.028033] lock(dma_fence_map); > > [ 266.034648] lock(&js->queue[j].lock); > > [ 266.041746] lock(fs_reclaim); > > [ 266.045252] > > [ 266.045252] *** DEADLOCK *** > > [ 266.045252] > > [ 266.051861] 4 locks held by kworker/0:3/90: > > [ 266.056530] #0: c18096a8 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x18c/0x51c > > [ 266.066258] #1: c46d5f28 ((work_completion)(&pfdev->reset.work)){+.+.}-{0:0}, at: process_one_work+0x18c/0x51c > > [ 266.077546] #2: c0dfc5a0 (dma_fence_map){++++}-{0:0}, at: panfrost_reset+0x10/0x1cc [panfrost] > > [ 266.087294] #3: c49a348c (&js->queue[j].lock){+.+.}-{3:3}, at: panfrost_reset+0x124/0x1cc [panfrost] > > [ 266.097624] > > [ 266.097624] stack backtrace: > > [ 266.102487] CPU: 0 PID: 90 Comm: kworker/0:3 Tainted: G W 5.11.0-rc2+ #22 > > [ 266.111529] Hardware name: Rockchip (Device Tree) > > [ 266.116773] Workqueue: events panfrost_reset [panfrost] > > [ 266.122628] [<c010f0c0>] (unwind_backtrace) from [<c010ad38>] (show_stack+0x10/0x14) > > [ 266.131288] [<c010ad38>] (show_stack) from [<c07c3c28>] (dump_stack+0xa4/0xd0) > > [ 266.139363] [<c07c3c28>] (dump_stack) from [<c0168760>] (check_noncircular+0x6c/0x90) > > [ 266.148116] [<c0168760>] (check_noncircular) from [<c016a2c0>] (__lock_acquire+0xe90/0x16a0) > > [ 266.157549] [<c016a2c0>] (__lock_acquire) from [<c016b5f8>] (lock_acquire+0x3a4/0x45c) > > [ 266.166399] [<c016b5f8>] (lock_acquire) from [<c0247b98>] (__fs_reclaim_acquire+0x28/0x38) > > [ 266.175637] [<c0247b98>] (__fs_reclaim_acquire) from [<c025445c>] (slab_pre_alloc_hook.constprop.28+0x1c/0x64) > > [ 266.186820] [<c025445c>] (slab_pre_alloc_hook.constprop.28) from [<c0255714>] (kmem_cache_alloc_trace+0x38/0x114) > > [ 266.198292] [<c0255714>] (kmem_cache_alloc_trace) from [<bf00d28c>] (panfrost_job_run+0x60/0x2b4 [panfrost]) > > [ 266.209295] [<bf00d28c>] (panfrost_job_run [panfrost]) from [<bf00102c>] (drm_sched_resubmit_jobs+0x88/0xc4 [gpu_sched]) > > [ 266.221476] [<bf00102c>] (drm_sched_resubmit_jobs [gpu_sched]) from [<bf00d0a0>] (panfrost_reset+0x174/0x1cc [panfrost]) > > [ 266.233649] [<bf00d0a0>] (panfrost_reset [panfrost]) from [<c0139fd4>] (process_one_work+0x238/0x51c) > > [ 266.243974] [<c0139fd4>] (process_one_work) from [<c013aaa0>] (worker_thread+0x22c/0x2e0) > > [ 266.253115] [<c013aaa0>] (worker_thread) from [<c013fd64>] (kthread+0x128/0x138) > > [ 266.261382] [<c013fd64>] (kthread) from [<c010011c>] (ret_from_fork+0x14/0x38) > > [ 266.269456] Exception stack(0xc46d5fb0 to 0xc46d5ff8) > > [ 266.275098] 5fa0: 00000000 00000000 00000000 00000000 > > [ 266.284236] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 > > [ 266.293373] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000 > > -----8<------ > > > > And indeed a sleeping in atomic bug: > > -----8<----- > > [ 289.672380] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:935 > > [ 289.681210] rockchip-vop ff930000.vop: [drm:vop_crtc_atomic_flush] *ERROR* VOP vblank IRQ stuck for 10 ms > > [ 289.681880] in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 31, name: kworker/0:1 > > [ 289.701901] INFO: lockdep is turned off. > > [ 289.706339] irq event stamp: 9414 > > [ 289.710095] hardirqs last enabled at (9413): [<c07cd1cc>] _raw_spin_unlock_irq+0x20/0x40 > > [ 289.719381] hardirqs last disabled at (9414): [<c07c9140>] __schedule+0x170/0x698 > > [ 289.727855] softirqs last enabled at (9112): [<c077379c>] reg_todo+0x64/0x51c > > [ 289.736044] softirqs last disabled at (9110): [<c077377c>] reg_todo+0x44/0x51c > > [ 289.744233] CPU: 0 PID: 31 Comm: kworker/0:1 Tainted: G W 5.11.0-rc2+ #22 > > [ 289.753375] Hardware name: Rockchip (Device Tree) > > [ 289.758711] Workqueue: events panfrost_reset [panfrost] > > [ 289.764886] [<c010f0c0>] (unwind_backtrace) from [<c010ad38>] (show_stack+0x10/0x14) > > [ 289.773721] [<c010ad38>] (show_stack) from [<c07c3c28>] (dump_stack+0xa4/0xd0) > > [ 289.781948] [<c07c3c28>] (dump_stack) from [<c01490f8>] (___might_sleep+0x1bc/0x244) > > [ 289.790761] [<c01490f8>] (___might_sleep) from [<c07ca720>] (__mutex_lock+0x34/0x3c4) > > [ 289.799654] [<c07ca720>] (__mutex_lock) from [<c07caac8>] (mutex_lock_nested+0x18/0x20) > > [ 289.808732] [<c07caac8>] (mutex_lock_nested) from [<bf00b704>] (panfrost_gem_free_object+0x20/0x100 [panfrost]) > > [ 289.820358] [<bf00b704>] (panfrost_gem_free_object [panfrost]) from [<bf00ccb0>] (kref_put+0x38/0x5c [panfrost]) > > [ 289.832247] [<bf00ccb0>] (kref_put [panfrost]) from [<bf00ce0c>] (panfrost_job_cleanup+0x120/0x140 [panfrost]) > > [ 289.843936] [<bf00ce0c>] (panfrost_job_cleanup [panfrost]) from [<bf00ccb0>] (kref_put+0x38/0x5c [panfrost]) > > [ 289.855435] [<bf00ccb0>] (kref_put [panfrost]) from [<c054acb8>] (dma_fence_signal_timestamp_locked+0x1c0/0x1f8) > > [ 289.867163] [<c054acb8>] (dma_fence_signal_timestamp_locked) from [<c054b1f8>] (dma_fence_signal+0x38/0x58) > > [ 289.878207] [<c054b1f8>] (dma_fence_signal) from [<bf001388>] (drm_sched_job_done+0x58/0x148 [gpu_sched]) > > [ 289.889237] [<bf001388>] (drm_sched_job_done [gpu_sched]) from [<bf001524>] (drm_sched_start+0xa4/0xd0 [gpu_sched]) > > [ 289.901389] [<bf001524>] (drm_sched_start [gpu_sched]) from [<bf00d0ac>] (panfrost_reset+0x180/0x1cc [panfrost]) > > [ 289.913286] [<bf00d0ac>] (panfrost_reset [panfrost]) from [<c0139fd4>] (process_one_work+0x238/0x51c) > > [ 289.923970] [<c0139fd4>] (process_one_work) from [<c013aaa0>] (worker_thread+0x22c/0x2e0) > > [ 289.933257] [<c013aaa0>] (worker_thread) from [<c013fd64>] (kthread+0x128/0x138) > > [ 289.941661] [<c013fd64>] (kthread) from [<c010011c>] (ret_from_fork+0x14/0x38) > > [ 289.949867] Exception stack(0xc2243fb0 to 0xc2243ff8) > > [ 289.955604] 3fa0: 00000000 00000000 00000000 00000000 > > [ 289.964840] 3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 > > [ 289.974066] 3fe0: 00000000 00000000 00000000 00000000 00000013 00000000 > > -----8<---- > > > > Certainly here the mutex causing the problem is the shrinker_lock! > > > > The above is triggered by chucking a whole ton of jobs which > > fault at the GPU. > > > > Sadly I haven't found time to work out how to untangle the locks. > > They are tricky because pretty much any memory allocation can trigger > things as I recall. The above should only be possible with my dma_fence annotations, and yes the point to bugs in the drm/scheduler. They shouldn't matter for panfrost, and those patches aren't in upstream yet. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/dri-devel