On Thu, Feb 18, 2021 at 10:51 AM Steven Price <steven.price@xxxxxxx> wrote: > > On 18/02/2021 16:38, Rob Herring wrote: > > On Thu, Feb 18, 2021 at 10:15 AM Steven Price <steven.price@xxxxxxx> wrote: > >> > >> On 18/02/2021 15:45, Alyssa Rosenzweig wrote: > >>>> Yeah plus Cc: stable for backporting and I think an igt or similar for > >>>> panfrost to check this works correctly would be pretty good too. Since > >>>> if it took us over 1 year to notice this bug it's pretty clear that > >>>> normal testing doesn't catch this. So very likely we'll break this > >>>> again. > >>> > >>> Unfortunately there are a lot of kernel bugs which are noticed during actual > >>> use (but not CI runs), some of which have never been fixed. I do know > >>> the shrinker impl is buggy for us, if this is the fix I'm very happy. > >> > >> I doubt this will actually "fix" anything - if I understand correctly > >> then the sequence which is broken is: > >> > >> * allocate BO, mmap to CPU > >> * madvise(DONTNEED) > >> * trigger purge > >> * try to access the BO memory > >> > >> which is an invalid sequence for user space - the attempt to access > >> memory should cause a SIGSEGV. However because drm_vma_node_unmap() is > >> unable to find the mappings there may still be page table entries > >> present which would provide access to memory the kernel has freed. Which > >> is of course a big security hole and so this fix is needed. > >> > >> In what way do you find the shrinker impl buggy? I'm aware there's some > >> dodgy locking (although I haven't worked out how to fix it) - but AFAICT > >> it's more deadlock territory rather than lacking in locks. Are there > >> correctness issues? > > > > What's there was largely a result of getting lockdep happy. > > > >>>> btw for testing shrinkers recommended way is to have a debugfs file > >>>> that just force-shrinks everything. That way you avoid all the trouble > >>>> that tend to happen when you drive a system close to OOM on linux, and > >>>> it's also much faster. > >>> > >>> 2nding this as a good idea. > >>> > >> > >> Sounds like a good idea to me too. But equally I'm wondering whether the > >> best (short term) solution is to actually disable the shrinker. I'm > >> somewhat surprised that nobody has got fed up with the "Purging xxx > >> bytes" message spam - which makes me think that most people are not > >> hitting memory pressure to trigger the shrinker. > > > > If the shrinker is dodgy, then it's probably good to have the messages > > to know if it ran. > > > >> The shrinker on kbase caused a lot of grief - and the only way I managed > >> to get that under control was by writing a static analysis tool for the > >> locking, and by upsetting people by enforcing the (rather dumb) rules of > >> the tool on the code base. I've been meaning to look at whether sparse > >> can do a similar check of locks. > > > > Lockdep doesn't cover it? > > Short answer: no ;) > > The problem with lockdep is that you have to trigger the locking > scenario to get a warning out of it. For example you obviously won't get > any warnings about the shrinker without triggering the shrinker (which > means memory pressure since we don't have the debugfs file to trigger it). Actually, you don't need debugfs. Writing to /proc/sys/vm/drop_caches will do it. Though maybe there's other code path scenarios that wouldn't cover. > I have to admit I'm not 100% sure I've seen any lockdep warnings based > on buffer objects recently. I can trigger them based on jobs: > > -----8<------ > [ 265.764474] ====================================================== > [ 265.771380] WARNING: possible circular locking dependency detected > [ 265.778294] 5.11.0-rc2+ #22 Tainted: G W > [ 265.784148] ------------------------------------------------------ > [ 265.791050] kworker/0:3/90 is trying to acquire lock: > [ 265.796694] c0d982b0 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x0/0x38 > [ 265.804994] > [ 265.804994] but task is already holding lock: > [ 265.811513] c49a348c (&js->queue[j].lock){+.+.}-{3:3}, at: panfrost_reset+0x124/0x1cc [panfrost] > [ 265.821375] > [ 265.821375] which lock already depends on the new lock. > [ 265.821375] > [ 265.830524] > [ 265.830524] the existing dependency chain (in reverse order) is: > [ 265.838892] > [ 265.838892] -> #2 (&js->queue[j].lock){+.+.}-{3:3}: > [ 265.845996] mutex_lock_nested+0x18/0x20 > [ 265.850961] panfrost_scheduler_stop+0x1c/0x94 [panfrost] > [ 265.857590] panfrost_reset+0x54/0x1cc [panfrost] > [ 265.863441] process_one_work+0x238/0x51c > [ 265.868503] worker_thread+0x22c/0x2e0 > [ 265.873270] kthread+0x128/0x138 > [ 265.877455] ret_from_fork+0x14/0x38 > [ 265.882028] 0x0 > [ 265.884657] > [ 265.884657] -> #1 (dma_fence_map){++++}-{0:0}: > [ 265.891277] dma_resv_lockdep+0x1b4/0x290 > [ 265.896339] do_one_initcall+0x5c/0x2e8 > [ 265.901206] kernel_init_freeable+0x184/0x1d4 > [ 265.906651] kernel_init+0x8/0x11c > [ 265.911029] ret_from_fork+0x14/0x38 > [ 265.915610] 0x0 > [ 265.918247] > [ 265.918247] -> #0 (fs_reclaim){+.+.}-{0:0}: > [ 265.924579] lock_acquire+0x3a4/0x45c > [ 265.929260] __fs_reclaim_acquire+0x28/0x38 > [ 265.934523] slab_pre_alloc_hook.constprop.28+0x1c/0x64 > [ 265.940948] kmem_cache_alloc_trace+0x38/0x114 > [ 265.946493] panfrost_job_run+0x60/0x2b4 [panfrost] > [ 265.952540] drm_sched_resubmit_jobs+0x88/0xc4 [gpu_sched] > [ 265.959256] panfrost_reset+0x174/0x1cc [panfrost] > [ 265.965196] process_one_work+0x238/0x51c > [ 265.970259] worker_thread+0x22c/0x2e0 > [ 265.975025] kthread+0x128/0x138 > [ 265.979210] ret_from_fork+0x14/0x38 > [ 265.983784] 0x0 > [ 265.986412] > [ 265.986412] other info that might help us debug this: > [ 265.986412] > [ 265.995353] Chain exists of: > [ 265.995353] fs_reclaim --> dma_fence_map --> &js->queue[j].lock > [ 265.995353] > [ 266.007028] Possible unsafe locking scenario: > [ 266.007028] > [ 266.013638] CPU0 CPU1 > [ 266.018694] ---- ---- > [ 266.023750] lock(&js->queue[j].lock); > [ 266.028033] lock(dma_fence_map); > [ 266.034648] lock(&js->queue[j].lock); > [ 266.041746] lock(fs_reclaim); > [ 266.045252] > [ 266.045252] *** DEADLOCK *** > [ 266.045252] > [ 266.051861] 4 locks held by kworker/0:3/90: > [ 266.056530] #0: c18096a8 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x18c/0x51c > [ 266.066258] #1: c46d5f28 ((work_completion)(&pfdev->reset.work)){+.+.}-{0:0}, at: process_one_work+0x18c/0x51c > [ 266.077546] #2: c0dfc5a0 (dma_fence_map){++++}-{0:0}, at: panfrost_reset+0x10/0x1cc [panfrost] > [ 266.087294] #3: c49a348c (&js->queue[j].lock){+.+.}-{3:3}, at: panfrost_reset+0x124/0x1cc [panfrost] > [ 266.097624] > [ 266.097624] stack backtrace: > [ 266.102487] CPU: 0 PID: 90 Comm: kworker/0:3 Tainted: G W 5.11.0-rc2+ #22 > [ 266.111529] Hardware name: Rockchip (Device Tree) > [ 266.116773] Workqueue: events panfrost_reset [panfrost] > [ 266.122628] [<c010f0c0>] (unwind_backtrace) from [<c010ad38>] (show_stack+0x10/0x14) > [ 266.131288] [<c010ad38>] (show_stack) from [<c07c3c28>] (dump_stack+0xa4/0xd0) > [ 266.139363] [<c07c3c28>] (dump_stack) from [<c0168760>] (check_noncircular+0x6c/0x90) > [ 266.148116] [<c0168760>] (check_noncircular) from [<c016a2c0>] (__lock_acquire+0xe90/0x16a0) > [ 266.157549] [<c016a2c0>] (__lock_acquire) from [<c016b5f8>] (lock_acquire+0x3a4/0x45c) > [ 266.166399] [<c016b5f8>] (lock_acquire) from [<c0247b98>] (__fs_reclaim_acquire+0x28/0x38) > [ 266.175637] [<c0247b98>] (__fs_reclaim_acquire) from [<c025445c>] (slab_pre_alloc_hook.constprop.28+0x1c/0x64) > [ 266.186820] [<c025445c>] (slab_pre_alloc_hook.constprop.28) from [<c0255714>] (kmem_cache_alloc_trace+0x38/0x114) > [ 266.198292] [<c0255714>] (kmem_cache_alloc_trace) from [<bf00d28c>] (panfrost_job_run+0x60/0x2b4 [panfrost]) > [ 266.209295] [<bf00d28c>] (panfrost_job_run [panfrost]) from [<bf00102c>] (drm_sched_resubmit_jobs+0x88/0xc4 [gpu_sched]) > [ 266.221476] [<bf00102c>] (drm_sched_resubmit_jobs [gpu_sched]) from [<bf00d0a0>] (panfrost_reset+0x174/0x1cc [panfrost]) > [ 266.233649] [<bf00d0a0>] (panfrost_reset [panfrost]) from [<c0139fd4>] (process_one_work+0x238/0x51c) > [ 266.243974] [<c0139fd4>] (process_one_work) from [<c013aaa0>] (worker_thread+0x22c/0x2e0) > [ 266.253115] [<c013aaa0>] (worker_thread) from [<c013fd64>] (kthread+0x128/0x138) > [ 266.261382] [<c013fd64>] (kthread) from [<c010011c>] (ret_from_fork+0x14/0x38) > [ 266.269456] Exception stack(0xc46d5fb0 to 0xc46d5ff8) > [ 266.275098] 5fa0: 00000000 00000000 00000000 00000000 > [ 266.284236] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 > [ 266.293373] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000 > -----8<------ > > And indeed a sleeping in atomic bug: > -----8<----- > [ 289.672380] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:935 > [ 289.681210] rockchip-vop ff930000.vop: [drm:vop_crtc_atomic_flush] *ERROR* VOP vblank IRQ stuck for 10 ms > [ 289.681880] in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 31, name: kworker/0:1 > [ 289.701901] INFO: lockdep is turned off. > [ 289.706339] irq event stamp: 9414 > [ 289.710095] hardirqs last enabled at (9413): [<c07cd1cc>] _raw_spin_unlock_irq+0x20/0x40 > [ 289.719381] hardirqs last disabled at (9414): [<c07c9140>] __schedule+0x170/0x698 > [ 289.727855] softirqs last enabled at (9112): [<c077379c>] reg_todo+0x64/0x51c > [ 289.736044] softirqs last disabled at (9110): [<c077377c>] reg_todo+0x44/0x51c > [ 289.744233] CPU: 0 PID: 31 Comm: kworker/0:1 Tainted: G W 5.11.0-rc2+ #22 > [ 289.753375] Hardware name: Rockchip (Device Tree) > [ 289.758711] Workqueue: events panfrost_reset [panfrost] > [ 289.764886] [<c010f0c0>] (unwind_backtrace) from [<c010ad38>] (show_stack+0x10/0x14) > [ 289.773721] [<c010ad38>] (show_stack) from [<c07c3c28>] (dump_stack+0xa4/0xd0) > [ 289.781948] [<c07c3c28>] (dump_stack) from [<c01490f8>] (___might_sleep+0x1bc/0x244) > [ 289.790761] [<c01490f8>] (___might_sleep) from [<c07ca720>] (__mutex_lock+0x34/0x3c4) > [ 289.799654] [<c07ca720>] (__mutex_lock) from [<c07caac8>] (mutex_lock_nested+0x18/0x20) > [ 289.808732] [<c07caac8>] (mutex_lock_nested) from [<bf00b704>] (panfrost_gem_free_object+0x20/0x100 [panfrost]) > [ 289.820358] [<bf00b704>] (panfrost_gem_free_object [panfrost]) from [<bf00ccb0>] (kref_put+0x38/0x5c [panfrost]) > [ 289.832247] [<bf00ccb0>] (kref_put [panfrost]) from [<bf00ce0c>] (panfrost_job_cleanup+0x120/0x140 [panfrost]) > [ 289.843936] [<bf00ce0c>] (panfrost_job_cleanup [panfrost]) from [<bf00ccb0>] (kref_put+0x38/0x5c [panfrost]) > [ 289.855435] [<bf00ccb0>] (kref_put [panfrost]) from [<c054acb8>] (dma_fence_signal_timestamp_locked+0x1c0/0x1f8) > [ 289.867163] [<c054acb8>] (dma_fence_signal_timestamp_locked) from [<c054b1f8>] (dma_fence_signal+0x38/0x58) > [ 289.878207] [<c054b1f8>] (dma_fence_signal) from [<bf001388>] (drm_sched_job_done+0x58/0x148 [gpu_sched]) > [ 289.889237] [<bf001388>] (drm_sched_job_done [gpu_sched]) from [<bf001524>] (drm_sched_start+0xa4/0xd0 [gpu_sched]) > [ 289.901389] [<bf001524>] (drm_sched_start [gpu_sched]) from [<bf00d0ac>] (panfrost_reset+0x180/0x1cc [panfrost]) > [ 289.913286] [<bf00d0ac>] (panfrost_reset [panfrost]) from [<c0139fd4>] (process_one_work+0x238/0x51c) > [ 289.923970] [<c0139fd4>] (process_one_work) from [<c013aaa0>] (worker_thread+0x22c/0x2e0) > [ 289.933257] [<c013aaa0>] (worker_thread) from [<c013fd64>] (kthread+0x128/0x138) > [ 289.941661] [<c013fd64>] (kthread) from [<c010011c>] (ret_from_fork+0x14/0x38) > [ 289.949867] Exception stack(0xc2243fb0 to 0xc2243ff8) > [ 289.955604] 3fa0: 00000000 00000000 00000000 00000000 > [ 289.964840] 3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 > [ 289.974066] 3fe0: 00000000 00000000 00000000 00000000 00000013 00000000 > -----8<---- > > Certainly here the mutex causing the problem is the shrinker_lock! > > The above is triggered by chucking a whole ton of jobs which > fault at the GPU. > > Sadly I haven't found time to work out how to untangle the locks. They are tricky because pretty much any memory allocation can trigger things as I recall. Rob _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/dri-devel