On 08/12/2021 08:39, Thomas Hellström wrote:
On 12/8/21 09:30, Tvrtko Ursulin wrote:
...
Apart from the code organisation questions, on the practical level -
do you need writeback from the TTM backend or while I am proposing
to remove it from the "legacy" paths, I can propose removing it from
the TTM flow as well?
Yeah, if that is somehow busted then we should remove from TTM
backend also.
Okay thanks, I wanted to check in case there was an extra need in TTM.
I will float a patch soon hopefully but testing will be a problem
since it seems very hard to repro at the moment.
Do we have some information about what's causing the deadlock or a
signature? I'm asking because if some sort of shrinker was added to TTM
Yes, signature is hung task detector kicking in and pretty much system standstill ensues. You will find a bunch of tasks stuck like this:
[ 247.165578] chrome D 0 1791 1785 0x00004080
[ 247.171732] Call Trace:
[ 247.174492] __schedule+0x57e/0x10d2
[ 247.178514] ? pcpu_alloc_area+0x25d/0x273
[ 247.183115] schedule+0x7c/0x9f
[ 247.186647] rwsem_down_write_slowpath+0x2ae/0x494
[ 247.192025] register_shrinker_prepared+0x19/0x48
[ 247.197310] ? test_single_super+0x10/0x10
[ 247.201910] sget_fc+0x1fc/0x20e
[ 247.205540] ? kill_litter_super+0x40/0x40
[ 247.210139] ? proc_apply_options+0x42/0x42
[ 247.214838] vfs_get_super+0x3a/0xdf
[ 247.218855] vfs_get_tree+0x2b/0xc3
[ 247.222911] fc_mount+0x12/0x39
[ 247.226814] pid_ns_prepare_proc+0x9d/0xc5
[ 247.232468] alloc_pid+0x275/0x289
[ 247.236432] copy_process+0x5e5/0xeea
[ 247.240640] _do_fork+0x95/0x303
[ 247.244261] __se_sys_clone+0x65/0x7f
[ 247.248366] do_syscall_64+0x54/0x7e
[ 247.252365] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Or this:
[ 247.030274] minijail-init D 0 1773 1770 0x80004082
[ 247.036419] Call Trace:
[ 247.039167] __schedule+0x57e/0x10d2
[ 247.043175] ? __schedule+0x586/0x10d2
[ 247.047381] ? _raw_spin_unlock+0xe/0x20
[ 247.051779] ? __queue_work+0x316/0x371
[ 247.056079] schedule+0x7c/0x9f
[ 247.059602] rwsem_down_write_slowpath+0x2ae/0x494
[ 247.064971] unregister_shrinker+0x20/0x65
[ 247.069562] deactivate_locked_super+0x38/0x88
[ 247.074538] cleanup_mnt+0xcc/0x10e
[ 247.078447] task_work_run+0x7d/0xa6
[ 247.082459] do_exit+0x23d/0x831
[ 247.086079] ? syscall_trace_enter+0x207/0x20e
[ 247.091055] do_group_exit+0x8d/0x9d
[ 247.095062] __x64_sys_exit_group+0x17/0x17
[ 247.099750] do_syscall_64+0x54/0x7e
[ 247.103758] entry_SYSCALL_64_after_hwframe+0x44/0xa9
And smoking gun is:
[ 247.383338] CPU: 3 PID: 88 Comm: kswapd0 Tainted: G U 5.4.154 #36
[ 247.383338] Hardware name: Google Delbin/Delbin, BIOS Google_Delbin.13672.57.0 02/09/2021
[ 247.383339] RIP: 0010:__rcu_read_lock+0x0/0x1a
[ 247.383339] Code: ff ff 0f 0b e9 61 fe ff ff 0f 0b e9 76 fe ff ff 0f 0b 49 8b 44 24 20 e9 59 ff ff ff 0f 0b e9 67 ff ff ff 0f 0b e9 1b ff ff ff <0f> 1f 44 00 00 55 48 89 e5 65 48 8b 04 25 80 5d 01 00 ff 80 f8 03
[ 247.383340] RSP: 0018:ffffb0aa0031b978 EFLAGS: 00000286
[ 247.383340] RAX: 0000000000000000 RBX: fffff6b944ca8040 RCX: fffff6b944ca8001
[ 247.383341] RDX: 0000000000000028 RSI: 0000000000000001 RDI: ffff8b52bc618c18
[ 247.383341] RBP: ffffb0aa0031b9d0 R08: 0000000000000000 R09: ffff8b52fb5f00d8
[ 247.383341] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 247.383342] R13: 61c8864680b583eb R14: 0000000000000001 R15: ffffb0aa0031b980
[ 247.383342] FS: 0000000000000000(0000) GS:ffff8b52fbf80000(0000) knlGS:0000000000000000
[ 247.383343] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 247.383343] CR2: 00007c78a400d680 CR3: 0000000120f46006 CR4: 0000000000762ee0
[ 247.383344] PKRU: 55555554
[ 247.383344] Call Trace:
[ 247.383345] find_get_entry+0x4c/0x116
[ 247.383345] find_lock_entry+0xc8/0xec
[ 247.383346] shmem_writeback+0x7b/0x163
[ 247.383346] i915_gem_shrink+0x302/0x40b
[ 247.383347] ? __intel_runtime_pm_get+0x22/0x82
[ 247.383347] i915_gem_shrinker_scan+0x86/0xa8
[ 247.383348] shrink_slab+0x272/0x48b
[ 247.383348] shrink_node+0x784/0xbea
[ 247.383348] ? rcu_read_unlock_special+0x66/0x15f
[ 247.383349] ? update_batch_size+0x78/0x78
[ 247.383349] kswapd+0x75c/0xa56
[ 247.383350] kthread+0x147/0x156
[ 247.383350] ? kswapd_run+0xb6/0xb6
[ 247.383351] ? kthread_blkcg+0x2e/0x2e
[ 247.383351] ret_from_fork+0x1f/0x40
Sadly getting logs or repro from 5.16-rc is more difficult due other issues, or altogether gone, which is also a possibility. It is also possible that transparent hugepages either enable the hang, or just make it more likely.
However due history of writeback I think it sounds plausible that it is indeed unsafe. I will try to dig out a reply from Hugh Dickins who advised against doing it and I think that advice did not change, or I failed to find a later thread. There is at least a mention of that discussion in the patch which added writeback.
itself, for the TTM page vectors, it would need to allocate shmem pages
at shrink time rather than to unpin them at shrink time as we do here.
And for that to have any chance of working sort of reliably, I think
writeback is needed.
I don't claim to understand it fully, but won't the system take care of that, with the only difference being that allocation might work a little less reliably under extreme memory pressure? And I did not find other drivers use it at least which may be and indication we should indeed steer clear of it.
Regards,
Tvrtko
But I agree for this implementation, the need for writeback isn't
different than for the non-TTM shmem objects>
Thanks,
Thomas
Regards,
Tvrtko