Re: [Intel-gfx] [PATCH v9 2/8] drm/i915/ttm: add tt shmem backend

Thomas Hellström <thomas.hellstrom@xxxxxxxxxxxxxxx> · Wed, 8 Dec 2021 10:32:42 +0100

On 12/8/21 10:24, Tvrtko Ursulin wrote:

On 08/12/2021 08:39, Thomas Hellström wrote:

On 12/8/21 09:30, Tvrtko Ursulin wrote:

...

Apart from the code organisation questions, on the practical level 
- do you need writeback from the TTM backend or while I am 
proposing to remove it from the "legacy" paths, I can propose 
removing it from the TTM flow as well?

Yeah, if that is somehow busted then we should remove from TTM 
backend also.

Okay thanks, I wanted to check in case there was an extra need in 
TTM. I will float a patch soon hopefully but testing will be a 
problem since it seems very hard to repro at the moment.

Do we have some information about what's causing the deadlock or a 
signature? I'm asking because if some sort of shrinker was added to TTM 

Yes, signature is hung task detector kicking in and pretty much system 
standstill ensues. You will find a bunch of tasks stuck like this:

[  247.165578] chrome          D    0  1791   1785 0x00004080
[  247.171732] Call Trace:
[  247.174492]  __schedule+0x57e/0x10d2
[  247.178514]  ? pcpu_alloc_area+0x25d/0x273
[  247.183115]  schedule+0x7c/0x9f
[  247.186647]  rwsem_down_write_slowpath+0x2ae/0x494
[  247.192025]  register_shrinker_prepared+0x19/0x48
[  247.197310]  ? test_single_super+0x10/0x10
[  247.201910]  sget_fc+0x1fc/0x20e
[  247.205540]  ? kill_litter_super+0x40/0x40
[  247.210139]  ? proc_apply_options+0x42/0x42
[  247.214838]  vfs_get_super+0x3a/0xdf
[  247.218855]  vfs_get_tree+0x2b/0xc3
[  247.222911]  fc_mount+0x12/0x39
[  247.226814]  pid_ns_prepare_proc+0x9d/0xc5
[  247.232468]  alloc_pid+0x275/0x289
[  247.236432]  copy_process+0x5e5/0xeea
[  247.240640]  _do_fork+0x95/0x303
[  247.244261]  __se_sys_clone+0x65/0x7f
[  247.248366]  do_syscall_64+0x54/0x7e
[  247.252365]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Or this:

[  247.030274] minijail-init   D    0  1773   1770 0x80004082
[  247.036419] Call Trace:
[  247.039167]  __schedule+0x57e/0x10d2
[  247.043175]  ? __schedule+0x586/0x10d2
[  247.047381]  ? _raw_spin_unlock+0xe/0x20
[  247.051779]  ? __queue_work+0x316/0x371
[  247.056079]  schedule+0x7c/0x9f
[  247.059602]  rwsem_down_write_slowpath+0x2ae/0x494
[  247.064971]  unregister_shrinker+0x20/0x65
[  247.069562]  deactivate_locked_super+0x38/0x88
[  247.074538]  cleanup_mnt+0xcc/0x10e
[  247.078447]  task_work_run+0x7d/0xa6
[  247.082459]  do_exit+0x23d/0x831
[  247.086079]  ? syscall_trace_enter+0x207/0x20e
[  247.091055]  do_group_exit+0x8d/0x9d
[  247.095062]  __x64_sys_exit_group+0x17/0x17
[  247.099750]  do_syscall_64+0x54/0x7e
[  247.103758]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

And smoking gun is:

[  247.383338] CPU: 3 PID: 88 Comm: kswapd0 Tainted: G U            
5.4.154 #36
[  247.383338] Hardware name: Google Delbin/Delbin, BIOS 
Google_Delbin.13672.57.0 02/09/2021
[  247.383339] RIP: 0010:__rcu_read_lock+0x0/0x1a
[  247.383339] Code: ff ff 0f 0b e9 61 fe ff ff 0f 0b e9 76 fe ff ff 
0f 0b 49 8b 44 24 20 e9 59 ff ff ff 0f 0b e9 67 ff ff ff 0f 0b e9 1b 
ff ff ff <0f> 1f 44 00 00 55 48 89 e5 65 48 8b 04 25 80 5d 01 00 ff 80 
f8 03
[  247.383340] RSP: 0018:ffffb0aa0031b978 EFLAGS: 00000286
[  247.383340] RAX: 0000000000000000 RBX: fffff6b944ca8040 RCX: 
fffff6b944ca8001
[  247.383341] RDX: 0000000000000028 RSI: 0000000000000001 RDI: 
ffff8b52bc618c18
[  247.383341] RBP: ffffb0aa0031b9d0 R08: 0000000000000000 R09: 
ffff8b52fb5f00d8
[  247.383341] R10: 0000000000000000 R11: 0000000000000000 R12: 
0000000000000000
[  247.383342] R13: 61c8864680b583eb R14: 0000000000000001 R15: 
ffffb0aa0031b980
[  247.383342] FS:  0000000000000000(0000) GS:ffff8b52fbf80000(0000) 
knlGS:0000000000000000
[  247.383343] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  247.383343] CR2: 00007c78a400d680 CR3: 0000000120f46006 CR4: 
0000000000762ee0
[  247.383344] PKRU: 55555554
[  247.383344] Call Trace:
[  247.383345]  find_get_entry+0x4c/0x116
[  247.383345]  find_lock_entry+0xc8/0xec
[  247.383346]  shmem_writeback+0x7b/0x163
[  247.383346]  i915_gem_shrink+0x302/0x40b
[  247.383347]  ? __intel_runtime_pm_get+0x22/0x82
[  247.383347]  i915_gem_shrinker_scan+0x86/0xa8
[  247.383348]  shrink_slab+0x272/0x48b
[  247.383348]  shrink_node+0x784/0xbea
[  247.383348]  ? rcu_read_unlock_special+0x66/0x15f
[  247.383349]  ? update_batch_size+0x78/0x78
[  247.383349]  kswapd+0x75c/0xa56
[  247.383350]  kthread+0x147/0x156
[  247.383350]  ? kswapd_run+0xb6/0xb6
[  247.383351]  ? kthread_blkcg+0x2e/0x2e
[  247.383351]  ret_from_fork+0x1f/0x40

Sadly getting logs or repro from 5.16-rc is more difficult due other 
issues, or altogether gone, which is also a possibility. It is also 
possible that transparent hugepages either enable the hang, or just 
make it more likely.

However due history of writeback I think it sounds plausible that it 
is indeed unsafe. I will try to dig out a reply from Hugh Dickins who 
advised against doing it and I think that advice did not change, or I 
failed to find a later thread. There is at least a mention of that 
discussion in the patch which added writeback.

itself, for the TTM page vectors, it would need to allocate shmem 
pages at shrink time rather than to unpin them at shrink time as we 
do here. And for that to have any chance of working sort of reliably, 
I think writeback is needed.

I don't claim to understand it fully, but won't the system take care 
of that, with the only difference being that allocation might work a 
little less reliably under extreme memory pressure? 

Yes, IIRC it's exactly this that made the previous attempt of a generic 
TTM shrinker fail.

And I did not find other drivers use it at least which may be and 
indication we should indeed steer clear of it.

You're probably right.

Regards,

Tvrtko

Thanks,

Thomas

But I agree for this implementation, the need for writeback isn't 
different than for the non-TTM shmem objects> Thanks,

Thomas

Regards,

Tvrtko