On Thu, Mar 28, 2024 at 5:50 PM Maximilian Luz <luzmaximilian@xxxxxxxxx> wrote: > > On 3/25/24 11:03 AM, Bartosz Golaszewski wrote: > > From: Bartosz Golaszewski <bartosz.golaszewski@xxxxxxxxxx> > > > > SCM calls that take memory buffers as arguments require that they be > > page-aligned, physically continuous and non-cachable. The same > > requirements apply to the buffer used to pass additional arguments to SCM > > calls that take more than 4. > > > > To that end drivers typically use dma_alloc_coherent() to allocate memory > > of suitable format which is slow and inefficient space-wise. > > > > SHM Bridge is a safety mechanism that - once enabled - will only allow > > passing buffers to the TrustZone that have been explicitly marked as > > shared. It improves the overall system safety with SCM calls and is > > required by the upcoming scminvoke functionality. > > > > The end goal of this series is to enable SHM bridge support for those > > architectures that support it but to that end we first need to unify the > > way memory for SCM calls is allocated. This in itself is beneficial as > > the current approach of using dma_alloc_coherent() in most places is quite > > slow. > > > > First let's add a new TZ Memory allocator that allows users to create > > dynamic memory pools of format suitable for sharing with the TrustZone. > > Make it ready for implementing multiple build-time modes. > > > > Convert all relevant drivers to using it. Add separate pools for SCM core > > and for qseecom. > > > > Finally add support for SHM bridge and make it the default mode of > > operation with the generic allocator as fallback for the platforms that > > don't support SHM bridge. > > > > Tested on db410c, RB5, sm8550-qrd. Previous iteration tested also on > > sa8775p-ride and lenovo X13s (please do retest on those platforms if you > > can). > > Not sure in which version things changed (I haven't really kept up with > that, sorry), but this version (with the generic allocator selected in > the config) fails reading EFI vars on my Surface Pro X (sc8180x): > > [ 2.381020] BUG: scheduling while atomic: mount/258/0x00000002 > [ 2.383356] Modules linked in: > [ 2.385669] Preemption disabled at: > [ 2.385672] [<ffff800080f7868c>] qcom_tzmem_alloc+0x7c/0x224 > [ 2.390325] CPU: 1 PID: 258 Comm: mount Not tainted 6.8.0-1-surface-dev #2 > [ 2.392632] Hardware name: Microsoft Corporation Surface Pro X/Surface Pro X, BIOS 7.620.140 08/11/2023 > [ 2.394955] Call trace: > [ 2.397269] dump_backtrace+0x94/0x114 > [ 2.399583] show_stack+0x18/0x24 > [ 2.401883] dump_stack_lvl+0x48/0x60 > [ 2.404181] dump_stack+0x18/0x24 > [ 2.406476] __schedule_bug+0x84/0xa0 > [ 2.408770] __schedule+0x6f4/0x7fc > [ 2.411051] schedule+0x30/0xf0 > [ 2.413323] synchronize_rcu_expedited+0x158/0x1ec > [ 2.415594] lru_cache_disable+0x28/0x74 > [ 2.417853] __alloc_contig_migrate_range+0x68/0x210 > [ 2.420122] alloc_contig_range+0x140/0x280 > [ 2.422384] cma_alloc+0x128/0x404 > [ 2.424643] cma_alloc_aligned+0x44/0x6c > [ 2.426881] dma_alloc_contiguous+0x30/0x44 > [ 2.429111] __dma_direct_alloc_pages.isra.0+0x60/0x20c > [ 2.431343] dma_direct_alloc+0x6c/0x2ec > [ 2.433569] dma_alloc_attrs+0x80/0xf4 > [ 2.435786] qcom_tzmem_pool_add_memory+0x8c/0x178 > [ 2.438008] qcom_tzmem_alloc+0xe8/0x224 > [ 2.440232] qsee_uefi_get_next_variable+0x78/0x2cc > [ 2.442443] qcuefi_get_next_variable+0x50/0x94 > [ 2.444643] efivar_get_next_variable+0x20/0x2c > [ 2.446832] efivar_init+0x8c/0x29c > [ 2.449009] efivarfs_fill_super+0xd4/0xec > [ 2.451182] get_tree_single+0x74/0xbc > [ 2.453349] efivarfs_get_tree+0x18/0x24 > [ 2.455513] vfs_get_tree+0x28/0xe8 > [ 2.457680] vfs_cmd_create+0x5c/0xf4 > [ 2.459840] __do_sys_fsconfig+0x458/0x598 > [ 2.461993] __arm64_sys_fsconfig+0x24/0x30 > [ 2.464143] invoke_syscall+0x48/0x110 > [ 2.466281] el0_svc_common.constprop.0+0x40/0xe0 > [ 2.468415] do_el0_svc+0x1c/0x28 > [ 2.470546] el0_svc+0x34/0xb4 > [ 2.472669] el0t_64_sync_handler+0x120/0x12c > [ 2.474793] el0t_64_sync+0x190/0x194 > > and subsequently > > [ 2.477613] DEBUG_LOCKS_WARN_ON(val > preempt_count()) > [ 2.477618] WARNING: CPU: 4 PID: 258 at kernel/sched/core.c:5889 preempt_count_sub+0x90/0xd4 > [ 2.478682] Modules linked in: > [ 2.479214] CPU: 4 PID: 258 Comm: mount Tainted: G W 6.8.0-1-surface-dev #2 > [ 2.479752] Hardware name: Microsoft Corporation Surface Pro X/Surface Pro X, BIOS 7.620.140 08/11/2023 > [ 2.480296] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) > [ 2.480839] pc : preempt_count_sub+0x90/0xd4 > [ 2.481380] lr : preempt_count_sub+0x90/0xd4 > [ 2.481917] sp : ffff8000857cbb00 > [ 2.482450] x29: ffff8000857cbb00 x28: ffff8000806b759c x27: 8000000000000005 > [ 2.482988] x26: 0000000000000000 x25: ffff0000802cbaa0 x24: ffff0000809228b0 > [ 2.483525] x23: 0000000000000000 x22: ffff800082b462f0 x21: 0000000000001000 > [ 2.484062] x20: ffff80008363d000 x19: ffff000080922880 x18: fffffffffffc9660 > [ 2.484600] x17: 0000000000000020 x16: 0000000000000000 x15: 0000000000000038 > [ 2.485137] x14: 0000000000000000 x13: ffff800082649258 x12: 00000000000006db > [ 2.485674] x11: 0000000000000249 x10: ffff8000826fc930 x9 : ffff800082649258 > [ 2.486207] x8 : 00000000ffffdfff x7 : ffff8000826f9258 x6 : 0000000000000249 > [ 2.486738] x5 : 0000000000000000 x4 : 40000000ffffe249 x3 : 0000000000000000 > [ 2.487267] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0000841fa300 > [ 2.487792] Call trace: > [ 2.488311] preempt_count_sub+0x90/0xd4 > [ 2.488831] _raw_spin_unlock_irqrestore+0x1c/0x44 > [ 2.489352] qcom_tzmem_alloc+0x1cc/0x224 > [ 2.489873] qsee_uefi_get_next_variable+0x78/0x2cc > [ 2.490390] qcuefi_get_next_variable+0x50/0x94 > [ 2.490907] efivar_get_next_variable+0x20/0x2c > [ 2.491420] efivar_init+0x8c/0x29c > [ 2.491931] efivarfs_fill_super+0xd4/0xec > [ 2.492440] get_tree_single+0x74/0xbc > [ 2.492948] efivarfs_get_tree+0x18/0x24 > [ 2.493453] vfs_get_tree+0x28/0xe8 > [ 2.493957] vfs_cmd_create+0x5c/0xf4 > [ 2.494459] __do_sys_fsconfig+0x458/0x598 > [ 2.494963] __arm64_sys_fsconfig+0x24/0x30 > [ 2.495468] invoke_syscall+0x48/0x110 > [ 2.495972] el0_svc_common.constprop.0+0x40/0xe0 > [ 2.496475] do_el0_svc+0x1c/0x28 > [ 2.496976] el0_svc+0x34/0xb4 > [ 2.497475] el0t_64_sync_handler+0x120/0x12c > [ 2.497975] el0t_64_sync+0x190/0x194 > [ 2.498466] ---[ end trace 0000000000000000 ]--- > [ 2.507347] qcom_scm firmware:scm: qseecom: scm call failed with error -22 > [ 2.507813] efivars: get_next_variable: status=8000000000000007 > > If I understand correctly, it enters an atomic section in > qcom_tzmem_alloc() and then tries to schedule somewhere down the line. > So this shouldn't be qseecom specific. > > I should probably also say that I'm currently testing this on a patched > v6.8 kernel, so there's a chance that it's my fault. However, as far as > I understand, it enters an atomic section in qcom_tzmem_alloc() and then > later tries to expand the pool memory with dma_alloc_coherent(). Which > AFAIK is allowed to sleep with GFP_KERNEL (and I guess that that's the > issue here). > > I've also tried the shmem allocator option, but that seems to get stuck > quite early at boot, before I even have usb-serial access to get any > logs. If I can find some more time, I'll try to see if I can get some > useful output for that. > Ah, I think it happens here: + guard(spinlock_irqsave)(&pool->lock); + +again: + vaddr = gen_pool_alloc(pool->genpool, size); + if (!vaddr) { + if (qcom_tzmem_try_grow_pool(pool, size, gfp)) + goto again; We were called with GFP_KERNEL so this is what we pass on to qcom_tzmem_try_grow_pool() but we're now holding the spinlock. I need to revisit it. Thanks for the catch! Bart