On 21/11/2023 17:55, Boris Brezillon wrote: > On Tue, 21 Nov 2023 17:11:42 +0100 > AngeloGioacchino Del Regno <angelogioacchino.delregno@xxxxxxxxxxxxx> > wrote: > >> Il 21/11/23 16:34, Krzysztof Kozlowski ha scritto: >>> On 08/11/2023 14:20, Steven Price wrote: >>>> On 02/11/2023 14:15, AngeloGioacchino Del Regno wrote: >>>>> The layout of the registers {TILER,SHADER,L2}_PWROFF_LO, used to request >>>>> powering off cores, is the same as the {TILER,SHADER,L2}_PWRON_LO ones: >>>>> this means that in order to request poweroff of cores, we are supposed >>>>> to write a bitmask of cores that should be powered off! >>>>> This means that the panfrost_gpu_power_off() function has always been >>>>> doing nothing. >>>>> >>>>> Fix powering off the GPU by writing a bitmask of the cores to poweroff >>>>> to the relevant PWROFF_LO registers and then check that the transition >>>>> (from ON to OFF) has finished by polling the relevant PWRTRANS_LO >>>>> registers. >>>>> >>>>> While at it, in order to avoid code duplication, move the core mask >>>>> logic from panfrost_gpu_power_on() to a new panfrost_get_core_mask() >>>>> function, used in both poweron and poweroff. >>>>> >>>>> Fixes: f3ba91228e8e ("drm/panfrost: Add initial panfrost driver") >>>>> Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@xxxxxxxxxxxxx> >>> >>> >>> Hi, >>> >>> This commit was added to next recently but it causes "external abort on >>> non-linefetch" during boot of my Odroid HC1 board. >>> >>> At least bisect points to it. >>> >>> If fixed, please add: >>> >>> Reported-by: Krzysztof Kozlowski <krzysztof.kozlowski@xxxxxxxxxx> >>> >>> [ 4.861683] 8<--- cut here --- >>> [ 4.863429] Unhandled fault: external abort on non-linefetch (0x1008) at 0xf0c8802c >>> [ 4.871018] [f0c8802c] *pgd=433ed811, *pte=11800653, *ppte=11800453 >>> ... >>> [ 5.164010] panfrost_gpu_irq_handler from __handle_irq_event_percpu+0xcc/0x31c >>> [ 5.171276] __handle_irq_event_percpu from handle_irq_event+0x38/0x80 >>> [ 5.177765] handle_irq_event from handle_fasteoi_irq+0x9c/0x250 >>> [ 5.183743] handle_fasteoi_irq from generic_handle_domain_irq+0x28/0x38 >>> [ 5.190417] generic_handle_domain_irq from gic_handle_irq+0x88/0xa8 >>> [ 5.196741] gic_handle_irq from generic_handle_arch_irq+0x34/0x44 >>> [ 5.202893] generic_handle_arch_irq from __irq_svc+0x8c/0xd0 >>> >>> Full log: >>> https://krzk.eu/#/builders/21/builds/4392/steps/11/logs/serial0 >>> >> >> Hey Krzysztof, >> >> This is interesting. It might be about the cores that are missing from the partial >> core_mask raising interrupts, but an external abort on non-linefetch is strange to >> see here. > > I've seen such external aborts in the past, and the fault type has > often been misleading. It's unlikely to have anything to do with a Yeah, often accessing device with power or clocks gated. > non-linefetch access, but it might be caused by a register access after > the clock or power domain driving the register bank has been disabled. > The following diff might help validate this theory. If that works, we > probably want to make sure we synchronize IRQs before disabling in the > suspend path. > > --->8--- > diff --git a/drivers/gpu/drm/panfrost/panfrost_regs.h b/drivers/gpu/drm/panfrost/panfrost_regs.h > index 55ec807550b3..98df66e5cc9b 100644 > --- a/drivers/gpu/drm/panfrost/panfrost_regs.h > +++ b/drivers/gpu/drm/panfrost/panfrost_regs.h > @@ -34,8 +34,6 @@ > (GPU_IRQ_FAULT |\ > GPU_IRQ_MULTIPLE_FAULT |\ > GPU_IRQ_RESET_COMPLETED |\ > - GPU_IRQ_POWER_CHANGED |\ > - GPU_IRQ_POWER_CHANGED_ALL |\ This helped, at least for this issue (next-20231121). Much later in user-space boot I have lockups: watchdog: BUG: soft lockup - CPU#4 stuck for 26s! [kworker/4:1:61] [ 56.329224] smp_call_function_single from __sync_rcu_exp_select_node_cpus+0x29c/0x78c [ 56.337111] __sync_rcu_exp_select_node_cpus from sync_rcu_exp_select_cpus+0x334/0x878 [ 56.344995] sync_rcu_exp_select_cpus from wait_rcu_exp_gp+0xc/0x18 [ 56.351231] wait_rcu_exp_gp from process_one_work+0x20c/0x620 [ 56.357038] process_one_work from worker_thread+0x1d0/0x488 [ 56.362668] worker_thread from kthread+0x104/0x138 [ 56.367521] kthread from ret_from_fork+0x14/0x28 But anyway the external abort does not appear. Best regards, Krzysztof