Hi Marc,

On 15.09.2020 10:07, Marc Zyngier wrote:
> On 2020-09-15 07:48, Marek Szyprowski wrote:
>>>> Both Exynos 4210 and 4412 use non-zero cpu-offset in GIC node in
>>>> device-tree: arch/arm/boot/dts/exynos{4210,4412}.dtsi, so I assume
>>>> that the GIC registers are not banked.
>>>
>>> Annoyingly, it seems to work correctly in QEMU:
>
> [...]
>
>>> Do you happen to know whether the QEMU emulation is trustworthy?
>>
>> I didn't play much with Exynos emulation on QEMU. All I know is that
>> this patch simply doesn't work on the real hw.
>
> I don't doubt it. The question was more whether we could trust QEMU
> to be reliable, in which case the issue would be around a kernel
> configuration problem. Could you stash your kernel config somewhere?

I just use the vanilla exynos_defconfig for my tests.

>> If there is anything to check or test, let me know. I will try to help
>> as much as possible.
>
> It would be interesting to see whether the CPUs are getting any IPI.
> Can you try the following patch, and send the results back?

Starting kernel ...

[ 0.000000] Booting Linux on physical CPU 0x900
[ 0.000000] Linux version 5.9.0-rc4-00008-gac063232d4b0-dirty (mszyprow@AMDC2765) (arm-linux-gnueabi-gcc (Linaro GCC 4.9-2017.01) 4.9.4, GNU ld (Linaro_Binutils-2017.01) 2.24.0.20141017 Linaro 2014_11-3-git) #9174 SMP PREEMPT Tue Sep 15 10:30:46 CEST 2020
[ 0.000000] CPU: ARMv7 Processor [412fc091] revision 1 (ARMv7), cr=10c5387d
[ 0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
[ 0.000000] OF: fdt: Machine model: Samsung Trats based on Exynos4210
[ 0.000000] earlycon: exynos4210 at MMIO 0x13820000 (options '115200n8')
[ 0.000000] printk: bootconsole [exynos4210] enabled
[ 0.000000] Memory policy: Data cache writealloc
[ 0.000000] cma: Reserved 96 MiB at 0x7a000000
[ 0.000000] Samsung CPU ID: 0x43210211
[ 0.000000] Zone ranges:
[ 0.000000] Normal [mem 0x0000000040000000-0x000000006fffffff]
[ 0.000000] HighMem [mem 0x0000000070000000-0x000000007fffffff]
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000040000000-0x000000007fffffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x000000007fffffff]
[ 0.000000] percpu: Embedded 20 pages/cpu s51904 r8192 d21824 u81920
[ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 260608
[ 0.000000] Kernel command line: root=PARTLABEL=data rootwait console=tty1 console=ttySAC2,115200n8 earlycon
[ 0.000000] Dentry cache hash table entries: 131072 (order: 7, 524288 bytes, linear)
[ 0.000000] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes, linear)
[ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[ 0.000000] Memory: 917072K/1048576K available (10240K kernel code, 958K rwdata, 4000K rodata, 1024K init, 6487K bss, 33200K reserved, 98304K cma-reserved, 163840K highmem)
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
[ 0.000000] Running RCU self tests
[ 0.000000] rcu: Preemptible hierarchical RCU implementation.
[ 0.000000] rcu: RCU event tracing is enabled.
[ 0.000000] rcu: RCU lockdep checking is enabled.
[ 0.000000] rcu: RCU restricting CPUs from NR_CPUS=8 to nr_cpu_ids=2.
[ 0.000000] Trampoline variant of Tasks RCU enabled.
[ 0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 10 jiffies.
[ 0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2
[ 0.000000] NR_IRQS: 16, nr_irqs: 16, preallocated irqs: 16
[ 0.000000] CPU0 IPI0 base = f0800000
[ 0.000000] CPU0 IPI1 base = f0800000
[ 0.000000] CPU0 IPI2 base = f0800000
[ 0.000000] CPU0 IPI3 base = f0800000
[ 0.000000] CPU0 IPI4 base = f0800000
[ 0.000000] CPU0 IPI5 base = f0800000
[ 0.000000] CPU0 IPI6 base = f0800000
[ 0.000000] CPU0 IPI7 base = f0800000
[ 0.000000] L2C: platform modifies aux control register: 0x02070000 -> 0x3e470000
[ 0.000000] L2C: DT/platform modifies aux control register: 0x02070000 -> 0x3e470000
[ 0.000000] L2C-310 enabling early BRESP for Cortex-A9
[ 0.000000] L2C-310 full line of zeros enabled for Cortex-A9
[ 0.000000] L2C-310 dynamic clock gating enabled, standby mode enabled
[ 0.000000] L2C-310 cache controller enabled, 16 ways, 1024 kB
[ 0.000000] L2C-310: CACHE_ID 0x4100c4c5, AUX_CTRL 0x4e470001
[ 0.000000] random: get_random_bytes called from start_kernel+0x4c0/0x67c with crng_init=0
[ 0.000000] Exynos4210 clocks: sclk_apll = 800000000, sclk_mpll = 800000000
[ 0.000000] sclk_epll = 96000000, sclk_vpll = 108000000, arm_clk = 800000000
[ 0.000000] Switching to timer-based delay loop, resolution 41ns
[ 0.000000] clocksource: mct-frc: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 79635851949 ns
[ 0.000007] sched_clock: 32 bits at 24MHz, resolution 41ns, wraps every 89478484971ns
[ 0.008850] Console: colour dummy device 80x30
[ 0.017306] printk: console [tty1] enabled
[ 0.019992] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
[ 0.027791] ... MAX_LOCKDEP_SUBCLASSES: 8
[ 0.031793] ... MAX_LOCK_DEPTH: 48
[ 0.035959] ... MAX_LOCKDEP_KEYS: 8192
[ 0.040327] ... CLASSHASH_SIZE: 4096
[ 0.044643] ... MAX_LOCKDEP_ENTRIES: 32768
[ 0.049096] ... MAX_LOCKDEP_CHAINS: 65536
[ 0.053497] ... CHAINHASH_SIZE: 32768
[ 0.057948] memory used by lock dependency info: 4029 kB
[ 0.063318] memory used for stack traces: 2112 kB
[ 0.068110] per task-struct memory footprint: 1536 bytes
[ 0.073536] Calibrating delay loop (skipped), value calculated using timer frequency.. 48.00 BogoMIPS (lpj=240000)
[ 0.083901] pid_max: default: 32768 minimum: 301
[ 0.088789] Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
[ 0.095733] Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
[ 0.106706] CPU: Testing write buffer coherency: ok
[ 0.110276] CPU0: Spectre v2: using BPIALL workaround
[ 0.116474] CPU0: thread -1, cpu 0, socket 9, mpidr 80000900
[ 0.124225] Setting up static identity map for 0x40100000 - 0x40100060
[ 0.130539] rcu: Hierarchical SRCU implementation.
[ 0.137562] soc soc0: Exynos: CPU[EXYNOS4210] PRO_ID[0x43210211] REV[0x11] Detected
[ 0.145493] smp: Bringing up secondary CPUs ...
[ 0.152740] CPU0 send IPI0 base = f0800000
[ 0.152786] CPU1: Booted secondary processor
[ 0.155582] CPU0 send IPI0 base = f0800000
[ 0.163945] CPU1 IPI0 base = f0808000
[ 0.163956] CPU1 IPI1 base = f0808000
[ 0.163966] CPU1 IPI2 base = f0808000
[ 0.163976] CPU1 IPI3 base = f0808000
[ 0.163986] CPU1 IPI4 base = f0808000
[ 0.163995] CPU1 IPI5 base = f0808000
[ 0.164004] CPU1 IPI6 base = f0808000
[ 0.164014] CPU1 IPI7 base = f0808000
[ 0.164025] CPU1: thread -1, cpu 1, socket 9, mpidr 80000901
[ 0.164035] CPU1: Spectre v2: using BPIALL workaround
[ 0.203803] CPU1 send IPI2 base = f0808000
[ 0.207834] CPU1 IPI0 received
[ 0.207839] CPU0 IPI2 received
[ 0.214052] CPU0 send IPI2 base = f0800000
[ 0.217990] CPU1 IPI2 received
[ 0.222188] CPU1 send IPI2 base = f0808000
[ 2.754062] random: fast init done

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland