On Wed, May 17, 2023 at 8:21 AM Naresh Kamboju <naresh.kamboju@xxxxxxxxxx> wrote: > > Linux next-20230517 build with clang nightly for i386 boot fails intermittently. Keyword: intermittently. That will make tracking this down fun. Our CI also hit a boot failure on tip/master with the same splat: https://github.com/ClangBuiltLinux/continuous-integration2/actions/runs/4998374271/jobs/8957285746 Though the CI pulled down a SHA 0932447780e1f9a43bf68ef7fe3d9b41b46d58fc which looks weird on https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=0932447780e1f9a43bf68ef7fe3d9b41b46d58fc >> Notice: this object is not reachable from any branch. That this failed in -next and -tip in the same way makes me wonder if something affecting this is coming in via -tip? Maybe the splat looks familiar to x86 folks? I haven't been able to reproduce locally when my machine is relatively load-less. If I do a kernel build in the background, I was able to get QEMU to hang, but without any splat. That was using tip/master @ f81d8f759e7f. Naresh, when you say "intermittent" do you have any data on the relative frequency of this boot failure? (Also, please make sure to use llvm@xxxxxxxxxxxxxxx in the future; we moved mailing lists years ago). Looks like our CI report linked above has an additional splat though via apply_alternatives and optimize_nops. >> [ 0.166742] Code: Unable to access opcode bytes at 0x36. Peter, that smells like perhaps either: commit b6c881b248ef ("x86/alternative: Complicate optimize_nops() some more") commit 6c480f222128 ("x86/alternative: Rewrite optimize_nops() some") Looks like BP committed them May 11; maybe just recently they were merged into tip/master? > - i386: boot/clang-nightly-lkftconfig - failed > > Please find detailed log here, > > boot log: > ===== > [ 0.000000] Linux version 6.4.0-rc2-next-20230517 (tuxmake@tuxmake) > (Debian clang version 17.0.0 > (++20230507093322+728b8a139804-1~exp1~20230507093335.652), Debian LLD > 17.0.0) #1 SMP PREEMPT_DYNAMIC @1684291209 > <6>[ 0.000000] x86/fpu: x87 FPU will use FXSAVE > <6>[ 0.000000] signal: max sigframe size: 1440 > <6>[ 0.000000] BIOS-provided physical RAM map: > <6>[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable > <6>[ 0.000000] BIOS-e820: [mem > 0x000000000009fc00-0x000000000009ffff] reserved > <6>[ 0.000000] BIOS-e820: [mem > 0x00000000000f0000-0x00000000000fffff] reserved > <6>[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000007ffdefff] usable > <6>[ 0.000000] BIOS-e820: [mem > 0x000000007ffdf000-0x000000007fffffff] reserved > <6>[ 0.000000] BIOS-e820: [mem > 0x00000000b0000000-0x00000000bfffffff] reserved > <6>[ 0.000000] BIOS-e820: [mem > 0x00000000fed1c000-0x00000000fed1ffff] reserved > <6>[ 0.000000] BIOS-e820: [mem > 0x00000000fffc0000-0x00000000ffffffff] reserved > <6>[ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000017fffffff] usable > <5>[ 0.000000] random: crng init done > <5>[ 0.000000] Notice: NX (Execute Disable) protection cannot be > enabled: non-PAE kernel! > <7>[ 0.000000] e820: update [mem 0x00cfce20-0x00cfce2f] usable ==> usable > <7>[ 0.000000] e820: update [mem 0x00cfce20-0x00cfce2f] usable ==> usable > <6>[ 0.000000] extended physical RAM map: > <6>[ 0.000000] reserve setup_data: [mem > 0x0000000000000000-0x000000000009fbff] usable > <6>[ 0.000000] reserve setup_data: [mem > 0x000000000009fc00-0x000000000009ffff] reserved > <6>[ 0.000000] reserve setup_data: [mem > 0x00000000000f0000-0x00000000000fffff] reserved > <6>[ 0.000000] reserve setup_data: [mem > 0x0000000000100000-0x0000000000cfce1f] usable > <6>[ 0.000000] reserve setup_data: [mem > 0x0000000000cfce20-0x0000000000cfce2f] usable > <6>[ 0.000000] reserve setup_data: [mem > 0x0000000000cfce30-0x000000007ffdefff] usable > <6>[ 0.000000] reserve setup_data: [mem > 0x000000007ffdf000-0x000000007fffffff] reserved > <6>[ 0.000000] reserve setup_data: [mem > 0x00000000b0000000-0x00000000bfffffff] reserved > <6>[ 0.000000] reserve setup_data: [mem > 0x00000000fed1c000-0x00000000fed1ffff] reserved > <6>[ 0.000000] reserve setup_data: [mem > 0x00000000fffc0000-0x00000000ffffffff] reserved > <6>[ 0.000000] reserve setup_data: [mem > 0x0000000100000000-0x000000017fffffff] usable > <6>[ 0.000000] SMBIOS 2.8 present. > <6>[ 0.000000] DMI: QEMU Standard PC (Q35 + ICH9, 2009), BIOS > 1.14.0-2 04/01/2014 > <6>[ 0.000000] tsc: Fast TSC calibration using PIT > <6>[ 0.000000] tsc: Detected 3000.002 MHz processor > <7>[ 0.009140] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved > <7>[ 0.009369] e820: remove [mem 0x000a0000-0x000fffff] usable > <6>[ 0.009688] last_pfn = 0x7ffdf max_arch_pfn = 0x100000 > <6>[ 0.010313] x86/PAT: PAT not supported by the CPU. > <6>[ 0.010476] x86/PAT: Configuration [0-7]: WB WT UC- UC WB WT UC- UC > <6>[ 0.028035] found SMP MP-table at [mem 0x000f5ce0-0x000f5cef] > <7>[ 0.029050] initial memory mapped: [mem 0x00000000-0x18ffffff] > <6>[ 0.032814] ACPI: Early table checksum verification disabled > <6>[ 0.033316] ACPI: RSDP 0x00000000000F5B10 000014 (v00 BOCHS ) > <6>[ 0.033820] ACPI: RSDT 0x000000007FFE2319 000038 (v01 BOCHS > BXPC 00000001 BXPC 00000001) > <6>[ 0.034477] ACPI: FACP 0x000000007FFE2109 0000F4 (v03 BOCHS > BXPC 00000001 BXPC 00000001) > <6>[ 0.035093] ACPI: DSDT 0x000000007FFE0040 0020C9 (v01 BOCHS > BXPC 00000001 BXPC 00000001) > <6>[ 0.035191] ACPI: FACS 0x000000007FFE0000 000040 > <6>[ 0.035258] ACPI: APIC 0x000000007FFE21FD 000080 (v01 BOCHS > BXPC 00000001 BXPC 00000001) > <6>[ 0.035306] ACPI: HPET 0x000000007FFE227D 000038 (v01 BOCHS > BXPC 00000001 BXPC 00000001) > <6>[ 0.035336] ACPI: MCFG 0x000000007FFE22B5 00003C (v01 BOCHS > BXPC 00000001 BXPC 00000001) > <6>[ 0.035365] ACPI: WAET 0x000000007FFE22F1 000028 (v01 BOCHS > BXPC 00000001 BXPC 00000001) > <6>[ 0.035470] ACPI: Reserving FACP table memory at [mem > 0x7ffe2109-0x7ffe21fc] > <6>[ 0.035522] ACPI: Reserving DSDT table memory at [mem > 0x7ffe0040-0x7ffe2108] > <6>[ 0.035533] ACPI: Reserving FACS table memory at [mem > 0x7ffe0000-0x7ffe003f] > <6>[ 0.035542] ACPI: Reserving APIC table memory at [mem > 0x7ffe21fd-0x7ffe227c] > <6>[ 0.035550] ACPI: Reserving HPET table memory at [mem > 0x7ffe227d-0x7ffe22b4] > <6>[ 0.035558] ACPI: Reserving MCFG table memory at [mem > 0x7ffe22b5-0x7ffe22f0] > <6>[ 0.035566] ACPI: Reserving WAET table memory at [mem > 0x7ffe22f1-0x7ffe2318] > <5>[ 0.036775] 1167MB HIGHMEM available. > <5>[ 0.036823] 879MB LOWMEM available. > <6>[ 0.036848] mapped low ram: 0 - 36ffe000 > <6>[ 0.036864] low ram: 0 - 36ffe000 > <6>[ 0.037465] Zone ranges: > <6>[ 0.037490] DMA [mem 0x0000000000001000-0x0000000000ffffff] > <6>[ 0.037576] Normal [mem 0x0000000001000000-0x0000000036ffdfff] > <6>[ 0.037591] HighMem [mem 0x0000000036ffe000-0x000000007ffdefff] > <6>[ 0.037610] Movable zone start for each node > <6>[ 0.037625] Early memory node ranges > <6>[ 0.037668] node 0: [mem 0x0000000000001000-0x000000000009efff] > <6>[ 0.037741] node 0: [mem 0x0000000000100000-0x000000007ffdefff] > <6>[ 0.037912] Initmem setup node 0 [mem > 0x0000000000001000-0x000000007ffdefff] > <6>[ 0.039699] On node 0, zone DMA: 1 pages in unavailable ranges > <6>[ 0.040493] On node 0, zone DMA: 97 pages in unavailable ranges > <6>[ 0.068623] Using APIC driver default > <6>[ 0.069096] ACPI: PM-Timer IO Port: 0x608 > <6>[ 0.069764] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1]) > <6>[ 0.070159] IOAPIC[0]: apic_id 0, version 32, address 0xfec00000, GSI 0-23 > <6>[ 0.070278] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) > <6>[ 0.070576] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level) > <6>[ 0.070611] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level) > <6>[ 0.070722] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level) > <6>[ 0.070735] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level) > <6>[ 0.070968] ACPI: Using ACPI (MADT) for SMP configuration information > <6>[ 0.071240] smpboot: Allowing 2 CPUs, 0 hotplug CPUs > <6>[ 0.072248] PM: hibernation: Registered nosave memory: [mem > 0x00000000-0x00000fff] > <6>[ 0.072334] PM: hibernation: Registered nosave memory: [mem > 0x0009f000-0x0009ffff] > <6>[ 0.072384] PM: hibernation: Registered nosave memory: [mem > 0x000a0000-0x000effff] > <6>[ 0.072394] PM: hibernation: Registered nosave memory: [mem > 0x000f0000-0x000fffff] > <6>[ 0.072405] PM: hibernation: Registered nosave memory: [mem > 0x00cfc000-0x00cfcfff] > <6>[ 0.072417] PM: hibernation: Registered nosave memory: [mem > 0x00cfc000-0x00cfcfff] > <6>[ 0.072636] [mem 0xc0000000-0xfed1bfff] available for PCI devices > <6>[ 0.072683] Booting paravirtualized kernel on bare hardware > <6>[ 0.073201] clocksource: refined-jiffies: mask: 0xffffffff > max_cycles: 0xffffffff, max_idle_ns: 1910969940391419 ns > <6>[ 0.073674] setup_percpu: NR_CPUS:8 nr_cpumask_bits:2 > nr_cpu_ids:2 nr_node_ids:1 > <6>[ 0.075154] percpu: Embedded 31 pages/cpu s96372 r0 d30604 u126976 > <7>[ 0.075652] pcpu-alloc: s96372 r0 d30604 u126976 alloc=31*4096 > <7>[ 0.075742] pcpu-alloc: [0] 0 [0] 1 > <5>[ 0.077147] Kernel command line: console=ttyS0,115200 rootwait > root=/dev/sda debug verbose console_msg_format=syslog earlycon > <5>[ 0.078730] Unknown kernel command line parameters \"verbose\", > will be passed to user space. > <6>[ 0.080340] Dentry cache hash table entries: 131072 (order: 7, > 524288 bytes, linear) > <6>[ 0.081003] Inode-cache hash table entries: 65536 (order: 6, > 262144 bytes, linear) > <6>[ 0.082840] Built 1 zonelists, mobility grouping on. Total pages: 521957 > <6>[ 0.088082] mem auto-init: stack:all(zero), heap alloc:off, heap free:off > <6>[ 0.088308] Initializing HighMem for node 0 (00036ffe:0007ffdf) > <6>[ 0.454727] Initializing Movable for node 0 (00000000:00000000) > <6>[ 0.459790] Checking if this processor honours the WP bit even > in supervisor mode...Ok. > <6>[ 0.460945] Memory: 2046372K/2096628K available (15688K kernel > code, 2173K rwdata, 5772K rodata, 996K init, 780K bss, 50256K > reserved, 0K cma-reserved, 1195908K highmem) > <6>[ 0.466051] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1 > <6>[ 0.468048] ftrace: allocating 54241 entries in 106 pages > <6>[ 0.511020] ftrace: allocated 106 pages with 4 groups > <6>[ 0.514727] trace event string verifier disabled > <6>[ 0.519501] Dynamic Preempt: voluntary > <6>[ 0.522734] rcu: Preemptible hierarchical RCU implementation. > <6>[ 0.522756] rcu: RCU event tracing is enabled. > <6>[ 0.522785] rcu: RCU restricting CPUs from NR_CPUS=8 to nr_cpu_ids=2. > <6>[ 0.522916] Trampoline variant of Tasks RCU enabled. > <6>[ 0.522927] Rude variant of Tasks RCU enabled. > <6>[ 0.523017] rcu: RCU calculated value of scheduler-enlistment > delay is 100 jiffies. > <6>[ 0.523045] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2 > <6>[ 0.538503] NR_IRQS: 2304, nr_irqs: 440, preallocated irqs: 16 > <6>[ 0.543029] rcu: srcu_init: Setting srcu_struct sizes based on contention. > <6>[ 0.546346] kfence: initialized - using 2097152 bytes for 255 > objects at 0x(ptrval)-0x(ptrval) > <6>[ 0.551378] Console: colour VGA+ 80x25 > <6>[ 0.553103] printk: console [ttyS0] enabled > <6>[ 0.626471] ACPI: Core revision 20230331 > <6>[ 0.637882] APIC: Switch to symmetric I/O mode setup > <6>[ 0.638348] Enabling APIC mode: Flat. Using 1 I/O APICs > <6>[ 0.644118] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1 > <6>[ 0.650790] clocksource: tsc-early: mask: 0xffffffffffffffff > max_cycles: 0x2b3e46fa93c, max_idle_ns: 440795237232 ns > <6>[ 0.652683] Calibrating delay loop (skipped), value calculated > using timer frequency.. 6000.00 BogoMIPS (lpj=3000002) > <6>[ 0.654599] pid_max: default: 32768 minimum: 301 > <6>[ 0.656733] LSM: initializing lsm=capability,selinux,integrity > <6>[ 0.657713] SELinux: Initializing. > <6>[ 0.660732] Mount-cache hash table entries: 2048 (order: 1, 8192 > bytes, linear) > <6>[ 0.661233] Mountpoint-cache hash table entries: 2048 (order: 1, > 8192 bytes, linear) > <6>[ 0.682197] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0 > <6>[ 0.682458] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0 > <6>[ 0.682670] Spectre V1 : Mitigation: usercopy/swapgs barriers > and __user pointer sanitization > <6>[ 0.684632] Spectre V2 : Mitigation: Retpolines > <6>[ 0.684930] Spectre V2 : Spectre v2 / SpectreRSB mitigation: > Filling RSB on context switch > <6>[ 0.685539] Spectre V2 : Spectre v2 / SpectreRSB : Filling RSB on VMEXIT > <4>[ 0.686934] L1TF: Kernel not compiled for PAE. No mitigation for L1TF > <6>[ 0.688097] MDS: Vulnerable: Clear CPU buffers attempted, no microcode > <6>[ 0.688548] MMIO Stale Data: Unknown: No mitigations > <0>[ 0.692458] __common_interrupt: 0.79 No irq handler for vector > <0>[ 0.692458] __common_interrupt: 0.160 No irq handler for vector > <1>[ 0.692458] BUG: unable to handle page fault for address: 000024c0 > <1>[ 0.692458] #PF: user read access in kernel mode > <1>[ 0.692458] #PF: error_code(0x0004) - not-present page > <6>[ 0.692458] *pde = 00000000 > <4>[ 0.692458] Oops: 0004 [#1] PREEMPT SMP > <4>[ 0.692458] CPU: 0 PID: 0 Comm: swapper/0 Not tainted > 6.4.0-rc2-next-20230517 #1 > <4>[ 0.692458] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), > BIOS 1.14.0-2 04/01/2014 > <4>[ 0.692458] EIP: 0x60 > <4>[ 0.692458] Code: Unable to access opcode bytes at 0x36. > <4>[ 0.692458] EAX: 00000060 EBX: 00000000 ECX: b51e0945 EDX: 006afcf0 > <4>[ 0.692458] ESI: 00000060 EDI: 00000020 EBP: d86f7c04 ESP: d86f7ce0 > <4>[ 0.692458] DS: 0000 ES: 0000 FS: 0000 GS: 0000 SS: 7c8c EFLAGS: 00324b92 > <4>[ 0.692458] CR0: 80050033 CR2: 000024c0 CR3: 18a49000 CR4: 000006d0 > <4>[ 0.692458] Call Trace: > <4>[ 0.692458] ? vscnprintf+0x12/0x30 > <4>[ 0.692458] ? prb_read_valid+0x31/0x50 > <4>[ 0.692458] ? console_unlock+0x94/0xc0 > <4>[ 0.692458] ? vprintk_emit+0xf4/0x230 > <4>[ 0.692458] ? vprintk_emit+0xf4/0x230 > <4>[ 0.692458] ? vprintk_default+0x15/0x20 > <4>[ 0.692458] ? __is_insn_slot_addr+0x41/0x50 > <4>[ 0.692458] ? kernel_text_address+0x7c/0x90 > <4>[ 0.692458] ? __kernel_text_address+0x10/0x40 > <4>[ 0.692458] ? show_trace_log_lvl.17+0x1f5/0x2f0 > <4>[ 0.692458] ? __die_body+0x54/0x90 > <4>[ 0.692458] ? __die+0x70/0x80 > <4>[ 0.692458] ? page_fault_oops+0x2d3/0x310 > <4>[ 0.692458] ? kernelmode_fixup_or_oops+0xa5/0xc0 > <4>[ 0.692458] ? __bad_area_nosemaphore+0x59/0x1d0 > <4>[ 0.692458] ? bad_area_nosemaphore+0x12/0x20 > <4>[ 0.692458] ? do_user_addr_fault+0x156/0x400 > <4>[ 0.692458] ? irqentry_enter+0x32/0x38 > <4>[ 0.692458] ? trace_hardirqs_off_finish+0x2c/0x70 > <4>[ 0.692458] ? exc_page_fault+0x50/0x154 > <4>[ 0.692458] ? pvclock_clocksource_read_nowd+0xdc/0xdc > <4>[ 0.692458] ? handle_exception+0x133/0x133 > <4>[ 0.692458] ? pvclock_clocksource_read_nowd+0xdc/0xdc > <4>[ 0.692458] Modules linked in: > <4>[ 0.692458] CR2: 00000000000024c0 > <4>[ 0.692458] ---[ end trace 0000000000000000 ]--- > <4>[ 0.692458] EIP: 0x60 > <4>[ 0.692458] Code: Unable to access opcode bytes at 0x36. > <4>[ 0.692458] EAX: 00000060 EBX: 00000000 ECX: b51e0945 EDX: 006afcf0 > <4>[ 0.692458] ESI: 00000060 EDI: 00000020 EBP: d86f7c04 ESP: d86f7ce0 > <4>[ 0.692458] DS: 0000 ES: 0000 FS: 0000 GS: 0000 SS: 7c8c EFLAGS: 00324b92 > <4>[ 0.692458] CR0: 80050033 CR2: 000024c0 CR3: 18a49000 CR4: 000006d0 > <0>[ 0.692458] Kernel panic - not syncing: Attempted to kill the idle task! > <0>[ 0.692458] ---[ end Kernel panic - not syncing: Attempted to > kill the idle task! ]--- > poweroff > > > logs: > === > - https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20230517/testrun/17029757/suite/boot/test/clang-nightly-lkftconfig/log > - https://tuxapi.tuxsuite.com/v1/groups/linaro/projects/lkft/tests/2PtyoY5qEJrKsuIBPRVSIGK02m3 > > > Steps to reproduce intermittently, > ============= > # To install tuxrun on your system globally: > # sudo pip3 install -U tuxrun==0.42.0 > # > # See https://tuxrun.org/ for complete documentation. > > tuxrun \ > --runtime podman \ > --device qemu-i386 \ > --kernel https://storage.tuxsuite.com/public/linaro/lkft/builds/2PtylXBvODt99O8r9YKOEwj0fQc/bzImage > \ > --modules https://storage.tuxsuite.com/public/linaro/lkft/builds/2PtylXBvODt99O8r9YKOEwj0fQc/modules.tar.xz > \ > --rootfs https://storage.tuxsuite.com/public/linaro/lkft/oebuilds/2PeQhkn36JqPowQBZge0AFhBav6/images/intel-core2-32/lkft-tux-image-intel-core2-32-20230511145016.rootfs.ext4.gz > \ > --parameters SKIPFILE=skipfile-lkft.yaml \ > --parameters SHARD_NUMBER=10 \ > --parameters SHARD_INDEX=2 \ > --image docker.io/lavasoftware/lava-dispatcher:2023.01.0020.gc1598238f \ > --tests ltp-syscalls \ > --timeouts boot=15 ltp-syscalls=50 > > -- > Linaro LKFT > https://lkft.linaro.org -- Thanks, ~Nick Desaulniers