On Tue, Jan 31, 2012 at 02:38:15PM -0800, Eric W. Biederman wrote:
> Don Zickus <dzickus at redhat.com> writes:
>
> > On Tue, Jan 31, 2012 at 02:08:29PM -0800, Eric W. Biederman wrote:
> >> > The problem is that although kdump tries to shutdown minimal hardware,
> >> > it still needs to disable the IO APIC. This requires spinlocks which
> >> > may be held by another cpu. This other cpu is being held infinitely in
> >> > an NMI context by kdump in order to serialize the crashing path. Instant
> >> > deadlock.
> >>
> >> Can you test to see if kexec on panic still needs to disable the IO
> >> APIC. Last I looked we were close if not all of the way there to not
> >> needing to boot the kernel in pic mode?
> >
> > Ok, so you just blindly remove disable_IO_APIC from
> > native_machine_crash_shutdown and re-run some panic tests on various
> > machines? What about the disable_IO_APIC path in native_machine_shutdown?
> >
>
> Yes. Just native_machine_crash_shutdown.
>
> native_machine_shutdown is the case when all is good and we attempt to
> put the hardware back the way we found it.

Ok.

>
> Any normal x86 machine that the kernel runs in ioapic mode should be
> enough to get a first approximation.
>
> > Also, where could I look to see if that work was done? Is that in the
> > ioapic setup code?
>
> The primary question is do we call the ioapic setup code without calling
> the pic setup code first. On some embedded x86 platforms we certainly
> do. I don't know if that code has been generalized.
>
> Historically the problem is that we started the pit timer in pic mode
> and used that to calibrate the delay loop.
>
> So what we are looking to verify is that the linux kernel boot skip
> pic mode entirely.

It seems to boot fine on an Ivy Bridge machine and a single cpu Pentium4.
I will try an athlon3 and a nehalem tomorrow.

Talking to folks here and trying to read the code, it seems like the PIT
stuff is delayed until after the IOAPIC is configured, using Fast TSC
calibration as a mechanism to work around the PIT??

I attached the output of the Pentium4 when kdumping. Not sure what to
really look for to verify the PIC is being skipped. Perhaps you know?

Cheers,
Don

DMI 2.3 present.
last_pfn = 0x20000 max_arch_pfn = 0x1000000
x86 PAT enabled: cpu 0, old 0x7010600070106, new 0x7010600070106
found SMP MP-table at [c00fe710] fe710
init_memory_mapping: 0000000000000000-0000000020000000
RAMDISK: 1fab5000 - 1ff5f000
ACPI: RSDP 000fd560 00014 (v00 DELL )
ACPI: RSDT 000fd574 00034 (v01 DELL GX240 00000008 ASL 00000061)
ACPI: FACP 000fd5a8 00074 (v01 DELL GX240 00000008 ASL 00000061)
ACPI: DSDT fffe3c22 02393 (v01 DELL dt_ex 00001000 MSFT 0100000D)
ACPI: FACS 3ff77000 00040
ACPI: SSDT fffe5fb5 000A7 (v01 DELL st_ex 00001000 MSFT 0100000D)
ACPI: APIC 000fd61c 0005C (v01 DELL GX240 00000008 ASL 00000061)
ACPI: BOOT 000fd678 00028 (v01 DELL GX240 00000008 ASL 00000061)
0MB HIGHMEM available.
512MB LOWMEM available.
  mapped low ram: 0 - 20000000
  low ram: 0 - 20000000
Zone PFN ranges:
  DMA 0x00000010 -> 0x00001000
  Normal 0x00001000 -> 0x00020000
  HighMem empty
Movable zone start PFN for each node
Early memory PFN ranges
  0: 0x00000010 -> 0x000000a0
  0: 0x00018000 -> 0x0001ff6a
  0: 0x0001ff6b -> 0x0001ff6f
  0: 0x0001ffff -> 0x00020000
Using APIC driver default
ACPI: PM-Timer IO Port: 0x808
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] disabled)
ACPI: IOAPIC (id[0x01] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 1, version 32, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
Using ACPI (MADT) for SMP configuration information
2 Processors exceeds NR_CPUS limit of 1
SMP: Allowing 1 CPUs, 0 hotplug CPUs
PM: Registered nosave memory: 00000000000a0000 - 00000000000f0000
PM: Registered nosave memory: 00000000000f0000 - 0000000000100000
PM: Registered nosave memory: 0000000000100000 - 0000000018000000
PM: Registered nosave memory: 000000001ff6a000 - 000000001ff6b000
PM: Registered nosave memory: 000000001ff6f000 - 000000001ffff000
Allocating PCI resources starting at 40000000 (gap: 40000000:bec00000)
Booting paravirtualized kernel on bare hardware
setup_percpu: NR_CPUS:32 nr_cpumask_bits:32 nr_cpu_ids:1 nr_node_ids:1
PERCPU: Embedded 13 pages/cpu @df400000 s32704 r0 d20544 u2097152
Built 1 zonelists in Zone order, mobility grouping on.
Total pages: 31743
Kernel command line: ro root=/dev/mapper/vg_dellgx24003-lv_root rd_NO_LUKS LANG=en_US.UTF-8 rd_NO_MD KEYTABLE=us console=ttyS0,115200 rd_LVM_LV=vg_dellgx24003/lv_root rd_LVM_LV=vg_dellgx24003/lv_swap SYSFONT=latarcyrheb-sun16 rd_NO_DM irqpoll nr_cpus=1 reset_devices cgroup_disable=memory memmap=exactmap memmap=64K$0K memmap=576K@64K memmap=64K$960K memmap=130472K@393216K memmap=19K@523689K memmap=4K@524284K memmap=8K#1048028K memmap=540K$1048036K memmap=64K$4173824K memmap=64K$4175872K memmap=5120K$4189184K elfcorehdr=523688K
Misrouted IRQ fixup and polling support enabled
This may significantly impact system performance
Disabling memory control group subsystem
PID hash table entries: 512 (order: -1, 2048 bytes)
Dentry cache hash table entries: 16384 (order: 4, 65536 bytes)
Inode-cache hash table entries: 8192 (order: 3, 32768 bytes)
Initializing CPU#0
Initializing HighMem for node 0 (00000000:00000000)
Memory: 112876k/524288k available (4429k kernel code, 18192k reserved, 2305k data, 500k init, 0k highmem)
virtual kernel memory layout:
  fixmap : 0xffa96000 - 0xfffff000 (5540 kB)
  pkmap : 0xff600000 - 0xff800000 (2048 kB)
  vmalloc : 0xe0800000 - 0xff5fe000 ( 493 MB)
  lowmem : 0xc0000000 - 0xe0000000 ( 512 MB)
  .init : 0xd8a94000 - 0xd8b11000 ( 500 kB)
  .data : 0xd8853712 - 0xd8a93d80 (2305 kB)
  .text : 0xd8400000 - 0xd8853712 (4429 kB)
Checking if this processor honours the WP bit even in supervisor mode...Ok.
Hierarchical RCU implementation.
NR_IRQS:2304 nr_irqs:256 16
Spurious LAPIC timer interrupt on cpu 0
do_IRQ: 0.89 No irq handler for vector (irq -1)
Console: colour VGA+ 80x25
console [ttyS0] enabled
Fast TSC calibration using PIT
Detected 1694.460 MHz processor.
Calibrating delay loop (skipped), value calculated using timer frequency.. 3388.92 BogoMIPS (lpj=1694460)
pid_max: default: 32768 minimum: 301
Security Framework initialized
SELinux: Initializing.
Mount-cache hash table entries: 512
Initializing cgroup subsys cpuacct
Initializing cgroup subsys memory
Initializing cgroup subsys devices
Initializing cgroup subsys freezer
Initializing cgroup subsys net_cls
Initializing cgroup subsys blkio
Initializing cgroup subsys perf_event
CPU0: Hyper-Threading is disabled
mce: CPU supports 4 MCE banks
SMP alternatives: switching to UP code
Freeing SMP alternatives: 20k freed
ACPI: Core revision 20120111
Enabling APIC mode: Flat. Using 1 I/O APICs
..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
CPU0: Intel(R) Pentium(R) 4 CPU 1.70GHz stepping 02
Performance Events: Netburst events, Broken PMU hardware detected, using software events only.
NMI watchdog disabled (cpu0): hardware events not enabled
Brought up 1 CPUs
Total of 1 processors activated (3388.92 BogoMIPS).
<snip>
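
The lines in the log above that mention interrupt routing and timer
calibration can be pulled out mechanically when comparing machines. What
follows is a minimal Python sketch, not a kernel tool: the function name
`scan_boot_log` and the pattern table are hypothetical, and the patterns are
copied from the strings in this particular log, so other kernel versions may
word them differently. It only collects the matching fields; it does not
decide whether PIC mode was actually skipped.

```python
import re

# Hypothetical pattern table. Every pattern is modelled on a line that
# appears in the Pentium4 log above; none comes from kernel documentation.
PATTERNS = {
    "apic_driver": re.compile(r"Using APIC driver (\w+)"),
    "irq_override": re.compile(
        r"INT_SRC_OVR \(bus (\d+) bus_irq (\d+) global_irq (\d+) (\w+) (\w+)\)"
    ),
    "tsc_calibration": re.compile(r"Fast TSC calibration using (\w+)"),
    "apic_mode": re.compile(
        r"Enabling APIC mode:\s+(\w+)\.\s+Using (\d+) I/O APICs"
    ),
    "timer": re.compile(
        r"\.\.TIMER: vector=(0x[0-9a-fA-F]+) apic1=(-?\d+) pin1=(-?\d+) "
        r"apic2=(-?\d+) pin2=(-?\d+)"
    ),
}


def scan_boot_log(text):
    """Return {pattern name: [captured field tuples]} for a boot log.

    The whole text is searched at once, so the result is the same whether
    the log still has its line breaks or has been flattened into one line.
    """
    return {
        name: [match.groups() for match in pattern.finditer(text)]
        for name, pattern in PATTERNS.items()
    }
```

Calling `scan_boot_log(open("p4-kdump.log").read())` (the file name is only
an example) on the output from each test machine gives a small dictionary
that is easier to diff than the full logs.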