Re: AMD erratum 665 on f15h processor?

When you reply, please hit reply-to-all in your mail client so that
mailing lists get CCed too.

On Mon, Dec 18, 2017 at 07:54:52PM +0300, Andrew Randrianasulu wrote:
> On Monday 18 December 2017 16:22:15 you wrote:
> > + kvm ML.
> >
> > On Mon, Dec 18, 2017 at 06:01:21AM +0300, Andrew Randrianasulu wrote:
> > > On Sunday 17 December 2017 23:52:05 you wrote:
> > > > On Sun, Dec 17, 2017 at 12:04:28PM +0300, Andrew Randrianasulu wrote:
> > > > > Hello!
> > > > >
> > > > > I was trying to investigate why all my old kernels can't be booted on
> > > > > my relatively new machine. Kernels 4.10+ naturally boot - I use
> > > > > 4.14.3 right now - but old kernels die early ...
> > > > >
> > > > > After some digging I found this
> > > > > https://patchwork.kernel.org/patch/9311567/
> > > > >
> > > > > The patch talks about family 12h, but my machine has this CPU:
> > > > >
> > > > > [    0.056000] smpboot: CPU0: AMD FX(tm)-4300 Quad-Core Processor
> > > > > (family: 0x15, model: 0x2, stepping: 0x0)
> > > > > [    0.056000] Performance Events: Fam15h core perfctr, AMD PMU
> > > > > driver.
> > > >
> > > > Yes, your machine is not affected by that erratum. So far so good.
> > > >
> > > > The rest of your mail I have a hard time understanding: you're talking
> > > > about old kernels not booting on a new machine but then you paste a
> > > > qemu 32-bit guest kernel boot log and after that I'm lost.
> > > >
> > > > Perhaps you should try again by explaining in detail what exactly
> > > > you're trying to do and how exactly you're going about doing that...
> > >
> > > Hi, Borislav!
> > >
> > > I was trying to boot a few self-made live CDs/DVDs that use
> > > self-compiled kernels in the 3.2-4.2 range. None of those old disks
> > > boots in qemu if I set the cpu type to 'host'. I have a whole
> > > collection of old kernels going back to 2011, and none of them work
> > > anymore! An even older CD with 2.6.23.something simply rebooted after
> > > isolinux loaded the kernel and initrd on the physical machine! But
> > > 2.6.27.9 worked, at least in qemu (I don't really want to reboot the
> > > machine because of some stuff in tmpfs). So, because 4.2.0-i486 was my
> > > previous failsafe kernel and most likely will not work anymore, I
> > > guess I will use 4.12.0-x64. I was just trying to find any change
> > > explaining this error, and your fix was the closest I was able to find
> > > in this time interval (2015-2017). Maybe it was just some unrelated,
> > > purely software bug in the AMD detection code. I spent some time
> > > trying to figure out how to copy/paste from qemu; finally the -curses
> > > interface worked.
> > >
> > > I think I missed this misbehavior because I mostly used just qemu,
> > > without -cpu host (but with -enable-kvm), so it worked without problems.
> >
> > So -cpu host means:
> >
> > x86             host  KVM processor with all supported host features (only
> > available in KVM mode)
> >
> > which would theoretically mean that those guest kernel configs shouldn't
> > boot on the baremetal box either, if they fail on the guest.
> >
> > But who knows what's happening.
> >
> > You can give me a guest kernel .config of a kernel which fails along
> > with the exact qemu cmdline to try out here.
> 
> .config attached.
> 
> To reproduce, just launch qemu like this:
> 
> qemu-system-i386 -kernel /home/admin/slax-build/boot/vmlinuz -cpu host --enable-kvm
> 
> (just tried). Of course, replace the path to the kernel image with your
> own. I can also attach the binary image, but I think it will be of
> little use to you.

Nah, I built it using your .config.

So my guest stops very early in the BIOS with 

"Failed to allocate space for phdrs

-- System halted."

Then I looked at this:

https://bugzilla.kernel.org/show_bug.cgi?id=114671

and there's a patch

https://bugzilla.kernel.org/attachment.cgi?id=209601&action=diff&collapsed=&headers=1&format=raw

With it, it booted a bit further. But I still couldn't see any output.

So I booted with my cmdline to see more output and it did say:

general protection fault: 0000 [#1] SMP 
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.2.0-i486+ #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
task: c05b9a80 ti: c05b2000 task.ti: c05b2000
EIP: 0060:[<c010e390>] EFLAGS: 00210293 CPU: 0
EIP is at cpu_has_amd_erratum+0x24/0xb0
EAX: 00210bf7 EBX: 00000001 ECX: c0010140 EDX: c044ccf4
ESI: c0616900 EDI: c044ccf8 EBP: c05b3f68 ESP: c05b3f58
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 8005003b CR2: ffc77000 CR3: 006ae000 CR4: 00040690
Stack:
 02008140 00000000 c0616900 00000000 c05b3fa8 c010ec8b f5001d80 0000001e
 00000000 00000000 00000009 00000010 00000000 c0616900 00000000 c05b3fa8
 c010cf58 c0616900 c0616900 c061695c c05b3fc8 c010d156 c061698b c061695c
Call Trace:
 [<c010ec8b>] init_amd+0x5ee/0x631
 [<c010cf58>] ? get_cpu_cap+0x121/0x126
 [<c010d156>] identify_cpu+0x1f9/0x37d
 [<c0624a18>] identify_boot_cpu+0xd/0x80
 [<c0624abd>] check_bugs+0x8/0x35
 [<c061ea42>] start_kernel+0x32a/0x339
 [<c061e2c2>] i386_start_kernel+0x8c/0x90
Code: cf 5b c0 89 e5 5d c3 55 89 e5 57 56 53 51 89 c6 8b 1a 8d 7a 04 81 fb ff ff 00 00 77 57 8b 40 2c 0f ba e0 09 73 4e b9 40 01 01 c0 <0f> 32 89 45 f0 89 d8 89 d1 99 39 ca 77 3b 72 05 3b 5d f0 73 34
EIP: [<c010e390>] cpu_has_amd_erratum+0x24/0xb0 SS:ESP 0068:c05b3f58
---[ end trace 7fb9e71b486a229a ]---
Kernel panic - not syncing: Attempted to kill the idle task!
---[ end Kernel panic - not syncing: Attempted to kill the idle task!
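
As an aside for readers of the splat above: the "Code:" line marks the first byte of the faulting instruction with angle brackets, so the trapping opcode and the MSR index loaded just before it can be recovered without a disassembler. The following is only a minimal sketch of that idea; the helper name parse_code_line is hypothetical, and the assumption that a 5-byte "mov $imm32,%ecx" (opcode b9) sits immediately before the fault holds for this particular dump, not in general.

```python
import re

# The "Code:" line copied verbatim from the oops above.
CODE_LINE = (
    "Code: cf 5b c0 89 e5 5d c3 55 89 e5 57 56 53 51 89 c6 8b 1a 8d 7a 04 "
    "81 fb ff ff 00 00 77 57 8b 40 2c 0f ba e0 09 73 4e b9 40 01 01 c0 "
    "<0f> 32 89 45 f0 89 d8 89 d1 99 39 ca 77 3b 72 05 3b 5d f0 73 34"
)


def parse_code_line(line):
    """Return (code_bytes, fault_offset) for an oops 'Code:' line.

    fault_offset is the index of the byte wrapped in <..>, or None if
    no byte is marked. (Hypothetical helper, not a kernel tool.)
    """
    tokens = line.split()
    if tokens and tokens[0] == "Code:":
        tokens = tokens[1:]
    code = bytearray()
    fault_offset = None
    for tok in tokens:
        marked = re.fullmatch(r"<([0-9a-fA-F]{2})>", tok)
        if marked:
            fault_offset = len(code)
            tok = marked.group(1)
        code.append(int(tok, 16))
    return bytes(code), fault_offset


code, off = parse_code_line(CODE_LINE)

# Assumption for this dump only: the five bytes before the fault are
# "b9 imm32", i.e. mov $imm32,%ecx, which loads the MSR index.
assert code[off - 5] == 0xB9
msr_index = int.from_bytes(code[off - 4:off], "little")

print(hex(off), code[off:off + 2].hex(), hex(msr_index))
# prints: 0x2b 0f32 0xc0010140
```

The recovered index agrees with the ECX value in the register dump (c0010140), and 0f 32 is the rdmsr opcode.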

Which is exactly like the splat you've posted and that fails:

Code: cf 5b c0 89 e5 5d c3 55 89 e5 57 56 53 51 89 c6 8b 1a 8d 7a 04 81 fb ff ff 00 00 77 57 8b 40 2c 0f ba e0 09 73 4e b9 40 01 01 c0 <0f> 32 89 45 f0 89 d8 89 d1 99 39 ca 77 3b 72 05 3b 5d f0 73 34
All code
========
   0:   cf                      iret   
   1:   5b                      pop    %rbx
   2:   c0 89 e5 5d c3 55 89    rorb   $0x89,0x55c35de5(%rcx)
   9:   e5 57                   in     $0x57,%eax
   b:   56                      push   %rsi
   c:   53                      push   %rbx
   d:   51                      push   %rcx
   e:   89 c6                   mov    %eax,%esi
  10:   8b 1a                   mov    (%rdx),%ebx
  12:   8d 7a 04                lea    0x4(%rdx),%edi
  15:   81 fb ff ff 00 00       cmp    $0xffff,%ebx
  1b:   77 57                   ja     0x74
  1d:   8b 40 2c                mov    0x2c(%rax),%eax
  20:   0f ba e0 09             bt     $0x9,%eax
  24:   73 4e                   jae    0x74
  26:   b9 40 01 01 c0          mov    $0xc0010140,%ecx
  2b:*  0f 32                   rdmsr           <-- trapping instruction
  2d:   89 45 f0                mov    %eax,-0x10(%rbp)
  30:   89 d8                   mov    %ebx,%eax
  32:   89 d1                   mov    %edx,%ecx
  34:   99                      cltd
  35:   39 ca                   cmp    %ecx,%edx
  37:   77 3b                   ja     0x74
  39:   72 05                   jb     0x40
  3b:   3b 5d f0                cmp    -0x10(%rbp),%ebx
  3e:   73 34                   jae    0x74

because it tries to read from a non-existent MSR, 0xc0010140. Maybe that
is because of the -cpu host emulation, but those MSRs do get
virtualized, see

2b036c6b861d ("KVM: SVM: Add support for AMD's OSVW feature in guests")

but I'd defer to the kvm/qemu people to explain what exactly the deal is
here.
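
To make the failure mode concrete, here is a small Python model of the control flow that the disassembly above suggests: a range check on the erratum ID (cmp $0xffff), a feature-bit test (bt $0x9), and only then the rdmsr of 0xc0010140. It is a sketch under assumptions, not the kernel's code: the names erratum_via_osvw, GeneralProtectionFault and make_rdmsr are hypothetical, the tested bit is assumed to be the OSVW CPUID flag, and the status register is assumed to sit at 0xc0010141, right after the length register.

```python
MSR_OSVW_ID_LENGTH = 0xC0010140  # MSR index seen in ECX in the oops
MSR_OSVW_STATUS = 0xC0010141     # assumption: status follows the length MSR


class GeneralProtectionFault(Exception):
    """Stands in for the #GP raised by rdmsr on an unhandled MSR."""


def erratum_via_osvw(osvw_id, cpu_has_osvw, rdmsr):
    """Model of the OSVW part of the erratum check.

    Returns True/False when OSVW gives an answer, or None when the
    caller would have to fall back to a family/model/stepping match.
    rdmsr is a callable taking an MSR index and returning its value.
    """
    # Mirrors "cmp $0xffff,%ebx; ja" and "bt $0x9,%eax; jae".
    if not (0 <= osvw_id <= 0xFFFF) or not cpu_has_osvw:
        return None
    osvw_len = rdmsr(MSR_OSVW_ID_LENGTH)  # the read that faults in the oops
    if osvw_id >= osvw_len:
        return None
    status = rdmsr(MSR_OSVW_STATUS + (osvw_id >> 6))
    return bool(status & (1 << (osvw_id & 0x3F)))


def make_rdmsr(values):
    """Build a fake rdmsr that faults on any MSR missing from values."""
    def rdmsr(index):
        if index not in values:
            raise GeneralProtectionFault(hex(index))
        return values[index]
    return rdmsr


# Guest that advertises OSVW but does not back the MSR: the read faults.
broken_guest = make_rdmsr({})
try:
    erratum_via_osvw(1, True, broken_guest)
    faulted = False
except GeneralProtectionFault:
    faulted = True

# Guest with the feature bit clear: the MSR is never touched.
skipped = erratum_via_osvw(1, False, broken_guest)

# Guest with working MSRs: 3 valid IDs, status bit 1 set.
good_guest = make_rdmsr({MSR_OSVW_ID_LENGTH: 3, MSR_OSVW_STATUS: 0b010})
answers = [erratum_via_osvw(i, True, good_guest) for i in (0, 1, 5)]
```

In this model the crash needs both conditions at once: the feature bit is visible to the guest, and the MSR read is not handled. That is consistent with a different -cpu model avoiding the fault, but which side is responsible in the real setup is for the kvm/qemu people to say.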

What I do is use -cpu Opteron_G5, which is also F15h, and that works.
Oh, and I'd use 64-bit kernels - 32-bit is not really being tested as
extensively.

HTH.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


