On Thu, Nov 29, 2018 at 01:35:17PM +0000, Juergen Gross wrote:
> On 29/11/2018 14:26, Kirill A. Shutemov wrote:
> > On Thu, Nov 29, 2018 at 09:41:25AM +0000, Juergen Gross wrote:
> >> On 29/11/2018 02:22, Hans van Kranenburg wrote:
> >>> Hi,
> >>>
> >>> As also seen at:
> >>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=914951
> >>>
> >>> Attached there are two serial console output logs. One is starting with
> >>> Xen 4.11 (from debian unstable) as dom0, and the other one without Xen.
> >>>
> >>> [ 2.085543] BUG: unable to handle kernel paging request at
> >>> ffff888d9fffc000
> >>> [ 2.085610] PGD 200c067 P4D 200c067 PUD 0
> >>> [ 2.085674] Oops: 0000 [#1] SMP NOPTI
> >>> [ 2.085736] CPU: 1 PID: 1 Comm: swapper/0 Not tainted
> >>> 4.19.0-trunk-amd64 #1 Debian 4.19.5-1~exp1+pvh1
> >>> [ 2.085823] Hardware name: HP ProLiant DL360 G7, BIOS P68 05/21/2018
> >>> [ 2.085895] RIP: e030:ptdump_walk_pgd_level_core+0x1fd/0x490
> >>> [...]
> >>
> >> The offending stable commit is 4074ca7d8a1832921c865d250bbd08f3441b3657
> >> ("x86/mm: Move LDT remap out of KASLR region on 5-level paging"), this
> >> is commit d52888aa2753e3063a9d3a0c9f72f94aa9809c15 upstream.
> >>
> >> Current upstream kernel is booting fine under Xen, so in general the
> >> patch should be fine. Using an upstream kernel built from above commit
> >> (with the then needed Xen fixup patch 1457d8cf7664f34c4ba534) is fine,
> >> too.
> >>
> >> Kirill, are you aware of any prerequisite patch from 4.20 which could be
> >> missing in 4.19.5?
> >
> > I'm not.
> >
> > Let me look into this.
> >
>
> What is making me suspicious is the failure happening just after
> releasing the init memory. Maybe there is an access to .init.data
> segment or similar? The native kernel booting could be related to the
> usage of 2M mappings not being available in a PV-domain.

Sounds like a valid hypothesis.

[ 2.085616] Code: 00 00 00 00 40 00 00 49 83 c5 08 48 01 04 24 4c 3b 6c 24 48 0f 84 83 02 00 00 48 8b 04 24 48 c1 f8 10 48 89 84 24 88 00 00 00 <49> 8b 7d 00 48 f7 c7 9f ff ff ff 0f 85 36 ff ff ff 41 b8 03 00 00

All code
========
   0:   00 00                   add    %al,(%rax)
   2:   00 00                   add    %al,(%rax)
   4:   40 00 00                add    %al,(%rax)
   7:   49 83 c5 08             add    $0x8,%r13
   b:   48 01 04 24             add    %rax,(%rsp)
   f:   4c 3b 6c 24 48          cmp    0x48(%rsp),%r13
  14:   0f 84 83 02 00 00       je     0x29d
  1a:   48 8b 04 24             mov    (%rsp),%rax
  1e:   48 c1 f8 10             sar    $0x10,%rax
  22:   48 89 84 24 88 00 00    mov    %rax,0x88(%rsp)
  29:   00
  2a:*  49 8b 7d 00             mov    0x0(%r13),%rdi           <-- trapping instruction
  2e:   48 f7 c7 9f ff ff ff    test   $0xffffffffffffff9f,%rdi
  35:   0f 85 36 ff ff ff       jne    0xffffffffffffff71
  3b:   41                      rex.B
  3c:   b8                      .byte 0xb8
  3d:   03 00                   add    (%rax),%eax
        ...

Code starting with the faulting instruction
===========================================
   0:   49 8b 7d 00             mov    0x0(%r13),%rdi
   4:   48 f7 c7 9f ff ff ff    test   $0xffffffffffffff9f,%rdi
   b:   0f 85 36 ff ff ff       jne    0xffffffffffffff47
  11:   41                      rex.B
  12:   b8                      .byte 0xb8
  13:   03 00                   add    (%rax),%eax
        ...

The load from the address in %r13 is what faults.

I don't have a setup to reproduce the issue myself and I'm having a hard
time correlating the code with the source.

What is ptdump_walk_pgd_level_core+0x1fd/0x490 for you?

-- 
 Kirill A. Shutemov
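
As an aid for matching the disassembly to the source, here is a minimal
Python sketch (not part of the original report) that decodes two values
from the oops. It rests on these assumptions: the x86 page-table flag bits
Accessed = bit 5 and Dirty = bit 6, 4-level paging with 9 index bits per
level, and helper names (none_check_mask, table_indices) that are made up
for illustration. If the mask in the `test` instruction is indeed "all bits
except Accessed/Dirty", the faulting load may be feeding one of the
`*_none()`-style entry checks in the page-table walker, but that is a
guess, not something the log confirms.

```python
# Assumed x86 page-table entry flag bits (Accessed and Dirty).
PAGE_ACCESSED = 1 << 5
PAGE_DIRTY = 1 << 6

MASK64 = (1 << 64) - 1


def none_check_mask():
    """Mask that ignores the Accessed/Dirty bits of a 64-bit entry."""
    return ~(PAGE_ACCESSED | PAGE_DIRTY) & MASK64


def table_indices(vaddr):
    """Split a virtual address into 4-level page-table indices (9 bits each)."""
    return {
        "pgd": (vaddr >> 39) & 0x1FF,
        "pud": (vaddr >> 30) & 0x1FF,
        "pmd": (vaddr >> 21) & 0x1FF,
        "pte": (vaddr >> 12) & 0x1FF,
    }


if __name__ == "__main__":
    # Compare against the immediate in `test $0xffffffffffffff9f,%rdi`.
    print(hex(none_check_mask()))
    # Faulting address taken from the oops.
    print(table_indices(0xFFFF888D9FFFC000))
```

The first value can be compared with the immediate operand of the `test`
instruction; the second shows which slot at each level would have to be
populated for the faulting address to be mapped, which can be read next to
the "PUD 0" line in the oops.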