Hi, Jiri 在 2021年01月22日 01:00, crash-utility-request@xxxxxxxxxx 写道: > Date: Thu, 21 Jan 2021 17:18:29 +0100 > From: Jiri Bohac <jbohac@xxxxxxx> > To: crash-utility@xxxxxxxxxx > Cc: anderson@xxxxxxxxxx > Subject: increase __PHYSICAL_MASK_SHIFT_XEN? > Message-ID: <20210121161829.vizm27sb6pfrgmrl@xxxxxxxxxxxxx> > Content-Type: text/plain; charset=us-ascii > > Hi, > > I've just come across a situation where crash failed to open a > dump generated by a 4.12 XEN PV dom0 kernel, terminating with > this message: > > crash: read error: physical address: ffffffffffffffff type: "pud page" > > The problem is a failed machine-to-physical translation. > xen_m2p() returns an error (-1UL) and x86_64_pud_offset() than > uses that value as a physical address. > > I debugged the problem by running crash inside gdb. The backtrace was: > #0 xen_m2p (machine=973135872) at kernel.c:9714 > #1 0x000000000053ed4a in x86_64_pud_offset (pgd_pte=3299508019303, vaddr=18446744072642223552, verbose=0, IS_XEN=1) > at x86_64.c:1889 > #2 0x0000000000540c8a in x86_64_kvtop_xen_wpt (tc=0x0, kvaddr=18446744072642223552, paddr=0x7fffffffdb30, verbose=0) > at x86_64.c:2523 > #3 0x00000000005407d0 in x86_64_kvtop (tc=0x0, kvaddr=18446744072642223552, paddr=0x7fffffffdb30, verbose=0) > at x86_64.c:2413 > #4 0x0000000000491a97 in kvtop (tc=0x0, kvaddr=18446744072642223552, paddr=0x7fffffffdb30, verbose=0) at memory.c:3062 > #5 0x000000000048f3f0 in readmem (addr=18446744072642223552, memtype=1, buffer=0xba63a0 <shared_bufs>, size=832, > type=0x92e1f2 "module struct", error_handle=6) at memory.c:2314 > #6 0x00000000005071e2 in module_init () at kernel.c:3699 > .... > > I tracked the problem to a wrong value of > __PHYSICAL_MASK_SHIFT_XEN. The current value of 40 does not > correspond to the current kernel value of 52 since kernel commit > 6f0e8bf16730a36ff6773802d8c8df56d10e60cd (xen: support 52 bit > physical addresses in pv guests). > > The result is visible in the above backtrace: > x86_64_pud_offset() is called with pgd_pte=0x3003a00e067 and that > value is wrongly masked by "pud_paddr = pgd_pte & > PHYSICAL_PAGE_MASK" and passed to xen_m2p() as 0x3a00e000 instead > of 0x3003a00e000, causing the m2p translation to fail. > Thank you for sharing the details. > Setting __PHYSICAL_MASK_SHIFT_XEN to 52 fixes the problem with > this dump. > > But I am not confident it's a safe change. My understanding is > that it should be safe, as all the unused bits of the physical > address inside the PTEs should be 0 and thus having the mask wider > than necessary should not hurt. But I am suspicious if my > reasoning is correct. Why does crash go into such trouble > differentiating between different kernels and sets > machdep->machspec->physical_mask_shift dynamically to one of > __PHYSICAL_MASK_SHIFT_XEN (40), __PHYSICAL_MASK_SHIFT_2_6 (46), The reason is that the supported kernel virtual address allows to be configured on other architectures, and that still relies on the hardware capability. > and __PHYSICAL_MASK_SHIFT_5LEVEL (52)? Would something break if > it were always set to 52? The commit adding the logic is > 307e7f35. > It's time to update the physical mask for xen, but can you help to test the address translation via some command such as vtop, ptov, etc? Or follow up the 5-level page changes to check if it has the capability(cr4 & CR4_LA57)? Thanks. Lianbo > Thanks, > > -- Jiri Bohac <jbohac@xxxxxxx> SUSE Labs, Prague, Czechia -- Crash-utility mailing list Crash-utility@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/crash-utility