On 04/20/2018 12:02 PM, Jan Beulich wrote:
>>>> On 20.04.18 at 17:52, <jandryuk@xxxxxxxxx> wrote:
>> On Fri, Apr 20, 2018 at 11:42 AM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
>>>>>> On 20.04.18 at 17:25, <andrew.cooper3@xxxxxxxxxx> wrote:
>>>> On 20/04/18 16:20, Jason Andryuk wrote:
>>>>> Adding xen-devel and the Linux Xen maintainers.
>>>>>
>>>>> Summary: Some Xen users (and maybe others) are hitting a BUG in
>>>>> __radix_tree_lookup() under do_swap_page() - example backtrace is
>>>>> provided at the end. Matthew Wilcox provided a band-aid patch that
>>>>> prints errors like the following instead of triggering the bug.
>>>>>
>>>>> Skylake 32bit PAE Dom0:
>>>>> Bad swp_entry: 80000000
>>>>> mm/swap_state.c:683: bad pte d3a39f1c(8000000400000000)
>>>>>
>>>>> Ivy Bridge 32bit PAE Dom0:
>>>>> Bad swp_entry: 40000000
>>>>> mm/swap_state.c:683: bad pte d3a05f1c(8000000200000000)
>>>>>
>>>>> Other 32bit DomU:
>>>>> Bad swp_entry: 4000000
>>>>> mm/swap_state.c:683: bad pte e2187f30(8000000200000000)
>>>>>
>>>>> Other 32bit:
>>>>> Bad swp_entry: 2000000
>>>>> mm/swap_state.c:683: bad pte ef3a3f38(8000000100000000)
>>>>>
>>>>> The Linux bugzilla has more info
>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=198497
>>>>>
>>>>> This may not be exclusive to Xen Linux, but most of the reports are on
>>>>> Xen. Matthew wonders if Xen might be stepping on the upper bits of a
>>>>> pte.
>>>> Yes - Xen does use the upper bits of a PTE, but only 1 in release
>>>> builds, and a second in debug builds. I don't understand where you're
>>>> getting the 3rd bit in there.
>>> The former supposedly is _PAGE_GUEST_KERNEL, which we use for 64-bit
>>> guests only. Above talk is of 32-bit guests only.
>>>
>>> In addition both this and _PAGE_GNTTAB are used on present PTEs only,
>>> while above talk is about swap entries.
>> This hits a BUG going through do_swap_page, but it seems like users
>> don't think they are actually using swap at the time. One reporter
>> didn't have any swap configured. Some of this information was further
>> down in my original message.
>>
>> I'm wondering if somehow we have a PTE that should be empty and should
>> be lazily filled. For some reason, the entry has some bits set and is
>> causing the trouble. Would Xen mess with the PTEs in that case?
> As said in my previous reply - both of the bits Andrew has mentioned can
> only ever be set when the present bit is also set (which doesn't appear to
> be the case here). The set bits above are actually in the range of bits
> designated to the address, which Xen wouldn't ever play with.

The bug description starts with:

"On a Xen VM running as pvh"

So is this a PV or a PVH guest?

-boris
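
For reference, the bad pte values quoted above can be decoded with a few
lines of Python. This is only a rough sketch, assuming the usual x86 PAE
layout (bit 0 = present, bit 63 = NX, bits 12-51 = address field). The
helper name describe_pte is made up for illustration and is not taken
from the kernel or from Xen.

```python
# Sketch only: decode which bits are set in a 64-bit PAE PTE value.
# Assumed layout (standard x86 PAE): bit 0 = present, bit 63 = NX,
# bits 12..51 = physical address field.
PRESENT_BIT = 0
NX_BIT = 63
ADDR_BITS = range(12, 52)


def describe_pte(pte):
    """Hypothetical helper: report the set bits of a raw PTE value."""
    set_bits = [b for b in range(64) if (pte >> b) & 1]
    return {
        "set_bits": set_bits,
        "present": PRESENT_BIT in set_bits,
        "nx": NX_BIT in set_bits,
        "address_field_bits": [b for b in set_bits if b in ADDR_BITS],
    }


if __name__ == "__main__":
    # PTE values copied from the "bad pte" lines quoted in this thread.
    for pte in (0x8000000400000000, 0x8000000200000000, 0x8000000100000000):
        info = describe_pte(pte)
        print(f"{pte:#018x}: bits={info['set_bits']} "
              f"present={info['present']} nx={info['nx']} "
              f"addr_bits={info['address_field_bits']}")
```

Under those assumptions, each quoted value has the present bit clear, bit
63 set, and exactly one further bit (34, 33 or 32) set inside the address
field, which appears to match Jan's observation above.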