On Fri, 4 Sep 2020 18:01:15 +0200 Gerald Schaefer <gerald.schaefer@xxxxxxxxxxxxx> wrote: [...] > > BTW2, a quick test with this change (so far) made the issues on s390 > go away: > > @@ -1069,7 +1074,7 @@ static int __init debug_vm_pgtable(void) > spin_unlock(ptl); > > #ifndef CONFIG_PPC_BOOK3S_64 > - hugetlb_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot); > + hugetlb_advanced_tests(mm, vma, (pte_t *) pmdp, pmd_aligned, vaddr, prot); > #endif > > spin_lock(&mm->page_table_lock); > > That would more match the "pte_t pointer" usage for hugetlb code, > i.e. just cast a pmd_t pointer to it. Also changed to pmd_aligned, > but I think the root cause is the pte_t pointer. > > Not entirely sure though if that would really be the correct fix. > I somehow lost whatever little track I had about what these tests > really want to check, and if that would still be valid with that > change. Uh oh, wasn't aware that this (or some predecessor) already went upstream, and broke our debug kernel today. I found out now what goes (horribly) wrong on s390, see below for more details. In short, using hugetlb primitives with ptep pointers that do _not_ point to a pmd or pud entry will not work on s390. It also seems to make no sense to verify / test such a thing in general, as it would also be a severe bug if any kernel code would do that. After all, with hugepages, there are no pte tables, only pmd etc. tables. My change above would fix the issue for s390, but I can still not completely judge if that would not break other things for your tests. In general, for normal kernel code, much of what you do would be very broken, but I guess your tests are doing such "special" things because they can. E.g. because they operate on some "sandbox" mm and page tables, and you also do not need properly populated page tables for some exit / free cleanup, you just throw them away explicitly with pXd_free at the end. So it might just be "the right thing" to pass a casted pmd pointer to hugetlb_advanced_tests(), to simulate and test (proper) usage of the hugetlb primitives. I also see no other way to make this work for s390, than using a proper pmd/pud pointer. If not possible, please add us to the #ifndef. So, for all those interested, here is what goes wrong on s390. huge_ptep_get_and_clear() uses the "idte" instruction for the clearing (and TLB invalidation) part. That instruction expects a "region or segment table" origin, which is a pmd/pud/p4d/pgd, but not a pte table. Even worse, when we calculate the table origin from the given ptep (which *should* not point to a pte), due to different table sizes for pte / pXd tables, we end up at some place before the given pte table. The "idte" instruction also gets the virtual address, and does corresponding index addition to the given table origin. Depending on the pmd_index we now end up either within the pte table again, in which case we see a panic because idte complains about seeing a pte value. If we are unlucky, then we end up outside the pte table, and depending on the content of that memory location, idte might succeed, effectively corrupting that memory. That explains why we only see the panic sometimes, depending on random vaddr, other symptoms other times, and probably completely silent memory corruption for the rest...