On 04/07/2020 09:24 PM, Gerald Schaefer wrote: > On Sun, 5 Apr 2020 17:58:14 +0530 > Anshuman Khandual <anshuman.khandual@xxxxxxx> wrote: > > [...] >>> >>> Could be fixed like this (the first de-reference is a bit special, >>> because at that point *ptep does not really point to a large (pmd) entry >>> yet, it is initially an invalid pte entry, which breaks our huge_ptep_get() >> >> There seems to be an inconsistency on s390 platform. Even though it defines >> a huge_ptep_get() override, it does not subscribe __HAVE_ARCH_HUGE_PTEP_GET >> which should have forced it fallback on generic huge_ptep_get() but it does >> not :) Then I realized that __HAVE_ARCH_HUGE_PTEP_GET only makes sense when >> an arch uses <asm-generic/hugetlb.h>. s390 does not use that and hence gets >> away with it's own huge_ptep_get() without __HAVE_ARCH_HUGE_PTEP_GET. Sounds >> confusing ? But I might not have the entire context here. > > Yes, that sounds very confusing. Also a bit ironic, since huge_ptep_get() > was initially introduced because of s390, and now we don't select > __HAVE_ARCH_HUGE_PTEP_GET... > > As you realized, I guess this is because we do not use generic hugetlb.h. > And when __HAVE_ARCH_HUGE_PTEP_GET was introduced with commit 544db7597ad > ("hugetlb: introduce generic version of huge_ptep_get"), that was probably > the reason why we did not get our share of __HAVE_ARCH_HUGE_PTEP_GET. Understood. > > Nothing really wrong with that, but yes, very confusing. Maybe we could > also select it for s390, even though it wouldn't have any functional > impact (so far), just for less confusion. Maybe also thinking about > using the generic hugetlb.h, not sure if the original reasons for not > doing so would still apply. Now I only need to find the time... Seems like something worth to explore if we could remove this confusion. > >> >>> conversion logic. I also added PMD_MASK alignment for RANDOM_ORVALUE, >>> because we do have some special bits there in our large pmds. It seems >>> to also work w/o that alignment, but it feels a bit wrong): >> >> Sure, we can accommodate that. >> >>> >>> @@ -731,26 +731,26 @@ static void __init hugetlb_advanced_test >>> unsigned long vaddr, pgprot_t prot) >>> { >>> struct page *page = pfn_to_page(pfn); >>> - pte_t pte = READ_ONCE(*ptep); >>> + pte_t pte; >>> >>> - pte = __pte(pte_val(pte) | RANDOM_ORVALUE); >>> + pte = pte_mkhuge(mk_pte_phys(RANDOM_ORVALUE & PMD_MASK, prot)); >> >> So that keeps the existing value in 'ptep' pointer at bay and instead >> construct a PTE from scratch. I would rather have READ_ONCE(*ptep) at >> least provide the seed that can be ORed with RANDOM_ORVALUE before >> being masked with PMD_MASK. Do you see any problem ? > > Yes, unfortunately. The problem is that the resulting pte is not marked > as present. The conversion pte -> (huge) pmd, which is done in > set_huge_pte_at() for s390, will establish an empty pmd for non-present > ptes, all the RANDOM_ORVALUE stuff is lost. And a subsequent > huge_ptep_get() will not result in the same original pte value. If you Ohh. > want to preserve and check the RANDOM_ORVALUE, it has to be a present > pte, hence the mk_pte(_phys). Understood and mk_pte() is also available on all platforms. > >> >> Some thing like this instead. >> >> pte_t pte = READ_ONCE(*ptep); >> pte = pte_mkhuge(__pte((pte_val(pte) | RANDOM_ORVALUE) & PMD_MASK)); >> >> We cannot use mk_pte_phys() as it is defined only on some platforms >> without any generic fallback for others. > > Oh, didn't know that, sorry. What about using mk_pte() instead, at least > it would result in a present pte: > > pte = pte_mkhuge(mk_pte(phys_to_page(RANDOM_ORVALUE & PMD_MASK), prot)); Lets use mk_pte() here but can we do this instead paddr = (__pfn_to_phys(pfn) | RANDOM_ORVALUE) & PMD_MASK; pte = pte_mkhuge(mk_pte(phys_to_page(paddr), prot)); > > And if you also want to do some with the existing value, which seems > to be an empty pte, then maybe just check if writing and reading that > value with set_huge_pte_at() / huge_ptep_get() returns the same, > i.e. initially w/o RANDOM_ORVALUE. > > So, in combination, like this (BTW, why is the barrier() needed, it > is not used for the other set_huge_pte_at() calls later?): Ahh missed, will add them. Earlier we faced problem without it after set_pte_at() for a test on powerpc (64) platform. Hence just added it here to be extra careful. > > @@ -733,24 +733,28 @@ static void __init hugetlb_advanced_test > struct page *page = pfn_to_page(pfn); > pte_t pte = READ_ONCE(*ptep); > > - pte = __pte(pte_val(pte) | RANDOM_ORVALUE); > + set_huge_pte_at(mm, vaddr, ptep, pte); > + WARN_ON(!pte_same(pte, huge_ptep_get(ptep))); > + > + pte = pte_mkhuge(mk_pte(phys_to_page(RANDOM_ORVALUE & PMD_MASK), prot)); > set_huge_pte_at(mm, vaddr, ptep, pte); > barrier(); > WARN_ON(!pte_same(pte, huge_ptep_get(ptep))); > > This would actually add a new test "write empty pte with > set_huge_pte_at(), then verify with huge_ptep_get()", which happens > to trigger a warning on s390 :-) On arm64 as well which checks for pte_present() in set_huge_pte_at(). But PTE present check is not really present in each set_huge_pte_at() implementation especially without __HAVE_ARCH_HUGE_SET_HUGE_PTE_AT. Hence wondering if we should add this new test here which will keep giving warnings on s390 and arm64 (at the least). > > That (new) warning actually points to misbehavior on s390, we do not > write a correct empty pmd in this case, but one that is empty and also > marked as large. huge_ptep_get() will then not correctly recognize it > as empty and do wrong conversion. It is also not consistent with > huge_ptep_get_and_clear(), where we write the empty pmd w/o marking > as large. Last but not least it would also break our pmd_protnone() > logic (see below). Another nice finding on s390 :-) :) > > I don't think this has any effect in practice (yet), but I will post a > fix for that, just in case you will add / change this test. Okay. > >> >>> set_huge_pte_at(mm, vaddr, ptep, pte); >>> barrier(); >>> WARN_ON(!pte_same(pte, huge_ptep_get(ptep))); >>> huge_pte_clear(mm, vaddr, ptep, PMD_SIZE); >>> - pte = READ_ONCE(*ptep); >>> + pte = huge_ptep_get(ptep); >>> WARN_ON(!huge_pte_none(pte)); >>> >>> pte = mk_huge_pte(page, prot); >>> set_huge_pte_at(mm, vaddr, ptep, pte); >>> huge_ptep_set_wrprotect(mm, vaddr, ptep); >>> - pte = READ_ONCE(*ptep); >>> + pte = huge_ptep_get(ptep); >>> WARN_ON(huge_pte_write(pte)); >>> >>> pte = mk_huge_pte(page, prot); >>> set_huge_pte_at(mm, vaddr, ptep, pte); >>> huge_ptep_get_and_clear(mm, vaddr, ptep); >>> - pte = READ_ONCE(*ptep); >>> + pte = huge_ptep_get(ptep); >>> WARN_ON(!huge_pte_none(pte)); >>> >>> pte = mk_huge_pte(page, prot); >>> @@ -759,7 +759,7 @@ static void __init hugetlb_advanced_test >>> pte = huge_pte_mkwrite(pte); >>> pte = huge_pte_mkdirty(pte); >>> huge_ptep_set_access_flags(vma, vaddr, ptep, pte, 1); >>> - pte = READ_ONCE(*ptep); >>> + pte = huge_ptep_get(ptep); >>> WARN_ON(!(huge_pte_write(pte) && huge_pte_dirty(pte))); >>> } >>> #else >>> >>> 3) The pmd_protnone_tests() has an issue, because it passes a pmd to >>> pmd_protnone() which has not been marked as large. We check for large >>> pmd in the s390 implementation of pmd_protnone(), and will fail if a >>> pmd is not large. We had similar issues before, in other helpers, where >>> I changed the logic on s390 to not require the pmd large check, but I'm >>> not so sure in this case. Is there a valid use case for doing >>> pmd_protnone() on "normal" pmds? Or could this be changed like this: >> >> That is a valid question. IIUC, all existing callers for pmd_protnone() >> ensure that it is indeed a huge PMD. But even assuming otherwise should >> not the huge PMD requirement get checked in the caller itself rather than >> in the arch helper which is just supposed to check the existence of the >> dedicated PTE bit(s) for this purpose. Purely from a helper perspective >> pmd_protnone() should not really care about being large even though it >> might never get used without one. >> >> Also all platforms (except s390) derive the pmd_protnone() from their >> respective pte_protnone(). I wonder why should s390 be any different >> unless it is absolutely necessary. > > This is again because of our different page table entry layouts for > pte/pmd and (large) pmd. The bits we check for pmd_protnone() are > not valid for normal pmd/pte, and we would return undefined result for > normal entries. > > Of course, we could rely on nobody calling pmd_protnone() on normal > pmds, but in this case we also use pmd_large() check in pmd_protnone() > for indication if the pmd is present. W/o that, we would return > true for empty pmds, that doesn't sound right. Not sure if we also > want to rely on nobody calling pmd_protnone() on empty pmds. That might be problematic. > > Anyway, if in practice it is not correct to use pmd_protnone() > on normal pmds, then I would suggest that your tests should also > not do / test it. And I strongly assume that it is not correct, at > least I cannot think of a valid case, and of course s390 would > already be broken if there was such a case. Okay, will use huge PMD here as you had suggested earlier. > > Regards, > Gerald > >