On Fri, 6 Sep 2019 11:58:59 +0530 Anshuman Khandual <anshuman.khandual@xxxxxxx> wrote: > On 09/05/2019 10:36 PM, Gerald Schaefer wrote: > > On Thu, 5 Sep 2019 14:48:14 +0530 > > Anshuman Khandual <anshuman.khandual@xxxxxxx> wrote: > > > >>> [...] > >>>> + > >>>> +#if !defined(__PAGETABLE_PMD_FOLDED) && !defined(__ARCH_HAS_4LEVEL_HACK) > >>>> +static void pud_clear_tests(pud_t *pudp) > >>>> +{ > >>>> + memset(pudp, RANDOM_NZVALUE, sizeof(pud_t)); > >>>> + pud_clear(pudp); > >>>> + WARN_ON(!pud_none(READ_ONCE(*pudp))); > >>>> +} > >>> > >>> For pgd/p4d/pud_clear(), we only clear if the page table level is present > >>> and not folded. The memset() here overwrites the table type bits, so > >>> pud_clear() will not clear anything on s390 and the pud_none() check will > >>> fail. > >>> Would it be possible to OR a (larger) random value into the table, so that > >>> the lower 12 bits would be preserved? > >> > >> So the suggestion is instead of doing memset() on entry with RANDOM_NZVALUE, > >> it should OR a large random value preserving lower 12 bits. Hmm, this should > >> still do the trick for other platforms, they just need non zero value. So on > >> s390, the lower 12 bits on the page table entry already has valid value while > >> entering this function which would make sure that pud_clear() really does > >> clear the entry ? > > > > Yes, in theory the table entry on s390 would have the type set in the last > > 4 bits, so preserving those would be enough. If it does not conflict with > > others, I would still suggest preserving all 12 bits since those would contain > > arch-specific flags in general, just to be sure. For s390, the pte/pmd tests > > would also work with the memset, but for consistency I think the same logic > > should be used in all pxd_clear_tests. > > Makes sense but.. > > There is a small challenge with this. Modifying individual bits on a given > page table entry from generic code like this test case is bit tricky. That > is because there are not enough helpers to create entries with an absolute > value. This would have been easier if all the platforms provided functions > like __pxx() which is not the case now. Otherwise something like this should > have worked. > > > pud_t pud = READ_ONCE(*pudp); > pud = __pud(pud_val(pud) | RANDOM_VALUE (keeping lower 12 bits 0)) > WRITE_ONCE(*pudp, pud); > > But __pud() will fail to build in many platforms. Hmm, I simply used this on my system to make pud_clear_tests() work, not sure if it works on all archs: pud_val(*pudp) |= RANDOM_NZVALUE; > > The other alternative will be to make sure memset() happens on all other > bits except the lower 12 bits which will depend on endianness. If s390 > has a fixed endianness, we can still use either of them which will hold > good for others as well. > > memset(pudp, RANDOM_NZVALUE, sizeof(pud_t) - 3); > > OR > > memset(pudp + 3, RANDOM_NZVALUE, sizeof(pud_t) - 3); > > > > > However, there is another issue on s390 which will make this only work > > for pud_clear_tests(), and not for the p4d/pgd_tests. The problem is that > > mm_alloc() will only give you a 3-level page table initially on s390. > > This means that pudp == p4dp == pgdp, and so the p4d/pgd_tests will > > both see the pud level (of course this also affects other tests). > > Got it. > > > > > Not sure yet how to fix this, i.e. how to initialize/update the page table > > to 5 levels. We can handle 5 level page tables, and it would be good if > > all levels could be tested, but using mm_alloc() to establish the page > > tables might not work on s390. One option could be to provide an arch-hook > > or weak function to allocate/initialize the mm. > > Sure, got it. Though I plan to do add some arch specific tests or init sequence > like the above later on but for now the idea is to get the smallest possible set > of test cases which builds and runs on all platforms without requiring any arch > specific hooks or special casing (#ifdef) to be agreed upon broadly and accepted. > > Do you think this is absolutely necessary on s390 for the very first set of test > cases or we can add this later on as an improvement ? It can be added later, no problem. I did not expect this to work flawlessly on s390 right from the start anyway, with all our peculiarities, so don't let this hinder you. I might come up with an add-on patch later. Actually, using get_unmapped_area() as suggested by Kirill could also solve this issue. We do create a new mm with 3-level page tables on s390, and the dynamic upgrade to 4 or 5 levels is then triggered exactly by arch_get_unmapped_area(), depending on the addr. But I currently don't see how / where arch_get_unmapped_area() is set up for such a dummy mm created by mm_alloc(). Regards, Gerald