I'm working on a standard x86-64 system running kernel v5.10, configured with THP enabled (and CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD=y) and using 4-level paging. My system supports both 2M and 1G huge pages.

I know that pmd_present() and pte_present() check for _PAGE_PRESENT|_PAGE_PROTNONE (pmd_present() also checks for _PAGE_PSE), because pages (4K or 2M) without permissions are marked as not present in the page table while still having _PAGE_PROTNONE set. However, pud_present() does not seem to care about this and only checks _PAGE_PRESENT.

This seemed weird to me, so I wrote a small program that maps an anonymous page through mmap() with MAP_HUGETLB|MAP_HUGE_1GB, writes to the entire mapping, then mprotects it to 0 (no permissions), pausing at each step to allow for inspection. I have a kernel module [1] which walks the page table given a PID and a virtual address. Using it to dump the pud_val() of the pud_t, I see the following:

    *page is mapped RW*
    *page is written to*
    *insert module to check page table*
    pud_val(pud) = 80000006400008e7 (PRESENT USER ACCESSED PSE DIRTY SOFT_DIRTY NX)

    *page is mprotect'd to 0*
    *insert module to check page table*
    pud_val(pud) = 000ffff9bffff9e0 (ACCESSED PSE PAT DIRTY PROTNONE SOFT_DIRTY)

Right off the bat, that 000ffff9bffff9e0 seems like a weird value to me: there are a lot of bits set, and it seems like 000064 has been inverted into ffff9b (kind of, the LSB does not match).

As I suspected, after the page is mprotect'd to 0 from userspace, pud_present(pud) returns false. However, /proc/[pid]/pagemap still reports the page as present (bit 63 set), and the reported page frame number matches the one extracted from the page table by my module (which is 0x640000, before the mprotect changes the pud to that weird value). If in my module I re-define pud_present(pud) to check for _PAGE_PRESENT|_PAGE_PROTNONE, I now get a true result.
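The two dumped values and the re-defined presence check can be compared outside the kernel with a minimal userspace sketch. The flag bit positions are the standard x86-64 ones. The helper names (present_strict, present_or_protnone, pud_1g_pfn) are hypothetical and exist only for this illustration. The inversion step is an assumption based on my reading of arch/x86/include/asm/pgtable-invert.h (the L1TF mitigation, which appears to store the address bits of non-present entries inverted), and the 1G address mask assumes at most 52 physical address bits. Please verify both against your own tree.

```c
#include <assert.h>
#include <stdint.h>

/* x86-64 page table flag bits (standard architectural positions). */
#define PG_PRESENT   (1ULL << 0)
#define PG_PSE       (1ULL << 7)
#define PG_PROTNONE  (1ULL << 8)   /* same bit as _PAGE_GLOBAL */

/* Bits 30..51: physical address of a 1G page (assumes <= 52 physical bits). */
#define PUD_1G_ADDR_MASK 0x000fffffc0000000ULL

/* What pud_present() appears to check in v5.10: _PAGE_PRESENT only. */
static inline int present_strict(uint64_t v)
{
    return (v & PG_PRESENT) != 0;
}

/* The re-defined check from the post: _PAGE_PRESENT or _PAGE_PROTNONE. */
static inline int present_or_protnone(uint64_t v)
{
    return (v & (PG_PRESENT | PG_PROTNONE)) != 0;
}

/*
 * Hypothetical helper: extract the PFN of a 1G leaf entry.
 * ASSUMPTION: a non-zero entry without _PAGE_PRESENT has all of its
 * bits inverted before the address is extracted, as the kernel seems
 * to do via protnone_mask().
 */
static inline uint64_t pud_1g_pfn(uint64_t v)
{
    assert(v & PG_PSE);            /* only meaningful for leaf entries */
    if (v && !(v & PG_PRESENT))
        v = ~v;
    return (v & PUD_1G_ADDR_MASK) >> 12;
}
```

Feeding in the two values from the dump above, pud_1g_pfn() yields 0x640000 for both, which matches what /proc/[pid]/pagemap reports. Under the same assumption, the PAT bit (bit 12) in the second dump would be a side effect of the inversion rather than a real attribute.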
Furthermore (still after mmap + write + mprotect 0), pud_huge() returns true (I suppose pud_huge() should identify a MAP_HUGETLB 1G page, so it makes sense), but pud_large() returns false.

So my questions are:

1. What's the deal with the weird PUD value after mprotect 0?
2. Why doesn't pud_present() work the same way as pte_present() or pmd_present() do?
3. What's the correct way to check if a pud_t is present or not, including when it is PROTNONE (i.e. corresponds to a 1G huge page with no protections)?
4. What's the correct way to check if a pud_t is a leaf, i.e. it corresponds to a huge 1G page (transparent or not)?
5. Why does pud_large() return false? Isn't it supposed to be more "generic" than pud_huge(), returning true for 1G transparent huge pages too?

I must be missing or misunderstanding something. Is anyone able to clarify the above?

[1] https://github.com/mebeim/linux-kernel-experiments/blob/2019ec856befc9a070d8422921e96aa09de9bff6/modules/page_table_walk.c

--
Thanks,
Marco Bonelli
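
For questions 4 and 5, the observed pud_huge()/pud_large() results can be reproduced with a small userspace sketch. The two sketch_pud_huge() and sketch_pud_large() bodies reflect my reading of the x86 sources in v5.10 and should be treated as assumptions, not as the kernel's definitive definitions. sketch_pud_leaf() is a purely hypothetical helper showing one way a PROTNONE-aware leaf check could look.

```c
#include <stdint.h>

/* x86-64 page table flag bits (standard architectural positions). */
#define PG_PRESENT   (1ULL << 0)
#define PG_PSE       (1ULL << 7)
#define PG_PROTNONE  (1ULL << 8)   /* same bit as _PAGE_GLOBAL */

/* ASSUMPTION: x86 pud_huge() in v5.10 tests _PAGE_PSE alone. */
static inline int sketch_pud_huge(uint64_t v)
{
    return (v & PG_PSE) != 0;
}

/* ASSUMPTION: x86 pud_large() in v5.10 requires both PSE and PRESENT. */
static inline int sketch_pud_large(uint64_t v)
{
    return (v & (PG_PSE | PG_PRESENT)) == (PG_PSE | PG_PRESENT);
}

/* Hypothetical: a leaf check that also accepts PROTNONE entries. */
static inline int sketch_pud_leaf(uint64_t v)
{
    return (v & PG_PSE) && (v & (PG_PRESENT | PG_PROTNONE));
}
```

With the value dumped after mprotect 0 (000ffff9bffff9e0), the first predicate is true and the second is false, which is consistent with what the module reports. If the assumptions hold, pud_large() returns false simply because _PAGE_PRESENT is clear, not because the entry has stopped being a 1G leaf.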