Sorry for sending twice. The mail app decided to use HTML for some reason. On Aug 23, 2021, at 10:32 PM, Nadav Amit <nadav.amit@xxxxxxxxx> wrote: > > On Aug 23, 2021, at 6:25 AM, Mike Rapoport <rppt@xxxxxxxxxx> wrote: > > From: Mike Rapoport <rppt@xxxxxxxxxxxxx> > > Allocate page table using __GFP_PTE_MAPPED so that they will have 4K PTEs > in the direct map. This allows to switch _PAGE_RW bit each time a page > table page needs to be made writable or read-only. > > The writability of the page tables is toggled only in the lowest level page > table modifiction functions and immediately switched off. > > The page tables created early in the boot (including the direct map page > table) are not write protected. > > [ snip ] > +static void pgtable_write_set(void *pg_table, bool set) > +{ > + int level = 0; > + pte_t *pte; > + > + /* > + * Skip the page tables allocated from pgt_buf break area and from > + * memblock > + */ > + if (!after_bootmem) > + return; > + if (!PageTable(virt_to_page(pg_table))) > + return; > + > + pte = lookup_address((unsigned long)pg_table, &level); > + if (!pte || level != PG_LEVEL_4K) > + return; > + > + if (set) { > + if (pte_write(*pte)) > + return; > + > + WRITE_ONCE(*pte, pte_mkwrite(*pte)); I think that the pte_write() test (and the following one) might hide latent bugs. Either you know whether the PTE is write-protected or you need to protect against nested/concurrent calls to pgtable_write_set() by disabling preemption/IRQs. Otherwise, you risk in having someone else write-protecting the PTE after it is write-unprotected and before it is written - causing a crash, or write-unprotecting it after it is protected - which circumvents the protection. Therefore, I would think that instead you should have: VM_BUG_ON(pte_write(*pte)); // (or WARN_ON_ONCE()) In addition, if there are assumptions on the preemptability of the code, it would be nice to have some assertions. I think that the code assumes that all calls to pgtable_write_set() are done while holding the page-table lock. If that is the case, perhaps adding some lockdep assertion would also help to confirm the correctness. [ I put aside the lack of TLB flushes, which make the whole matter of delivered protection questionable. I presume that once PKS is used, this is not an issue. ]