On Sep 27, 2022, at 9:29 AM, Chih-En Lin <shiyn.lin@xxxxxxxxx> wrote:

> This patch adds the Copy-On-Write (COW) mechanism to the PTE table.
> To enable the COW page table use the sysctl vm.cow_pte file with the
> corresponding PID. It will set the MMF_COW_PTE_READY flag to the
> process for enabling COW PTE during the next time of fork.
>
> It uses the MMF_COW_PTE flag to distinguish the normal page table
> and the COW one. Moreover, it is difficult to distinguish whether the
> entire page table is out of COW state. So the MMF_COW_PTE flag won't be
> disabled after its setup.
>
> Since the memory space of the page table is distinctive for each process
> in kernel space. It uses the address of the PMD index for the PTE table
> ownership to identify which one of the processes needs to update the
> page table state. In other words, only the owner will update shared
> (COWed) PTE table state, like the RSS and pgtable_bytes.
>
> Some PTE tables (e.g., pinned pages that reside in the table) still need
> to be copied immediately for consistency with the current COW logic. As
> a result, a flag, COW_PTE_OWNER_EXCLUSIVE, indicating whether a PTE
> table is exclusive (i.e., only one task owns it at a time) is added to
> the table’s owner pointer. Every time a PTE table is copied during the
> fork, the owner pointer (and thus the exclusive flag) will be checked to
> determine whether the PTE table can be shared across processes.
>
> It uses a reference count to track the lifetime of COWed PTE table.
> Doing the fork with COW PTE will increase the refcount. And, when
> someone writes to the COWed PTE table, it will cause the write fault to
> break COW PTE. If the COWed PTE table's refcount is one, the process
> that triggers the fault will reuse the COWed PTE table. Otherwise, the
> process will decrease the refcount, copy the information to a new PTE
> table or dereference all the information and change the owner if they
> have the COWed PTE table.
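
The refcount rule quoted above can be modeled in userspace roughly as follows. This is only a minimal sketch under stated assumptions, not the patch's code: the names `pte_table`, `pte_table_new`, `cow_pte_fork`, and `cow_pte_break` are hypothetical, the table is shrunk to eight entries, and ownership tracking, RSS accounting, locking, and TLB flushing are all left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_PTRS_PER_PTE 8    /* hypothetical, tiny table for illustration */

/* Hypothetical model of a PTE table that may be shared after fork. */
struct pte_table {
    int refcount;                               /* page tables referencing it */
    unsigned long entries[MODEL_PTRS_PER_PTE];  /* stand-ins for PTEs */
};

/* Allocate an empty table referenced by exactly one process. */
static struct pte_table *pte_table_new(void)
{
    struct pte_table *t = calloc(1, sizeof(*t));

    if (!t)
        abort();
    t->refcount = 1;
    return t;
}

/* Fork: the child references the parent's table instead of copying it. */
static struct pte_table *cow_pte_fork(struct pte_table *parent)
{
    parent->refcount++;
    return parent;
}

/*
 * Write fault: return a table that the faulting process may modify.
 * If the refcount is one, the table is reused in place; otherwise the
 * refcount is dropped and the entries are copied into a private table.
 */
static struct pte_table *cow_pte_break(struct pte_table *shared)
{
    struct pte_table *copy;

    assert(shared->refcount >= 1);
    if (shared->refcount == 1)
        return shared;

    shared->refcount--;
    copy = pte_table_new();
    memcpy(copy->entries, shared->entries, sizeof(copy->entries));
    return copy;
}
```

In this model, a fork followed by a write fault in the child leaves the parent holding the original table with a refcount of one, so a later write fault in the parent reuses that table without copying.
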
>
> If doing the COW to the PTE table once as the time touching the PMD
> entry, it cannot preserves the reference count of the COWed PTE table.
> Since the address range of VMA may overlap the PTE table, the copying
> function will use VMA to travel the page table for copying it. So it may
> increase the reference count of the COWed PTE table multiple times in
> one COW page table forking. Generically it will only increase once time
> as the child reference it. To solve this problem, it needs to check the
> destination of PMD entry does exist. And the reference count of the
> source PTE table is more than one before doing the COW.
>
> This patch modifies the part of the copy page table to do the basic COW.
> For the break COW, it modifies the part of a page fault, zaps page table
> , unmapping, and remapping.

I only skimmed the patches that you sent. The last couple of patches
seem a bit rough and dirty, so I am sorry to say that I skipped them
(too many “TODO” and “XXX” for my taste).

I am sure others will have better feedback than me. I understand there
is a tradeoff and that this mechanism is mostly for high-performance
snapshotting/forking. It would be beneficial to see whether this
mechanism can somehow be combined with existing ones (mshare?).

The code itself can be improved. I found the reasoning about
synchronization and TLB flushes to be lacking, and the code seems
potentially incorrect. Better comments would help, even if the code is
correct.

There are additional general questions. For instance, when sharing a
page-table, do you properly update the refcount/mapcount of the mapped
pages? And are there any possible interactions with THP?

Thanks,
Nadav