Re: [RFC PATCH 09/12] khugepaged: Introduce vma_collapse_anon_folio()

When having to back-off (restore original PTEs), or for copying,
you'll likely need access to the original PTEs, which were already
cleared. So likely you need a temporary copy of the original PTEs
somehow.

That's why temporarily clearing the PMD under the mmap write lock is easier
to implement, at the cost of requiring the mmap lock in write mode
like PMD collapse.

Why do I need to clear the PMD if I am taking the mmap_write_lock() and
operating only on the PTE?

One approach I proposed to Nico (and I think he has a prototype) is:

a) Take all locks like we do today (mmap in write, vma in write, rmap in write)

After this step, no "ordinary" page table walkers can run anymore

b) Clear the PMD entry and flush the TLB like we do today

After this step, the CPU can no longer read/write the folios and GUP-fast can no longer run. The PTE table is completely isolated.

c) Now we can work on the (temporarily unmapped) PTE table as we please: isolate folios, lock them, ... without clearing the PTE entries, just like we do today.

d) Allocate the new folios (we don't have to hold any spinlocks), copy + replace the affected PTE entries in the isolated PTE table. Similar to what we do today, except that we don't clear PTEs but instead clear+reset.

e) Unlock+un-isolate + unref the collapsed folios like we do today.

f) Re-map the PTE table, like we do today when collapse fails.


Of course, after taking all locks we have to re-verify that there is something to collapse (e.g., in d) we also have to check for unexpected folio references). The back-off path is easy: re-map the PTE table, as no PTE entries were touched yet.
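
To make the ordering concrete, here is a minimal userspace sketch of steps a) to f). It is a toy model, not kernel code: struct toy_mm, struct toy_pte_table, toy_collapse_range() and the unexpected_refs field are all made-up names, a single boolean stands in for the mmap/vma/rmap write locks, and the TLB flush, folio allocation and copy are only marked by comments. The point it tries to show is that all verification happens before any PTE is modified, so the back-off path is just "re-map the table".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PTRS_PER_PTE 8	/* toy size, chosen for illustration only */

/* Hypothetical stand-ins for illustration; these are NOT kernel types. */
struct toy_pte {
	int folio_id;		/* 0 means "no folio mapped" */
	int unexpected_refs;	/* models unexpected extra folio references */
};

struct toy_pte_table {
	struct toy_pte pte[TOY_PTRS_PER_PTE];
};

struct toy_mm {
	bool write_locked;		/* models mmap + vma + rmap in write mode */
	struct toy_pte_table *pmd;	/* NULL models a cleared PMD entry */
};

/*
 * Collapse entries [start, start + nr) of the PTE table to new_folio_id.
 * Returns true on success, false if we backed off without touching any PTE.
 */
static bool toy_collapse_range(struct toy_mm *mm, size_t start, size_t nr,
			       int new_folio_id)
{
	struct toy_pte_table *table;
	bool collapsed = false;
	size_t i;

	assert(mm->pmd != NULL);
	assert(start + nr <= TOY_PTRS_PER_PTE);

	/* a) Take all locks in write mode: no ordinary walkers anymore. */
	mm->write_locked = true;

	/* b) Clear the PMD entry (a real implementation also flushes the TLB). */
	table = mm->pmd;
	mm->pmd = NULL;

	/* c) Re-verify on the isolated table; PTEs are only read here. */
	for (i = start; i < start + nr; i++) {
		if (!table->pte[i].folio_id || table->pte[i].unexpected_refs)
			goto remap;	/* back off: nothing was modified */
	}

	/* d) Copy + replace the affected entries (clear + reset). */
	for (i = start; i < start + nr; i++)
		table->pte[i].folio_id = new_folio_id;
	collapsed = true;

	/* e) Unlock + un-isolate + unref of the old folios would happen here. */

remap:
	/* f) Re-map the PTE table; same path for success and back-off. */
	mm->pmd = table;
	mm->write_locked = false;
	return collapsed;
}
```

In this model a failed collapse and a successful one leave through the same re-map step; the only difference is whether the entries in the range were replaced.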

Observe that many things are "like we do today".


As soon as we go to read locks + PTE locks, it all gets more complicated to get it right. Not that it cannot be done, but the above is IMHO a lot simpler to get right.

--
Cheers,

David / dhildenb




