On Thu, Jul 21, 2022 at 08:49:31AM -0700, Dave Hansen wrote:
> Acceptance is slow and the heavy lifting is done inside the TDX module.
> It involves flushing old aliases out of the caches and initializing the
> memory integrity metadata for every cacheline. This implementation does
> acceptance in 2MB chunks while holding a global lock.

Oh, fun.

> So, those (effective) 2MB clflush+memset's (plus a few thousand cycles
> for the hypercall/tdcall transitions)

So this sounds strange - page validation on AMD - judging by the
pseudocode of the PVALIDATE insn - does a bunch of sanity checks on the
gVA of the page and then installs it into the RMP and also "PVALIDATE
performs the same segmentation and paging checks as a 1-byte read.
PVALIDATE does not invalidate TLB caches."

But that still sounds a lot less work than what the TDX module needs to
do...

> can't happen in parallel. They are serialized and must wait on each
> other.

Ofc, the Intel version of the RMP table needs to be protected. :-)

> If you have a few hundred CPUs all trying to allocate memory (say,
> doing the first kernel compile after a reboot), this is going to be
> very, very painful for a while.
>
> That said, I think this is the right place to _start_. There is going
> to need to be some kind of follow-on solution (likely background
> acceptance of some kind). But, even with that solution, *this* code
> is still needed to handle the degenerate case where the background
> accepter can't keep up with foreground memory needs.

I'm still catering to the view that it should be a two-tier thing: you
validate during boot a certain amount - say 4G - a size for which the
boot delay is acceptable and you do the rest on-demand along with a
background accepter.

That should give you the best of both worlds...

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette