On Wed, 2022-11-23 at 14:17 -0800, Dave Hansen wrote: > On 11/20/22 16:26, Kai Huang wrote: > > TDX provides increased levels of memory confidentiality and integrity. > > This requires special hardware support for features like memory > > encryption and storage of memory integrity checksums. Not all memory > > satisfies these requirements. > > > > As a result, the TDX introduced the concept of a "Convertible Memory > > s/the TDX introduced/TDX introduces/ > > > Region" (CMR). During boot, the firmware builds a list of all of the > > memory ranges which can provide the TDX security guarantees. The list > > of these ranges is available to the kernel by querying the TDX module. > > > > The TDX architecture needs additional metadata to record things like > > which TD guest "owns" a given page of memory. This metadata essentially > > serves as the 'struct page' for the TDX module. The space for this > > metadata is not reserved by the hardware up front and must be allocated > > by the kernel and given to the TDX module. > > > > Since this metadata consumes space, the VMM can choose whether or not to > > allocate it for a given area of convertible memory. If it chooses not > > to, the memory cannot receive TDX protections and can not be used by TDX > > guests as private memory. > > > > For every memory region that the VMM wants to use as TDX memory, it sets > > up a "TD Memory Region" (TDMR). Each TDMR represents a physically > > contiguous convertible range and must also have its own physically > > contiguous metadata table, referred to as a Physical Address Metadata > > Table (PAMT), to track status for each page in the TDMR range. > > > > Unlike a CMR, each TDMR requires 1G granularity and alignment. To > > support physical RAM areas that don't meet those strict requirements, > > each TDMR permits a number of internal "reserved areas" which can be > > placed over memory holes. If PAMT metadata is placed within a TDMR it > > must be covered by one of these reserved areas. > > > > Let's summarize the concepts: > > > > CMR - Firmware-enumerated physical ranges that support TDX. CMRs are > > 4K aligned. > > TDMR - Physical address range which is chosen by the kernel to support > > TDX. 1G granularity and alignment required. Each TDMR has > > reserved areas where TDX memory holes and overlapping PAMTs can > > be put into. > > s/put into/represented/ > > > PAMT - Physically contiguous TDX metadata. One table for each page size > > per TDMR. Roughly 1/256th of TDMR in size. 256G TDMR = ~1G > > PAMT. > > > > As one step of initializing the TDX module, the kernel configures > > TDX-usable memory regions by passing an array of TDMRs to the TDX module. > > > > Constructing the array of TDMRs consists below steps: > > > > 1) Create TDMRs to cover all memory regions that the TDX module can use; > > Slight tweak: > > 1) Create TDMRs to cover all memory regions that the TDX module will use > for TD memory > > The TDX module "uses" more memory than strictly the TMDR's. > > > 2) Allocate and set up PAMT for each TDMR; > > 3) Set up reserved areas for each TDMR. > > s/Set up/Designate/ Thanks. All above will be addressed. > > > Add a placeholder to construct TDMRs to do the above steps after all > > TDX memory regions are verified to be truly convertible. Always free > > TDMRs at the end of the initialization (no matter successful or not) > > as TDMRs are only used during the initialization. > > The changelog here actually looks really good to me so far. > > > diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c > > index 32af86e31c47..26048c6b0170 100644 > > --- a/arch/x86/virt/vmx/tdx/tdx.c > > +++ b/arch/x86/virt/vmx/tdx/tdx.c > > @@ -445,6 +445,63 @@ static int build_tdx_memory(void) > > return ret; > > } > > > > +/* Calculate the actual TDMR_INFO size */ > > +static inline int cal_tdmr_size(void) > > I think we can spare the bytes to add "culate" in the function name so > we don't think these are California TDMRs. Sure will do. > > > +{ > > + int tdmr_sz; > > + > > + /* > > + * The actual size of TDMR_INFO depends on the maximum number > > + * of reserved areas. > > + * > > + * Note: for TDX1.0 the max_reserved_per_tdmr is 16, and > > + * TDMR_INFO size is aligned up to 512-byte. Even it is > > + * extended in the future, it would be insane if TDMR_INFO > > + * becomes larger than 4K. The tdmr_sz here should never > > + * overflow. > > + */ > > + tdmr_sz = sizeof(struct tdmr_info); > > + tdmr_sz += sizeof(struct tdmr_reserved_area) * > > + tdx_sysinfo.max_reserved_per_tdmr; > > First, I think 'tdx_sysinfo' should probably be a local variable in > init_tdx_module() and have its address passed in here. Having global > variables always makes it more opaque about who is initializing it. > > Second, if this code is making assumptions about > 'max_reserved_per_tdmr', then let's actually add assertions or sanity > checks. For instance: > > if (tdx_sysinfo.max_reserved_per_tdmr > MAX_TDMRS) > return -1; > > or even: > > if (tdmr_sz > PAGE_SIZE) > return -1; I can add this. > > It does almost no good to just assert what the limits are in a comment. > > > + /* > > + * TDX requires each TDMR_INFO to be 512-byte aligned. Always > > + * round up TDMR_INFO size to the 512-byte boundary. > > + */ > > <sigh> More silly comments. > > The place to document this is TDMR_INFO_ALIGNMENT. If anyone wants to > know what the alignment is, exactly, they can look at the definition. > They don't need to be told *TWICE* what TDMR_INFO_ALIGNMENT #defines to > in one comment. I see. Then I think we don't even need this comment since the name of TDMR_INFO_ALIGNMENT already implies? > > > + return ALIGN(tdmr_sz, TDMR_INFO_ALIGNMENT); > > +} > > + > > +static struct tdmr_info *alloc_tdmr_array(int *array_sz) > > +{ > > + /* > > + * TDX requires each TDMR_INFO to be 512-byte aligned. > > + * Use alloc_pages_exact() to allocate all TDMRs at once. > > + * Each TDMR_INFO will still be 512-byte aligned since > > + * cal_tdmr_size() always returns 512-byte aligned size. > > + */ > > OK, I think you're just trolling me now. Two *MORE* mentions of the > 512-byte alignment? I'll remove. > > > + *array_sz = cal_tdmr_size() * tdx_sysinfo.max_tdmrs; > > + > > + /* > > + * Zero the buffer so 'struct tdmr_info::size' can be > > + * used to determine whether a TDMR is valid. > > + * > > + * Note: for TDX1.0 the max_tdmrs is 64 and TDMR_INFO size > > + * is 512-byte. Even they are extended in the future, it > > + * would be insane if the total size exceeds 4MB. > > + */ > > + return alloc_pages_exact(*array_sz, GFP_KERNEL | __GFP_ZERO); > > +} > > This looks massively over complicated. > > Get rid of this function entirely. Then create: > > static int tdmr_array_size(void) > { > return tdmr_size_single() * tdx_sysinfo.max_tdmrs; > } > > The *caller* can do: > > tdmr_array = alloc_pages_exact(tdmr_array_size(), > GFP_KERNEL | __GFP_ZERO); > if (!tdmr_array) { > ... > > Then the error path is: > > free_pages_exact(tdmr_array, tdmr_array_size()); > > Then, there are no size pointers going back and forth. Easy peasy. I'm > OK with a little arithmetic being repeated. Yes. Will do. > > > +/* > > + * Construct an array of TDMRs to cover all TDX memory ranges. > > + * The actual number of TDMRs is kept to @tdmr_num. > > + */ > > +static int construct_tdmrs(struct tdmr_info *tdmr_array, int *tdmr_num) > > +{ > > + /* Return -EINVAL until constructing TDMRs is done */ > > + return -EINVAL; > > +} > > + > > /* > > * Detect and initialize the TDX module. > > * > > @@ -454,6 +511,9 @@ static int build_tdx_memory(void) > > */ > > static int init_tdx_module(void) > > { > > + struct tdmr_info *tdmr_array; > > + int tdmr_array_sz; > > + int tdmr_num; > > I tend to write these like: > > "tdmr_num" is the number of *a* TDMR. > > "nr_tdmrs" is the number of TDMRs. Indeed. Will do.