Re: [PATCH v3 7/8] x86/virt/tdx: Reduce TDMR's reserved areas by using CMRs to find memory holes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 30/08/24 14:52, Huang, Kai wrote:
> On Fri, 2024-08-30 at 13:50 +0300, Hunter, Adrian wrote:
>> On 27/08/24 10:14, Kai Huang wrote:
>>> A TDX module initialization failure was reported on a Emerald Rapids
>>> platform:
>>>
>>>   virt/tdx: initialization failed: TDMR [0x0, 0x80000000): reserved areas exhausted.
>>>   virt/tdx: module initialization failed (-28)
>>>
>>> As part of initializing the TDX module, the kernel informs the TDX
>>> module of all "TDX-usable memory regions" using an array of TDX defined
>>> structure "TD Memory Region" (TDMR).  Each TDMR must be in 1GB aligned
>>> and in 1GB granularity, and all "non-TDX-usable memory holes" within a
>>> given TDMR must be marked as "reserved areas".  The TDX module reports a
>>> maximum number of reserved areas that can be supported per TDMR.
>>
>> The statement:
>>
>>       ... all "non-TDX-usable memory holes" within a
>>       given TDMR must be marked as "reserved areas".
>>
>> is not exactly true, which is essentially the basis of this fix.
> 
> Hmm I think I see what you mean.  Perhaps the "must be marked as" confuses
> you?
> 
> The "TDX-usable memory" here means all pages that can potentially be used by
> TDX.  They don't have to be actually used by TDX, i.e., "TDX-usable memory" vs
> "TDX-used memory".
> 
> And the "non-TDX-usable memory holes" means the memory regions that cannot be
> possibly used by TDX at all.
> 
> Is below better if I change "must be marked as" to "are"?
> 
>   As part of initializing the TDX module, the kernel informs the TDX
>   module of all "TDX-usable memory regions" using an array of TDX defined
>   structure "TD Memory Region" (TDMR).  Each TDMR must be in 1GB aligned
>   and in 1GB granularity, and all "non-TDX-usable memory holes" within a
>   given TDMR are marked as "reserved areas".  The TDX module reports a
>   maximum number of reserved areas that can be supported per TDMR.

Yes, that's better.

> 
> Note in my logic here we don't need to mention CMR.  Here I just want to tell
> the fact that each TDMR has number of "reserved areas" and the maximum number
> is reported by TDX module.
> 
>>
>> The relevant requirements are (from the spec):
>>
>>   Any non-reserved 4KB page within a TDMR must be convertible
>>   i.e., it must be within a CMR
> 
> Yes.
> 
>>
>>   Reserved areas within a TDMR need not be within a CMR.
> 
> Yes.  They need not to be, but they can be.
> 
>>
>>   PAMT areas must not overlap with TDMR non-reserved areas;
>>   however, they may reside within TDMR reserved areas
>>   (as long as these are convertible).
> 
> Yes.  However in implementation PAMTs are out of page allocator so they are
> all within TDMRs thus need to be put to reserved areas.
> 
> Those are TDX architectural requirements.  They are not all related to the fix
> of this problem.  The most important thing here is:
> 
>   Any non-reserved memory within a TDMR must be within CMR.
> 
> That means as long as one memory region is CMR, it doesn't need to be in
> "reserved area" from TDX architecture's prespective.
> 
> That means we can include more memory regions (even they cannot be used by TDX
> at all) as "non-reserved" areas in TDMRs to reduce the number of "reserved
> areas" as long as those regions are within CMR.
> 
> This is the logic behind this fix.
> 
>>
>>>
>>> Currently, the kernel finds those "non-TDX-usable memory holes" within a
>>> given TDMR by walking over a list of "TDX-usable memory regions", which
>>> essentially reflects the "usable" regions in the e820 table (w/o memory
>>> hotplug operations precisely, but this is not relevant here).
>>
>> But including e820 table regions that are not "usable" in the TDMR
>> reserved areas is not necessary - it is not one of the rules.
> 
> True.  That's why we can do this fix.
> 
>>
>> What confused me initially was that I did not realize the we already
>> require that the TDX Module does not touch memory in the TDMR
>> non-reserved areas not specifically allocated to it.  So it makes no
>> difference to the TDX Module what the pages that have not been allocated
>> to it, are used for.
>>
>>>
>>> As shown above, the root cause of this failure is when the kernel tries
>>> to construct a TDMR to cover address range [0x0, 0x80000000), there
>>> are too many memory holes within that range and the number of memory
>>> holes exceeds the maximum number of reserved areas.
>>>
>>> The E820 table of that platform (see [1] below) reflects this: the
>>> number of memory holes among e820 "usable" entries exceeds 16, which is
>>> the maximum number of reserved areas TDX module supports in practice.
>>>
>>> === Fix ===
>>>
>>> There are two options to fix this: 1) reduce the number of memory holes
>>> when constructing a TDMR to save "reserved areas"; 2) reduce the TDMR's
>>> size to cover fewer memory regions, thus fewer memory holes.
>>
>> Probably better to try and get rid of this "two options" stuff and focus
>> on how this is a simple and effective fix.
> 
> As I mentioned in another reply I would prefer to keep those options since I
> believe they can provide a full view to the reviewers.





[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux