On Mon, 2023-06-12 at 10:58 +0300, kirill.shutemov@xxxxxxxxxxxxxxx wrote:
> On Mon, Jun 12, 2023 at 03:06:48AM +0000, Huang, Kai wrote:
> > On Fri, 2023-06-09 at 16:23 +0300, kirill.shutemov@xxxxxxxxxxxxxxx wrote:
> > > On Mon, Jun 05, 2023 at 02:27:31AM +1200, Kai Huang wrote:
> > > > diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> > > > index 8ff07256a515..0aa413b712e8 100644
> > > > --- a/arch/x86/virt/vmx/tdx/tdx.c
> > > > +++ b/arch/x86/virt/vmx/tdx/tdx.c
> > > > @@ -587,6 +587,14 @@ static int tdmr_set_up_pamt(struct tdmr_info *tdmr,
> > > >  		tdmr_pamt_base += pamt_size[pgsz];
> > > >  	}
> > > >  
> > > > +	/*
> > > > +	 * tdx_memory_shutdown() also reads TDMR's PAMT during
> > > > +	 * kexec() or reboot, which could happen at anytime, even
> > > > +	 * during this particular code.  Make sure pamt_4k_base
> > > > +	 * is firstly set otherwise tdx_memory_shutdown() may
> > > > +	 * get an invalid PAMT base when it sees a valid number
> > > > +	 * of PAMT pages.
> > > > +	 */
> > > 
> > > Hmm? What prevents compiler from messing this up. It can reorder as it
> > > wishes, no?
> > 
> > Hmm.. Right. Sorry I missed.
> > 
> > > Maybe add a proper locking? Anything that prevent preemption would do,
> > > right?
> > > 
> > > >  	tdmr->pamt_4k_base = pamt_base[TDX_PS_4K];
> > > >  	tdmr->pamt_4k_size = pamt_size[TDX_PS_4K];
> > > >  	tdmr->pamt_2m_base = pamt_base[TDX_PS_2M];
> > 
> > I think a simple memory barrier will do.  How does below look?
> > 
> > --- a/arch/x86/virt/vmx/tdx/tdx.c
> > +++ b/arch/x86/virt/vmx/tdx/tdx.c
> > @@ -591,11 +591,12 @@ static int tdmr_set_up_pamt(struct tdmr_info *tdmr,
> >  	 * tdx_memory_shutdown() also reads TDMR's PAMT during
> >  	 * kexec() or reboot, which could happen at anytime, even
> >  	 * during this particular code.  Make sure pamt_4k_base
> > -	 * is firstly set otherwise tdx_memory_shutdown() may
> > -	 * get an invalid PAMT base when it sees a valid number
> > -	 * of PAMT pages.
> > +	 * is firstly set and place a __mb() after it otherwise
> > +	 * tdx_memory_shutdown() may get an invalid PAMT base
> > +	 * when it sees a valid number of PAMT pages.
> >  	 */
> >  	tdmr->pamt_4k_base = pamt_base[TDX_PS_4K];
> > +	__mb();
> 
> If you want to play with barriers, assign pamt_4k_base the last with
> smp_store_release() and read it first in tdmr_get_pamt() with
> smp_load_acquire(). If it is non-zero, all pamt_* fields are valid.
> 
> Or just drop this non-sense and use a spin lock for serialization.

We don't need to guarantee that when pamt_4k_base is valid, all the
other pamt_* fields are valid.  Instead, we need to guarantee that when
(at least) _one_ of the pamt_*_size fields is valid, pamt_4k_base is
valid.

For example,

	pamt_4k_base -> valid
	pamt_4k_size -> invalid (0)
	pamt_2m_size -> invalid
	pamt_1g_size -> invalid

and

	pamt_4k_base -> valid
	pamt_4k_size -> valid
	pamt_2m_size -> invalid
	pamt_1g_size -> invalid

are both OK.

The reason is the PAMTs are only written by the TDX module in
init_tdmrs().  So if tdx_memory_shutdown() sees only part of the PAMT
(the second case above), those PAMT pages are not yet TDX private pages,
thus converting only part of the PAMT is fine.

The invalid case is when any pamt_*_size is valid but pamt_4k_base is
invalid, e.g.:

	pamt_4k_base -> invalid
	pamt_4k_size -> valid
	pamt_2m_size -> invalid
	pamt_1g_size -> invalid

as in this case tdx_memory_shutdown() will convert an incorrect (not
merely partial) PAMT area.

So I think a __mb() after setting tdmr->pamt_4k_base should be good
enough, as it guarantees that whenever any store to a pamt_*_size field
is visible, the valid pamt_4k_base will be seen by other cpus.

Does this make sense?