On Wed, Dec 06, 2023 at 01:28:08AM +0000, Edgecombe, Rick P wrote: > On Tue, 2023-12-05 at 03:45 +0300, Kirill A. Shutemov wrote: > > +static void tdx_kexec_unshare_mem(bool crash) > > +{ > > + unsigned long addr, end; > > + long found = 0, shared; > > + > > + /* Stop new private<->shared conversions */ > > + conversion_allowed = false; > > I wonder if this might need a compiler barrier here to be totally safe. > I'm not sure. Yeah, it should be cleaner with a barrier. > > + > > + /* > > + * Crash kernel reaches here with interrupts disabled: can't > > wait for > > + * conversions to finish. > > + * > > + * If race happened, just report and proceed. > > + */ > > + if (!crash) { > > + unsigned long timeout; > > + > > + /* > > + * Wait for in-flight conversions to complete. > > + * > > + * Do not wait more than 30 seconds. > > + */ > > + timeout = 30 * USEC_PER_SEC; > > + while (atomic_read(&conversions_in_progress) && > > timeout--) > > + udelay(1); > > + } > > + > > + if (atomic_read(&conversions_in_progress)) > > + pr_warn("Failed to finish shared<->private > > conversions\n"); > > I can't think of any non-ridiculous way to handle this case. Maybe we > need VMM help. Do you see a specific way how VMM can help here? > > diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c > > index 830425e6d38e..c81afffaa954 100644 > > --- a/arch/x86/kernel/reboot.c > > +++ b/arch/x86/kernel/reboot.c > > @@ -12,6 +12,7 @@ > > #include <linux/delay.h> > > #include <linux/objtool.h> > > #include <linux/pgtable.h> > > +#include <linux/kexec.h> > > #include <acpi/reboot.h> > > #include <asm/io.h> > > #include <asm/apic.h> > > @@ -31,6 +32,7 @@ > > #include <asm/realmode.h> > > #include <asm/x86_init.h> > > #include <asm/efi.h> > > +#include <asm/tdx.h> > > > > /* > > * Power off function, if any > > @@ -716,6 +718,14 @@ static void > > native_machine_emergency_restart(void) > > > > void native_machine_shutdown(void) > > { > > + /* > > + * Call enc_kexec_unshare_mem() while all CPUs are still > > active and > > + * interrupts are enabled. This will allow all in-flight > > memory > > + * conversions to finish cleanly before unsharing all memory. > > + */ > > + if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT) && > > kexec_in_progress) > > + x86_platform.guest.enc_kexec_unshare_mem(false); > > These questions are coming from an incomplete understanding of the > kexec/reboot operation. Please disregard if it is not helpful. > > By doing this while other tasks can still run, it handles the > conversion races in the !crash case. But then it sets shared pages to > NP. What happens if another active task tries to write to one? > > I guess we rely on the kernel_restart_prepare()->device_shutdown() to > clean up, which runs before native_machine_shutdown(). So there might > be conversions in progress when tdx_kexec_unshare_mem() is called, from > the allocator work queues. But the actual memory won't be accessed > during that operation. Right, devices has to be shutdown by then. > But the console must be active? Or otherwise who can see these > warnings. It doesn't use a shared page? Or the KVM clock, which looks > to clean up at cpu tear down, which now happens after > tdx_kexec_unshare_mem()? So I wonder if there might be cases. Virtio console is not functional by then, but serial is. Serial uses port I/O and doesn't need shared memory. > If so, maybe you could halt the conversions in > native_machine_shutdown(), then do the actual reset to private after > tasks can't schedule. It would also mean that we cannot use set_memory_np() there as it requires sleepable context. I would rather keep conversion in native_machine_shutdown() path. > I'd still wonder about if anything might try to > access a shared page triggered by the console output. set_memory_np() would make it obvious if it ever happens. -- Kiryl Shutsemau / Kirill A. Shutemov _______________________________________________ kexec mailing list kexec@xxxxxxxxxxxxxxxxxxx http://lists.infradead.org/mailman/listinfo/kexec