On 10/24/24 at 08:15am, Yan Zhao wrote:
> On Wed, Oct 23, 2024 at 10:44:11AM -0500, Eric W. Biederman wrote:
> > "Kirill A. Shutemov" <kirill@xxxxxxxxxxxxx> writes:
> >
> > > Waiting minutes to get VM booted to shell is not feasible for most
> > > deployments. Lazy is sane default to me.
> >
> > Huh?
> >
> > Unless my guesses about what is happening are wrong lazy is hiding
> > a serious implementation deficiency. From all hardware I have seen
> > taking minutes is absolutely ridiculous.
> >
> > Does writing to all of memory at full speed take minutes? How can such
> > a system be functional?
> >
> > If you don't actually have to write to the pages and it is just some
> > accounting function it is even more ridiculous.
> >
> >
> > I had previously thought that accept_memory was the firmware call.
> > Now that I see that it is just a wrapper for some hardware specific
> > calls I am even more perplexed.
> >
> >
> > Quite honestly what this looks like to me is that someone failed to
> > enable write-combining or write-back caching when writing to memory
> > when initializing the protected memory. With the result that everything
> > is moving dog slow, and people are introducing complexity left and right
> > to avoid that bad implementation.
> >
> >
> > Can someone please explain to me why this accept_memory stuff has to be
> > slow, why it has to take minutes to do its job.
> This kexec patch is a fix to a guest(TD)'s kexec failure.
>
> For a linux guest, the accept_memory() happens before the guest accesses a page.
> It will (if the guest is a TD)
> (1) trigger the host to allocate the physical page on host to map the accessed
>     guest page, which might be slow with wait and sleep involved, depending on
>     the memory pressure on host.
> (2) initialize the protected page.
>
> Actually most of guest memory are not accessed by guest during the guest life
> cycle. accept_memory() may cause the host to commit a never-to-be-used page,
> with the host physical page not even being able to get swapped out.

So this sounds to me more like a business requirement on cloud platforms,
e.g. if a customer books a guest instance with 60G of memory but only ever
uses 20G at most, the remaining 40G can be saved to reduce pressure on the
host. I could be shallow here, it is just a wild guess. If my guess is
right, cloud service providers must like this accept_memory feature very
much.

>
> That's why we need a lazy accept, which does not accept_memory() until after a
> page is allocated by the kernel (in alloc_page(s)).

By the way, I have two questions, maybe very shallow.

1) Why can't we look for memory that is already accepted and put the kexec
   kernel/initrd/bootparam/purgatory only there?

2) Why can't we accept the memory for the kernel, boot params, cmdline and
   initrd in the 2nd kernel? Surely the purgatory still needs to be accepted
   in the 1st kernel.

Sorry, I have only read the accept_memory() code and haven't gone through the
x86 boot code flow.

Thanks
Baoquan
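
P.S. To make question 1) concrete, here is a toy user-space sketch of what I
have in mind. It is not kernel code and does not reflect the real
accept_memory() implementation; every name in it (toy_accept_unit,
toy_alloc_unit, toy_find_accepted_range, NR_UNITS) is made up for
illustration, and the fixed-size array of "units" is purely an assumption.
It only models two things: accepting a unit lazily when it is first
allocated, and searching for a contiguous range that is already accepted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_UNITS 16             /* toy guest: 16 acceptance units (assumption) */

bool accepted[NR_UNITS];        /* false = unit is still unaccepted */
size_t accept_calls;            /* how many expensive accepts were done */

/* Stand-in for the expensive per-unit accept operation (hypothetical). */
void toy_accept_unit(size_t unit)
{
	assert(unit < NR_UNITS);
	if (!accepted[unit]) {
		accepted[unit] = true;
		accept_calls++;
	}
}

/* Lazy policy: a unit is accepted only when the allocator hands it out. */
size_t toy_alloc_unit(size_t unit)
{
	toy_accept_unit(unit);
	return unit;
}

/*
 * Question 1): find 'count' contiguous units that are already accepted,
 * so that nothing new needs to be accepted for them.
 * Returns the first unit index of such a range, or -1 if there is none.
 */
long toy_find_accepted_range(size_t count)
{
	size_t run = 0;

	if (count == 0)
		return -1;

	for (size_t i = 0; i < NR_UNITS; i++) {
		run = accepted[i] ? run + 1 : 0;
		if (run == count)
			return (long)(i + 1 - count);
	}
	return -1;
}
```

In this toy model, after allocating units 3, 4 and 9, a search for two
contiguous accepted units finds the range starting at unit 3, while a search
for three fails. That failure case is presumably where the real problem
starts: the already accepted memory may be too fragmented to hold a whole
kexec segment.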