On Mon, May 17, 2021 at 07:01:09PM -0700, Steve Rutherford wrote:
> On Fri, May 14, 2021 at 3:05 AM Ashish Kalra <ashish.kalra@xxxxxxx> wrote:
> >
> > On Fri, May 14, 2021 at 11:34:32AM +0200, Borislav Petkov wrote:
> > > On Fri, May 14, 2021 at 09:05:23AM +0000, Ashish Kalra wrote:
> > > > Ideally we should fail/stop migration even if a single guest page
> > > > encryption status cannot be notified, and that should be the way
> > > > to proceed in this case: the guest kernel should notify the source
> > > > userspace VMM to block/stop migration.
> > >
> > > Yap, and what I'm trying to point out here is that if the low-level
> > > machinery fails for whatever reason and it cannot recover, we should
> > > propagate that error up the chain so that the user is aware of why
> > > it failed.
> >
> > I totally agree.
> >
> > > WARN is a good first start, but in some configurations those splats
> > > don't even get shown, as people don't look at dmesg, etc.
> > >
> > > And I think it is very important to propagate those errors properly
> > > because there are a *lot* of moving parts involved in a guest
> > > migration, and you have encrypted memory, which probably makes
> > > debugging this even harder, etc, etc.
> > >
> > > I hope this makes more sense.
> > >
> > > > From a practical side, I do see Qemu's migrate_add_blocker()
> > > > interface, but that looks to be a static interface, and I also
> > > > don't think it will force-stop an ongoing migration. Is there an
> > > > existing mechanism to inform the userspace VMM from the kernel
> > > > about blocking/stopping migration?
> > >
> > > Hmm, so __set_memory_enc_dec(), which calls
> > > notify_addr_enc_status_changed(), is called by the guest, right,
> > > when it starts migrating.
> >
> > No, actually notify_addr_enc_status_changed() is called whenever a
> > range of memory is marked as encrypted or decrypted, so it has
> > nothing to do with migration as such.
> >
> > This is basically modifying the encryption attributes on the page
> > tables and correspondingly also making the hypercall to inform the
> > hypervisor about page encryption status changes. The hypervisor will
> > use this information during an ongoing or future migration, so this
> > information is maintained even though migration might never be
> > initiated here.
> >
> > > Can an error value from it be propagated up the callchain so it can
> > > be turned into an error message for the guest owner to see?
> >
> > The error value cannot be propagated up the callchain directly here,
> > but one possibility is to leverage the hypercall and use Sean's
> > proposed hypercall interface to notify the host/hypervisor to
> > block/stop any future/ongoing migration.
> >
> > Or, as per Paolo's response, writing 0 to the MIGRATION_CONTROL MSR
> > seems more ideal.
> >
> > Thanks,
> > Ashish
>
> How realistic is this type of failure? If you've gotten this deep, it
> seems like something has gone very wrong if the memory you are about
> to mark as shared (or encrypted) doesn't exist and isn't mapped. In
> particular, is the kernel going to page fault when it tries to
> reinitialize the page it's currently changing the c-bit of? From what
> I recall, most paths that do either set_memory_encrypted or
> set_memory_decrypted memset the region being toggled. Note: dma_pool
> doesn't immediately memset, but the VA it's providing to set_decrypted
> is freshly fetched from a recently allocated region (something has to
> have gone pretty wrong if this is invalid, if I'm not mistaken; no one
> would think twice if you wrote to that freshly allocated page).
>
> The reason I mention this is that SEV migration is going to be the
> least of your concerns if you are already on a one-way train towards a
> kernel oops. I'm not certain I would go so far as to say this should
> BUG() instead (I think the page fault on access might be easier to
> debug than a BUG() here), but I'm pretty skeptical that the kernel is
> going to do too well if it doesn't know whether its kernel VAs are
> valid.
>
> If, despite the above, we expect to infrequently-but-not-never disable
> migration with no intention of re-enabling it, we should signal it
> differently than we currently signal migration enablement. Currently,
> if you toggle migration from on to off, there is an implication that
> you are about to reboot and are only ephemerally unable to migrate.
> Having permanent disablement be indistinguishable from a really long
> reboot is a recipe for a really sad, long timeout in userspace.

Also, looking at set_memory_encrypted() and set_memory_decrypted()
usage patterns, the persistent decrypted regions like the DMA pool,
bss_decrypted, etc., are set up at boot time. Later, the unused
bss_decrypted section is freed and set_memory_encrypted() is called on
the freed memory at the end of kernel boot.

Most of the runtime set_memory_encrypted() and set_memory_decrypted()
calls will be on dynamically allocated DMA buffers via
dma_alloc_coherent() and dma_free_coherent(). For example, a
dma_alloc_coherent() request will allocate pages and then call
set_memory_decrypted() against them; when dma_free_coherent() is
called, set_memory_encrypted() is called against the pages about to be
freed, before they are actually freed. Roughly, the pattern looks like
the sketch below.
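To illustrate, here is a simplified sketch of that allocate/free
pattern under SEV; this is illustrative only, not the actual dma_direct
code path, the sev_dma_*_sketch names are made up, and error handling
is trimmed:

#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/pfn.h>
#include <linux/set_memory.h>
#include <linux/string.h>

/* Allocation side: pages come back encrypted (C-bit set), so flip
 * them to decrypted/shared before handing them out for DMA. */
static void *sev_dma_alloc_sketch(size_t size, gfp_t gfp)
{
	unsigned int order = get_order(size);
	struct page *page = alloc_pages(gfp, order);
	void *vaddr;

	if (!page)
		return NULL;
	vaddr = page_address(page);

	/*
	 * The page-encryption-status hypercall to the hypervisor
	 * happens down this path, via notify_addr_enc_status_changed().
	 * A failure here is the case being discussed above.
	 */
	if (set_memory_decrypted((unsigned long)vaddr, PFN_UP(size))) {
		__free_pages(page, order);
		return NULL;
	}

	memset(vaddr, 0, size);
	return vaddr;
}

/* Free side: flip the pages back to encrypted/private just before
 * they are returned to the page allocator. */
static void sev_dma_free_sketch(void *vaddr, size_t size)
{
	set_memory_encrypted((unsigned long)vaddr, PFN_UP(size));
	free_pages((unsigned long)vaddr, get_order(size));
}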
These buffers are very short-lived, used for immediate I/O and then
freed, so they may not be of major concern for SEV migration? Disabling
migration because an address lookup or mapping failed on such a page
would really be overkill, so I might be in favor of Steve's thought
above of doing a BUG() here instead.

Thanks,
Ashish
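P.S.: If we do end up going the MIGRATION_CONTROL MSR route, the
guest-side disable on such a failure could be as small as the sketch
below. The MSR index and bit value follow the proposed interface and
are assumptions here; they may well change in the final patches:

#include <asm/msr.h>

/* Proposed (not yet final) MSR index and ready bit for the
 * migration-control interface; treat these values as placeholders. */
#define MSR_KVM_MIGRATION_CONTROL	0x4b564d08
#define KVM_MIGRATION_READY		(1 << 0)

/* Clear the READY bit so the host refuses to migrate this guest. */
static void sev_block_migration_sketch(void)
{
	wrmsrl(MSR_KVM_MIGRATION_CONTROL, 0);
}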