On Wed, 24 Mar 2021 11:09:20 +0100 Paolo Bonzini wrote: > On 24/03/21 10:38, Kai Huang wrote: > > Hi Sean, Boris, Paolo, > > > > Thanks for the discussion. I tried to digest all your conversations and > > hopefully I have understood you correctly. I pasted the new patch here > > (not full patch, but relevant part only). I modified the error msg, added > > some writeup to Documentation/x86/sgx.rst, and put Sean's explanation of this > > bug to the commit msg (per Paolo). I am terrible Documentation writer, so > > please help to check and give comments. Thanks! > > I have some phrasing suggestions below but that was actually pretty good. > > > --- > > commit 1e297a535bcb4f51a08343c40207520017d85efe (HEAD) > > Author: Kai Huang <kai.huang@xxxxxxxxx> > > Date: Wed Jan 20 03:40:53 2021 +0200 > > > > x86/sgx: Wipe out EREMOVE from sgx_free_epc_page() > > > > EREMOVE takes a page and removes any association between that page and > > an enclave. It must be run on a page before it can be added into > > another enclave. Currently, EREMOVE is run as part of pages being freed > > into the SGX page allocator. It is not expected to fail. > > > > KVM does not track how guest pages are used, which means that SGX > > virtualization use of EREMOVE might fail. Specifically, it is > > legitimate that EREMOVE returns SGX_CHILD_PRESENT for EPC assigned to > > KVM guest, because KVM/kernel doesn't track SECS pages. > > > > Break out the EREMOVE call from the SGX page allocator. This will allow > > the SGX virtualization code to use the allocator directly. (SGX/KVM > > will also introduce a more permissive EREMOVE helper). > > Ok, I think I got the source of my confusion. The part in parentheses > is the key. It was not clear that KVM can deal with EREMOVE failures > *without printing the error*. Good! Yes the "will also introduce a more premissive EREMOVE helper" is done in patch 5 (x86/sgx: Introduce virtual EPC for use by KVM guests). > > > Implement original sgx_free_epc_page() as sgx_encl_free_epc_page() to be > > more specific that it is used to free EPC page assigned host enclave. > > Replace sgx_free_epc_page() with sgx_encl_free_epc_page() in all call > > sites so there's no functional change. > > > > Improve error message when EREMOVE fails, and kernel is about to leak > > EPC page, which is likely a kernel bug. This is effectively a kernel > > use-after-free of EPC, and due to the way SGX works, the bug is detected > > at freeing. Rather than add the page back to the pool of available EPC, > > the kernel intentionally leaks the page to avoid additional errors in > > the future. > > > > Also add documentation to explain to user what is the bug and suggest > > user what to do when this bug happens, although extremely unlikely. > > Rewritten: > > EREMOVE takes a page and removes any association between that page and > an enclave. It must be run on a page before it can be added into > another enclave. Currently, EREMOVE is run as part of pages being freed > into the SGX page allocator. It is not expected to fail, as it would > indicate a use-after-free of EPC. Rather than add the page back to the > pool of available EPC, the kernel intentionally leaks the page to avoid > additional errors in the future. > > However, KVM does not track how guest pages are used, which means that SGX > virtualization use of EREMOVE might fail. Specifically, it is > legitimate that EREMOVE returns SGX_CHILD_PRESENT for EPC assigned to > KVM guest, because KVM/kernel doesn't track SECS pages. > > To allow SGX/KVM to introduce a more permissive EREMOVE helper and to > let the SGX virtualization code use the allocator directly, > break out the EREMOVE call from the SGX page allocator. Rename the > original sgx_free_epc_page() to sgx_encl_free_epc_page(), > indicating that it is used to free EPC page assigned host enclave. > Replace sgx_free_epc_page() with sgx_encl_free_epc_page() in all call > sites so there's no functional change. > > At the same time improve error message when EREMOVE fails, and add > documentation to explain to user what is the bug and suggest user what > to do when this bug happens, although extremely unlikely. Thanks :) > > > +Although extremely unlikely, EPC leaks can happen if kernel SGX bug happens, > > +when a WARNING with below message is shown in dmesg: > > Remove "Although extremely unlikely". Will do. > > > +"...EREMOVE returned ..., kernel bug likely. EPC page leaked, SGX may become > > +unusuable. Please refer to Documentation/x86/sgx.rst for more information." > > + > > +This is effectively a kernel use-after-free of EPC, and due to the way SGX > > +works, the bug is detected at freeing. Rather than add the page back to the pool > > +of available EPC, the kernel intentionally leaks the page to avoid additional > > +errors in the future. > > + > > +When this happens, kernel will likely soon leak majority of EPC pages, and SGX > > +will likely become unusable. However while this may be fatal to SGX, other > > +kernel functionalities are unlikely to be impacted, and should continue to work. > > + > > +As a result, when this happpens, user should stop running any new SGX workloads, > > +(or just any new workloads), and migrate all valuable workloads, for instance, > > +virtual machines, to other places. > > Remove everything starting with "for instance". Will do. > > Although a machine reboot can recover all > > +EPC, debugging and fixing this bug is appreciated. > > Replace the second part with "the bug should be reported to the Linux developers". > The poor user is not expected to debug SGX. ;) Hmm.. Makes sense :) > > > +/* Error message for EREMOVE failure, when kernel is about to leak EPC page */ > > +#define EREMOVE_ERROR_MESSAGE \ > > + "EREMOVE returned %d (0x%x), kernel bug likely. EPC page leaked, SGX may become > > unusuable. Please refer to Documentation/x86/sgx.rst for more information." > > Rewritten: > > EREMOVE returned %d and an EPC page was leaked; SGX may become unusable. > This is a kernel bug, refer to Documentation/x86/sgx.rst for more information. Fine to me, although this would have %d (0x%x) -> %d change in the code. > > Also please split it across multiple lines. > > Paolo >