On Wed, Aug 31, 2022 at 05:57:22AM +0300, jarkko@xxxxxxxxxx wrote:
> On Wed, Aug 31, 2022 at 02:55:52AM +0000, Huang, Kai wrote:
> > On Wed, 2022-08-31 at 05:44 +0300, jarkko@xxxxxxxxxx wrote:
> > > On Wed, Aug 31, 2022 at 02:35:53AM +0000, Huang, Kai wrote:
> > > > On Wed, 2022-08-31 at 05:15 +0300, jarkko@xxxxxxxxxx wrote:
> > > > > On Wed, Aug 31, 2022 at 01:27:58AM +0000, Huang, Kai wrote:
> > > > > > On Tue, 2022-08-30 at 15:54 -0700, Reinette Chatre wrote:
> > > > > > > Hi Jarkko,
> > > > > > >
> > > > > > > On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
> > > > > > > > In sgx_init(), if misc_register() for the provision device fails, and
> > > > > > > > neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
> > > > > > > > prematurely stopped.
> > > > > > >
> > > > > > > I do not think misc_register() is required to fail for the scenario to
> > > > > > > be triggered (rather use "or" than "and"?). Perhaps just
> > > > > > > "In sgx_init(), if a failure is encountered after ksgxd is started
> > > > > > > (via sgx_page_reclaimer_init()) ...".
> > > > > >
> > > > > > IMHO "a failure" might be too vague. For instance, failure to sgx_drv_init()
> > > > > > won't immediately result in ksgxd to stop prematurally. As long as KVM SGX can
> > > > > > be initialized successfully, sgx_init() still returns 0.
> > > > > >
> > > > > > Btw I was thinking whether we should move sgx_page_reclaimer_init() to the end
> > > > > > of sgx_init(), after we make sure at least one of the driver and the KVM SGX is
> > > > > > initialized successfully. Then the code change in this patch won't be necessary
> > > > > > if I understand correctly. AFAICT there's no good reason to start the ksgxd at
> > > > > > early stage before we are sure either the driver or KVM SGX will work.
> > > > >
> > > > > I would focus fixing the existing flow rather than reinventing the flow.
> > > > >
> > > > > It can be made to work, and therefore it is IMHO correct action to take.
> > > >
> > > > From another perspective, the *existing flow* is the reason which causes this
> > > > bug. A real fix is to fix the flow itself.
> > >
> > > Any existing flow in part of the kernel can have a bug. That
> > > does not mean that switching flow would be proper way to fix
> > > a bug.
> > >
> > > BR, Jarkko
> >
> > Yes but I think this is only true when the flow is reasonable. If the flow
> > itself isn't reasonable, we should fix the flow (given it's easy to fix AFAICT).
> >
> > Anyway, let us also hear from others.
>
> The flow can be made to work without issues, which in the
> context of a bug fix is exactly what a bug fix should do.
> Not more or less.
>
> You don't gain any measurable value for the user with this
> switch idea.

And besides, this is not a proper way to review a patch anyway,
because you did not review the code.

I'll focus on fixing what is broken, e.g. so that it is easy to
backport to stable and distro kernels, and call it a day. It
certainly does not have to make the code "perfect", as long as
known bugs are sorted out.

You are welcome to review the next version of the patch, once
I've resolved the issues that were pointed out by Reinette, if
you still see some issue, but this type of speculative discussion
is frankly just wasting everyone's time.

(need to check my mutt config, do not know why it is not always
putting real name to from field)

BR, Jarkko
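
For readers following the thread, the two orderings being argued over
can be sketched as a small user-space model. This is only an
illustration under stated assumptions, not the kernel code and not the
patch under review: struct model, start_reclaimer(), stop_reclaimer(),
init_reclaimer_first() and init_reclaimer_last() are hypothetical
names, the boolean fields stand in for whether the provision device
registration, the native driver init and the KVM (vEPC) init would
succeed, and -1 stands in for whatever error code the real code
returns.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state; nothing here is a kernel symbol. */
struct model {
    bool provision_ok;      /* provision device registration succeeds? */
    bool drv_ok;            /* native driver init succeeds? */
    bool vepc_ok;           /* KVM (vEPC) init succeeds? */
    bool reclaimer_running; /* stands in for the ksgxd thread */
    int reclaimer_starts;
    int reclaimer_stops;
};

void start_reclaimer(struct model *m)
{
    m->reclaimer_running = true;
    m->reclaimer_starts++;
}

void stop_reclaimer(struct model *m)
{
    /* Stopping a reclaimer that was never started would be a bug. */
    assert(m->reclaimer_running);
    m->reclaimer_running = false;
    m->reclaimer_stops++;
}

/*
 * Ordering discussed as the "existing flow": the reclaimer is started
 * early, so every later failure has to unwind it again.
 */
int init_reclaimer_first(struct model *m)
{
    start_reclaimer(m);

    if (!m->provision_ok)
        goto err_reclaimer;

    /* Both backends are attempted; one success is enough. */
    if (!m->drv_ok && !m->vepc_ok)
        goto err_reclaimer;

    return 0;

err_reclaimer:
    stop_reclaimer(m);
    return -1;
}

/*
 * Ordering suggested in the thread as an alternative: start the
 * reclaimer only once at least one backend is known to work, so the
 * failure paths never have to stop it.
 */
int init_reclaimer_last(struct model *m)
{
    if (!m->provision_ok)
        return -1;

    if (!m->drv_ok && !m->vepc_ok)
        return -1;

    start_reclaimer(m);
    return 0;
}
```

In this model both orderings end in the same state: the reclaimer runs
only when init returns 0. The difference is that the first ordering
starts and then stops the reclaimer on each failure path, while the
second never starts it there. Whether that difference justifies
reordering in a bug fix is the point the thread disagrees on.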