On Wed, Aug 31, 2022 at 03:28:20AM +0000, Huang, Kai wrote:
> On Wed, 2022-08-31 at 06:10 +0300, Jarkko Sakkinen wrote:
> > On Wed, Aug 31, 2022 at 05:57:22AM +0300, jarkko@xxxxxxxxxx wrote:
> > > On Wed, Aug 31, 2022 at 02:55:52AM +0000, Huang, Kai wrote:
> > > > On Wed, 2022-08-31 at 05:44 +0300, jarkko@xxxxxxxxxx wrote:
> > > > > On Wed, Aug 31, 2022 at 02:35:53AM +0000, Huang, Kai wrote:
> > > > > > On Wed, 2022-08-31 at 05:15 +0300, jarkko@xxxxxxxxxx wrote:
> > > > > > > On Wed, Aug 31, 2022 at 01:27:58AM +0000, Huang, Kai wrote:
> > > > > > > > On Tue, 2022-08-30 at 15:54 -0700, Reinette Chatre wrote:
> > > > > > > > > Hi Jarkko,
> > > > > > > > >
> > > > > > > > > On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
> > > > > > > > > > In sgx_init(), if misc_register() for the provision device fails, and
> > > > > > > > > > neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
> > > > > > > > > > prematurely stopped.
> > > > > > > > >
> > > > > > > > > I do not think misc_register() is required to fail for the scenario to
> > > > > > > > > be triggered (rather use "or" than "and"?). Perhaps just
> > > > > > > > > "In sgx_init(), if a failure is encountered after ksgxd is started
> > > > > > > > > (via sgx_page_reclaimer_init()) ...".
> > > > > > > >
> > > > > > > > IMHO "a failure" might be too vague. For instance, failure to sgx_drv_init()
> > > > > > > > won't immediately result in ksgxd to stop prematurally. As long as KVM SGX can
> > > > > > > > be initialized successfully, sgx_init() still returns 0.
> > > > > > > >
> > > > > > > > Btw I was thinking whether we should move sgx_page_reclaimer_init() to the end
> > > > > > > > of sgx_init(), after we make sure at least one of the driver and the KVM SGX is
> > > > > > > > initialized successfully. Then the code change in this patch won't be necessary
> > > > > > > > if I understand correctly. AFAICT there's no good reason to start the ksgxd at
> > > > > > > > early stage before we are sure either the driver or KVM SGX will work.
> > > > > > >
> > > > > > > I would focus fixing the existing flow rather than reinventing the flow.
> > > > > > >
> > > > > > > It can be made to work, and therefore it is IMHO correct action to take.
> > > > > >
> > > > > > From another perspective, the *existing flow* is the reason which causes this
> > > > > > bug. A real fix is to fix the flow itself.
> > > > >
> > > > > Any existing flow in part of the kernel can have a bug. That
> > > > > does not mean that switching flow would be proper way to fix
> > > > > a bug.
> > > > >
> > > > > BR, Jarkko
> > > >
> > > > Yes but I think this is only true when the flow is reasonable. If the flow
> > > > itself isn't reasonable, we should fix the flow (given it's easy to fix AFAICT).
> > > >
> > > > Anyway, let us also hear from others.
> > >
> > > The flow can be made to work without issues, which in the
> > > context of a bug fix is exactly what a bug fix should do.
> > > Not more or less.
> > >
> > > You don't gain any measurable value for the user with this
> > > switch idea.
> >
> > And besides this not proper way to review patch anyway because you did
> > not review the code.
> >
>
> I did review the code, but I couldn't agree on the fix. That's why I expressed
> my view here.
>
> > I'll focus on fix what is broken e.g. so that it
> > is easy to backport to stable and distro kernels, and call it a day.
> > It certainly does not have to make code "perfect", as long as known
> > bugs are sorted out.
>
> Why cannot the fix which fixes the flow go to stable?
>
> > You are welcome to review the next version of the patch, once I've
> > resolved the issues that were pointed out by Reinette, if you still
> > see some issue but this type of speculative discussion is frankly just
> > wasting everyones time.
>
> Hmm.. Why pointing out a better fix (my perspective of course) is wasting
> everyone's time?

There was not a single inline comment.

BR, Jarkko