> > Presently, efi_runtime_services() are executed by firmware in process
> > context. To execute efi_runtime_service(), kernel switches the page
> > directory from swapper_pgd to efi_pgd. However, efi_pgd doesn't have
> > any user space mappings. A potential issue could be, for instance, an
> > NMI interrupt (like perf) trying to profile some user data while in efi_pgd.
> >
> > A solution for this issue could be to use kthread to run
> > efi_runtime_service(). When a user/kernel thread requests to execute
> > efi_runtime_service(), kernel off-loads this work to kthread which in
> > turn uses efi_pgd. Anything that tries to touch user space addresses
> > while in kthread is terminally broken. This patch adds support to efi
> > subsystem to handle all calls to efi_runtime_services() using a work
> > queue (which in turn uses kthread).
> >
> > Implementation summary:
> > -----------------------
> > 1. When user/kernel thread requests to execute efi_runtime_service(),
> >    enqueue work to efi_rts_workqueue.
> > 2. Caller thread waits until the work is finished because it's
> >    dependent on the return status of efi_runtime_service().
> >
> > Semantics to pack arguments in efi_runtime_work (has void pointers):
> > 1. If argument is a pointer (of any type), pass it as is.
> > 2. If argument is a value (of any type), address of the value is
> >    passed.
> >
> > Introduce a handler function (called efi_call_rts()) that
> > a. understands efi_runtime_work and
> > b. invokes the appropriate efi_runtime_service() with the
> >    appropriate arguments
> >
> > Semantics followed by efi_call_rts() to understand efi_runtime_work:
> > 1. If argument was a pointer, recast it from void pointer to original
> >    pointer type.
> > 2. If argument was a value, recast it from void pointer to original
> >    pointer type and dereference it.
> >
> > pstore writes could potentially be invoked in interrupt context and it
> > uses set_variable<>() and query_variable_info<>() to store logs.
> > If we invoke efi_runtime_services() through efi_rts_wq while in atomic()
> > kernel issues a warning ("scheduling while in atomic") and prints stack
> > trace. One way to overcome this is to not make the caller process wait
> > for the worker thread to finish. This approach breaks pstore i.e. the
> > log messages aren't written to efi variables. Hence, pstore calls
> > efi_runtime_services() without using efi_rts_wq or in other words
> > efi_rts_wq will be used unconditionally for all the
> > efi_runtime_services() except set_variable<>() and
> > query_variable_info<>()
> >
> Is there a place in the system reboot path where we can try to flush these
> asynchronous pstore writes from interrupt context?

I don't think so, because the issue is not with the pstore writes themselves
but with pstore using EFI as its backing store. Anything could register as a
pstore backend, e.g. RAM, ACPI-ERST etc., and AFAIK they don't use work queues
to store logs. Now that efi_runtime_services() uses work queues, we
unfortunately have to have this hack.

> It seems unfortunate that
> we need to have this wide exception for all
> set_variable() calls.

True, basically any efi_runtime_service() that might get called in interrupt
context. I am not very happy to have the hack either, but I didn't find
another way.

> Either that or switch to an explicit "emergency mode" where
> we stop caring about protecting the system from EFI runtime code because
> we're already crashing.

Should we care about an extra warning (scheduling while in atomic) when we
are already crashing? This sounds kind of debatable. I will wait for feedback
from the community on whether they think it's OK, or whether there is a
better solution.

Regards,
Sai