Hi, Rafael

> From: rjwysocki@xxxxxxxxx [mailto:rjwysocki@xxxxxxxxx] On Behalf Of Rafael J. Wysocki
> Subject: Re: [PATCH v2] ACPI / OSL: Fix rcu synchronization logic
>
> On Wed, Jan 11, 2017 at 10:54 AM, Lv Zheng <lv.zheng@xxxxxxxxx> wrote:
> > The RCU synchronization logic was originally provided to protect
> > apei_read()/apei_write(), as in the APEI drivers there is an NMI event
> > source requiring a non-spinlock-based synchronization mechanism. After
> > that, the ACPI developers decided that the FADT registers may also
> > require the same facility, so they moved the RCU code to the generic
> > ACPI layer. So now interrupt (IRQ, NMI) context ACPI map lookups are
> > protected only by RCU.
> >
> > This triggers a kernel hang when the ACPICA API starts to unmap unused
> > tables (see Link #1 for details). The cause of the hang is
> > drivers/iommu/amd_iommu_init.c calling acpi_put_table() too early and
> > then landing in RCU, which is not initialized yet:
> >   early_amd_iommu_init()
> >     acpi_put_table(ivrs_base)
> >       acpi_os_unmap_memory()
> >         synchronize_rcu_expedited()
> > Now that function goes and sends IPIs, i.e., schedule_work(), but this
> > is too early - workqueue_init() hasn't been invoked. Actually, from
> > looking at the call stack, we do
> > kernel_init_freeable()->native_smp_prepare_cpus(), and
> > workqueue_init() comes next.
> >
> > Actually this facility is only used to protect interrupt-context ACPI
> > map lookups, and such mappings are only introduced by
> > acpi_os_map_generic_address(). So before it is invoked, there is no
> > need to use RCU; a mutex should be used instead.
>
> Why should the mutex be used then?
>
> There's nothing wrong with using RCU in all cases except when that
> happens early on boot when things don't work as expected, but at that
> time there's no need to do the synchronization *at* *all* as the list
> protected by RCU is not even walked.
If the issue can be fixed at a point that is later than
early_amd_iommu_init() and earlier than all acpi_os_read/write_memory()
calls, that's great, as it means the issue can be fixed in the simplest
way. If not (meaning there is always a chance that
acpi_os_read/write_memory() can happen before early_amd_iommu_init() or
other initcalls which invoke acpi_put_table()), then we need this
workaround.

If all acpi_os_read/write_memory() calls are made from task context,
then we can just use mutex_lock() instead of rcu_read_lock() during this
stage, as all ACPI ioremaps are accessed in task context. So with the
mutex mechanism working during this stage, the WARN_ON_ONCE() you've
introduced into acpi_os_read/write_memory() can actually be converted
into WARN_ON_ONCE(!early_enough && in_interrupt()), and the early_enough
flag can be determined by the first acpi_os_map_generic_address() call.

The slightly more complicated workaround is then obtained by wrapping
rcu_read_lock()/mutex_lock() in acpi_lock_ioremap() and moving the
WARN_ON_ONCE() into it. IMO this can handle more cases than the simplest
fix, and can make us immune to similar issues in the future:
1. The ACPI subsystem can then control the ordering itself by ensuring
   that all acpi_os_map_generic_address() callers are late enough; we
   won't be messed up by external initcall ordering.
2. Some ACPI initialization steps (for example, namespace
   initialization) can be changed without the worry of triggering
   similar regressions.

Thanks and best regards
Lv