On Mon, Jun 29, 2020 at 10:46 PM Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
>
> On Sun, Jun 28, 2020 at 10:09 AM Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote:
> >
> > On Fri, Jun 26, 2020 at 8:41 PM Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
> > >
> > > On Fri, Jun 26, 2020 at 10:34 AM Rafael J. Wysocki <rjw@xxxxxxxxxxxxx> wrote:
> > > >
> > > > Hi All,
> > > >
> > > > On Monday, June 22, 2020 3:50:42 PM CEST Rafael J. Wysocki wrote:
> > > > > Hi All,
> > > > >
> > > > > This series is to address the problem with RCU synchronization occurring,
> > > > > possibly relatively often, inside of acpi_ex_system_memory_space_handler(),
> > > > > when the namespace and interpreter mutexes are held.
> > > > >
> > > > > Like I said before, I had decided to change the approach used in the previous
> > > > > iteration of this series and to allow the unmap operations carried out by
> > > > > acpi_ex_system_memory_space_handler() to be deferred in the first place,
> > > > > which is done in patches [1-2/4].
> > > >
> > > > In the meantime I realized that calling synchronize_rcu_expedited() under the
> > > > "tables" mutex within ACPICA is not quite a good idea either and that there is no
> > > > reason for any users of acpi_os_unmap_memory() in the tree to use the "sync"
> > > > variant of unmapping.
> > > >
> > > > So, unless I'm missing something, acpi_os_unmap_memory() can be changed to
> > > > always defer the final unmapping and the only ACPICA change needed to support
> > > > that is the addition of the acpi_os_release_unused_mappings() call to get rid
> > > > of the unused mappings when leaving the interpreter (modulo the extra call in
> > > > the debug code for consistency).
> > > >
> > > > So patches [1-2/4] have been changed accordingly.
> > > >
> > > > > However, it turns out that the "fast-path" mapping is still useful on top of
> > > > > the above to reduce the number of ioremap-iounmap cycles for the same address
> > > > > range and so it is introduced by patches [3-4/4].
> > > >
> > > > Patches [3-4/4] still do what they did, but they have been simplified a bit
> > > > after rebasing on top of the new [1-2/4].
> > > >
> > > > The below information is still valid, but it applies to the v3, of course.
> > > >
> > > > > For details, please refer to the patch changelogs.
> > > > >
> > > > > The series is available from the git branch at
> > > > >
> > > > >  git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \
> > > > >  acpica-osl
> > > > >
> > > > > for easier testing.
> > > >
> > > > Also the series has been tested locally.
> > >
> > > Ok, I'm still trying to get the original reporter to confirm this
> > > reduces the execution time for ASL routines with a lot of OpRegion
> > > touches. Shall I rebuild that test kernel with these changes, or are
> > > the results from the original RFT still interesting?
> >
> > I'm mostly interested in the results with the v3 applied.
> >
>
> Ok, I just got feedback on v2 and it still showed the 30 minute
> execution time where 7 minutes was achieved previously.

This probably means that "transient" memory opregions, which appear and
go away during the AML execution, are involved, and so moving the RCU
synchronization outside of the interpreter and namespace locks is not
enough to cover this case.

It should be covered by the v4
(https://lore.kernel.org/linux-acpi/1666722.UopIai5n7p@kreacher/T/#u),
though, because the unmapping is completely asynchronous in there and
it doesn't add any significant latency to the interpreter exit path.

I would expect to see much better results with the v4, so I'd
recommend testing that one next.

> > Also it would be good to check the impact of the first two patches
> > alone relative to all four.
>
> I'll start with the full set and see if they can also support the
> "first 2" experiment.

In the v4 there are just two patches, so it should be straightforward
enough to test with and without the top-most one. :-)
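
For anyone following the thread without the patches at hand, the
deferred-unmap idea discussed above can be sketched in plain userspace C
roughly as below. This is only an illustration under stated assumptions,
not the code from the series: struct mapping, map_region(),
unmap_region_deferred(), release_unused_mappings() and the sync_count
counter are all hypothetical names, and the "expensive" step (waiting
for readers and then unmapping) is simulated by a counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a mapped memory region tracked by an OS layer. */
struct mapping {
	unsigned long phys;      /* start of the mapped physical range */
	size_t size;             /* length of the range in bytes */
	unsigned int refcount;   /* number of current users */
	struct mapping *next;    /* link on the deferred-release list */
};

/* Mappings whose last user is gone, waiting for the final unmap. */
static struct mapping *unused_list;

/* Counts simulated "wait for readers" steps (assumed to be the slow part). */
static unsigned int sync_count;

/* Create a mapping with a single user. Returns NULL on allocation failure. */
struct mapping *map_region(unsigned long phys, size_t size)
{
	struct mapping *m = calloc(1, sizeof(*m));

	if (!m)
		return NULL;

	m->phys = phys;
	m->size = size;
	m->refcount = 1;
	return m;
}

/*
 * Drop one reference. When the last user goes away, the mapping is only
 * queued; no waiting happens here, so this is cheap to call with locks held.
 */
void unmap_region_deferred(struct mapping *m)
{
	assert(m && m->refcount > 0);

	if (--m->refcount > 0)
		return;

	m->next = unused_list;
	unused_list = m;
}

/*
 * Release everything queued so far. Intended to run with no locks held,
 * e.g. after leaving an interpreter. One simulated wait covers the whole
 * batch. Returns the number of mappings released.
 */
unsigned int release_unused_mappings(void)
{
	struct mapping *m = unused_list;
	unsigned int released = 0;

	unused_list = NULL;
	if (!m)
		return 0;

	sync_count++;	/* a kernel would wait for readers here, once */

	while (m) {
		struct mapping *next = m->next;

		free(m);	/* a kernel would do the final unmap here */
		released++;
		m = next;
	}
	return released;
}
```

The point of the split is that unmap_region_deferred() never blocks, so
it does not matter which locks the caller holds, while the one slow step
is paid once per batch in release_unused_mappings(). Note that this
sketch still releases synchronously at a chosen point; making the
release fully asynchronous, as described for the v4, would mean running
that function from a separate worker instead.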