Running some microbenchmarks on dax keeps showing find_next_iomem_res()
as a place in which a significant amount of time is spent. It appears
that in order to determine the cacheability required for the PTE,
lookup_memtype() is called, which traverses the resource list
inefficiently. This patch-set tries to improve the situation.

The first patch fixes what appears to be unsafe locking in
find_next_iomem_res().

The second patch improves performance by searching the top level first
to find a matching range, before going down to the children.

The third patch improves performance by caching the top-level resource
of the last resource found in find_next_iomem_res().

Both of these optimizations rely on the ranges in the top level not
overlapping each other.

Running sysbench on dax (Haswell, pmem emulation, with write_cache
disabled):

  sysbench fileio --file-total-size=3G --file-test-mode=rndwr \
	--file-io-mode=mmap --threads=4 --file-fsync-mode=fdatasync run

Provides the following results:

		events (avg/stddev)
		-------------------
  5.2-rc3:	1247669.0000/16075.39
  +patches:	1293408.5000/7720.69	(+3.5%)

Cc: Borislav Petkov <bp@xxxxxxx>
Cc: Toshi Kani <toshi.kani@xxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
Cc: Dan Williams <dan.j.williams@xxxxxxxxx>
Cc: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>

Nadav Amit (3):
  resource: Fix locking in find_next_iomem_res()
  resource: Avoid unnecessary lookups in find_next_iomem_res()
  resource: Introduce resource cache

 kernel/resource.c | 96 ++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 79 insertions(+), 17 deletions(-)

-- 
2.20.1
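
For readers who want a rough picture of the two lookup optimizations
described above (top level first, then a cache of the last top-level
hit), here is a minimal user-space sketch. It is not the code from the
patches: struct res, cached_top, search_top() and find_res() are
hypothetical names, the tree is limited to two levels, and locking and
cache invalidation on resource removal are left out entirely. It only
assumes, as the cover letter does, that top-level ranges do not overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the kernel's struct resource. */
struct res {
	uint64_t start, end;	/* inclusive range */
	struct res *child;	/* first child, sorted by start */
	struct res *sibling;	/* next sibling, sorted by start */
};

/* Top-level entry that satisfied the previous lookup (illustrative cache). */
static struct res *cached_top;

static int overlaps(const struct res *r, uint64_t start, uint64_t end)
{
	return r->start <= end && r->end >= start;
}

/* Return the first child of @top overlapping the range, else @top itself. */
static struct res *search_top(struct res *top, uint64_t start, uint64_t end)
{
	struct res *c;

	for (c = top->child; c; c = c->sibling) {
		if (c->start > end)
			break;	/* children are sorted: nothing later can overlap */
		if (overlaps(c, start, end))
			return c;
	}
	return top;
}

static struct res *find_res(struct res *top_list, uint64_t start, uint64_t end)
{
	struct res *t;

	assert(start <= end);

	/*
	 * Fast path: if the cached top-level entry fully contains the range,
	 * no other top-level entry can overlap it (top-level ranges are
	 * assumed not to overlap), so the top-level walk can be skipped.
	 */
	if (cached_top && cached_top->start <= start && end <= cached_top->end)
		return search_top(cached_top, start, end);

	/* Slow path: pick the top-level entry first, then descend into it only. */
	for (t = top_list; t; t = t->sibling) {
		if (t->start > end)
			break;
		if (overlaps(t, start, end)) {
			cached_top = t;
			return search_top(t, start, end);
		}
	}
	return NULL;
}
```

In this sketch, repeated lookups that fall inside the same top-level
range skip the sibling walk altogether, which is the access pattern the
dax fault path would be expected to produce. A real implementation would
also have to drop the cached pointer whenever the resource tree changes.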