On Tue, Jun 4, 2024 at 8:41 AM Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Thu, May 30, 2024 at 10:36:57PM -0700, Chia-I Wu wrote:
> > We can skip children resources when the parent resource does not cover
> > the range.
> >
> > This should help vmf_insert_* users on x86, such as several DRM drivers.
> > On my AMD Ryzen 5 7520C, when streaming data from cpu memory into amdgpu
> > bo, the throughput goes from 5.1GB/s to 6.6GB/s. perf report says
> >
> >   34.69%--__do_fault
> >   34.60%--amdgpu_gem_fault
> >   34.00%--ttm_bo_vm_fault_reserved
> >   32.95%--vmf_insert_pfn_prot
> >   25.89%--track_pfn_insert
> >   24.35%--lookup_memtype
> >   21.77%--pat_pagerange_is_ram
> >   20.80%--walk_system_ram_range
> >   17.42%--find_next_iomem_res
> >
> > before this change, and
> >
> >   26.67%--__do_fault
> >   26.57%--amdgpu_gem_fault
> >   25.83%--ttm_bo_vm_fault_reserved
> >   24.40%--vmf_insert_pfn_prot
> >   14.30%--track_pfn_insert
> >   12.20%--lookup_memtype
> >   9.34%--pat_pagerange_is_ram
> >   8.22%--walk_system_ram_range
> >   5.09%--find_next_iomem_res
> >
> > after.
>
> That's great, but why is walk_system_ram_range() being called so often?
>
> Shouldn't that be a "set up the device" only type of thing? Why hammer
> on "lookup_memtype" when you know the memtype, you just did the same
> thing for the previous frame.
>
> This feels like it could be optimized to just "don't call these things"
> which would make it go faster, right?
>
> What am I missing here, why does this always have to be calculated all
> the time? Resource mapping changes are rare, if at all, over the
> lifetime of a system uptime. Constantly calculating something that
> never changes feels odd to me.

Yeah, that would be even better.

I am not familiar with x86 pat code. I will have to defer that to
those more familiar with the matter.

>
> thanks,
>
> greg k-h
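
For readers following the thread, the pruning described in the patch ("skip children resources when the parent resource does not cover the range") can be sketched in user space. This is a simplified model under stated assumptions, not the kernel's code: struct res, next_res(), find_first_ram(), the is_ram flag and the visited counter are hypothetical stand-ins chosen for illustration. It assumes siblings are sorted by start address and that every child lies inside its parent.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a node in a resource tree. */
struct res {
	unsigned long start, end;	/* inclusive address range */
	struct res *parent, *sibling, *child;
	bool is_ram;			/* assumed flag: node is the kind we search for */
};

/* Pre-order successor of p; optionally do not descend into p's children. */
static struct res *next_res(struct res *p, bool skip_children)
{
	if (!skip_children && p->child)
		return p->child;
	while (!p->sibling && p->parent)
		p = p->parent;
	return p->sibling;
}

/*
 * Return the first is_ram node, in pre-order, that overlaps [start, end],
 * or NULL if there is none. A node that ends before 'start' cannot contain
 * an overlapping child, so its whole subtree is skipped.
 * 'visited' counts examined nodes, only to make the pruning observable.
 */
static struct res *find_first_ram(struct res *root, unsigned long start,
				  unsigned long end, unsigned *visited)
{
	bool skip_children = false;
	struct res *p;

	for (p = root->child; p; p = next_res(p, skip_children)) {
		(*visited)++;

		/* Pre-order is sorted by start: nothing later can overlap. */
		if (p->start > end)
			return NULL;

		/* Node ends before the range: prune its subtree. */
		if (p->end < start) {
			skip_children = true;
			continue;
		}

		/* Node overlaps: return it, or descend to look for a match. */
		skip_children = false;
		if (p->is_ram)
			return p;
	}
	return NULL;
}
```

With two top-level nodes where the first has several children and ends below the queried range, the walk examines the first node once and moves straight to its sibling instead of stepping through each child. This only shortens each lookup; it does not address the separate question raised above of whether the lookup needs to run on every fault.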