On 02/02/2024 6:04 pm, Pasha Tatashin wrote:
Hmm, I did misspeak slightly (it's late and I really should have left
this for tomorrow...) - that's 12KB per CPU *per domain*, but still that
would seem to imply well over 100 domains if you have 242MB of magazine
allocations while the iommu_iova cache isn't even on the charts... what
the heck is that driver doing?
(I don't necessarily disagree with the spirit of the patch BTW, I just
really want to understand the situation that prompted it, and make sure
we don't actually have a subtle leak somewhere.)
Hi Robin,
The following tracing is with the plain upstream kernel, without the Google TPU driver:
iova_domain_init_rcaches() is called 159 times, with the following call stack:
iova_domain_init_rcaches
iommu_setup_dma_ops
amd_iommu_probe_finalize
bus_iommu_probe
iommu_device_register
iommu_init_pci
amd_iommu_init_pci
state_next
iommu_go_to_state
amd_iommu_init
pci_iommu_init
do_one_initcall
Each call allocates 1536K, so in total: 159 * 1536K = 238.5M
Yikes, so it really does just have that many IOMMU groups? OK, fair
enough, call me convinced :)
On balance though, I think I'd prefer to just stick the lifecycle
management into iova_cache_{get,put} for simplicity - spending ~256
bytes on another kmem_cache we might not use can hardly be significantly
more than the extra code and static data necessary to track its usage
separately anyway.
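Roughly what I have in mind, as an untested sketch (the "iommu_iova_magazine" name and the error handling here are just placeholders, and the existing cpuhp setup inside iova_cache_get() is left out):

    /* Untested sketch: piggyback the magazine cache on the existing
     * iova_cache_users refcount, so its lifetime needs no separate tracking.
     */
    static struct kmem_cache *iova_magazine_cache;

    int iova_cache_get(void)
    {
            mutex_lock(&iova_cache_mutex);
            if (!iova_cache_users) {
                    iova_cache = kmem_cache_create("iommu_iova",
                                                   sizeof(struct iova), 0,
                                                   SLAB_HWCACHE_ALIGN, NULL);
                    iova_magazine_cache = kmem_cache_create("iommu_iova_magazine",
                                                            sizeof(struct iova_magazine),
                                                            0, SLAB_HWCACHE_ALIGN, NULL);
                    if (!iova_cache || !iova_magazine_cache) {
                            mutex_unlock(&iova_cache_mutex);
                            return -ENOMEM;
                    }
            }
            iova_cache_users++;
            mutex_unlock(&iova_cache_mutex);

            return 0;
    }

    void iova_cache_put(void)
    {
            mutex_lock(&iova_cache_mutex);
            if (WARN_ON(!iova_cache_users)) {
                    mutex_unlock(&iova_cache_mutex);
                    return;
            }
            if (!--iova_cache_users) {
                    kmem_cache_destroy(iova_cache);
                    kmem_cache_destroy(iova_magazine_cache);
            }
            mutex_unlock(&iova_cache_mutex);
    }

iova_magazine_alloc()/iova_magazine_free() would then just switch from kzalloc()/kfree() to kmem_cache_zalloc()/kmem_cache_free() on iova_magazine_cache, and there's no separate refcount or init/exit hook to get wrong.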
Thanks,
Robin.
The allocation happens like this:
for (IOVA_RANGE_CACHE_MAX_SIZE)
    for_each_possible_cpu()
        iova_magazine_alloc
        iova_magazine_alloc
IOVA_RANGE_CACHE_MAX_SIZE = 6
ncpu = 128
sizeof (struct iova_magazine) = 1K
6 * 128 * (1K + 1K) = 1536K
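Expanded a little, that corresponds to roughly this shape in iova_domain_init_rcaches() (simplified; the per-CPU array allocation, locking and error handling are left out):

    /* Two magazines (loaded + prev) per possible CPU, per size class:
     * 6 * 128 * 2 * ~1K = 1536K for every iova_domain, allocated up
     * front at domain init time.
     */
    for (i = 0; i < IOVA_RANGE_CACHE_MAX_SIZE; i++) {
            struct iova_rcache *rcache = &iovad->rcaches[i];

            for_each_possible_cpu(cpu) {
                    struct iova_cpu_rcache *cpu_rcache =
                            per_cpu_ptr(rcache->cpu_rcaches, cpu);

                    cpu_rcache->loaded = iova_magazine_alloc(GFP_KERNEL);
                    cpu_rcache->prev = iova_magazine_alloc(GFP_KERNEL);
            }
    }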
Pasha