Dan Williams <dan.j.williams@xxxxxxxxx> writes: > D Scott Phillips wrote: >> On arm64 prior to commit 32697ff38287 ("arm64: vmemmap: Avoid base2 order >> of struct page size to dimension region"), the amdgpu driver could trip >> over the warning of: >> >> `WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));` >> >> in vmemmap_populate()[1]. After that commit, it becomes a translation fault >> and panic[2]. >> >> The cause is that the amdgpu driver allocates some unused space from >> iomem_resource and claims it as MEMORY_DEVICE_PRIVATE and >> devm_memremap_pages() it. An address above those backed by the arm64 >> vmemmap is picked. >> >> Limit request_free_mem_region() so that only addresses within the >> arch_get_mappable_range() can be chosen as device private addresses. > > It seems odd that devm_request_free_mem_region() needs to be careful > about this restriction. The caller passes in the resource tree that is > the bounds of valid address ranges. This change assumes that the caller > wants to be restricted to vmemmap capable address ranges beyond the > restrictions it already requested in the passed in @base argument. That > restriction may be true with respect to request_mem_region(), but not > necessarily other users of get_free_mem_region() like > alloc_free_mem_region(). > > So, 2 questions / change request options: > > 1/ Preferred: Is there a possibility for the AMD driver to trim the > resource it is passing to be bound by arch_get_mappable_range()? For CXL > this is achieved by inserting CXL aperture windows into the resource > tree. > > In the future what happens in the MEMORY_DEVICE_PUBLIC case when the > memory address is picked by a hardware aperture on the device? It occurs > to me if that aperture is communicated to the device via some platform > mechanism (to honor arch_get_mappable_range() restrictions), then maybe > the same should be done here. > > I have always cringed at the request_free_mem_region() implementation > playing fast and loose with the platform memory map. Maybe this episode > is a sign that these constraints need more formal handling in the > resource tree. > > I.e. IORES_DESC_DEVICE_PRIVATE_MEMORY becomes a platform communicated > aperture rather than hoping that unused portions of iomem_resource can > be repurposed like this. Hi Dan, sorry for my incredibly delayed response, I lost your message to a filter on my end :( I'm happy to work toward your preferred approach here, though I'm not sure I know how to achieve it. I think I understand how cxl is keeping device_private_memory out, but I don't think I understand the resource system well enough to see how amdgpu can make a properly trimmed resource for request_free_mem_region. My novice attempt would be something like: diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c index 8ee3d07ffbdfa..d84de6d66ac45 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c @@ -1038,7 +1039,14 @@ int kgd2kfd_init_zone_device(struct amdgpu_device *adev) pgmap->range.end = adev->gmc.aper_base + adev->gmc.aper_size - 1; pgmap->type = MEMORY_DEVICE_COHERENT; } else { - res = devm_request_free_mem_region(adev->dev, &iomem_resource, size); + struct range mappable; + struct resource root; + + mappable = arch_get_mappable_range(); + root.start = mappable.start; + root.end = mappable.end; + root.child = iomem_resource.child; + res = devm_request_free_mem_region(adev->dev, &root, size); if (IS_ERR(res)) return PTR_ERR(res); pgmap->range.start = res->start; Apart from this being wrong with respect to resource_lock, is that sort of the idea? or am I missing the sensible way to hoist the vmemmap range into iomem_resource? or maybe I'm just totally off in the weeds.