[AMD Official Use Only - General] -----邮件原件----- 发件人: Paneer Selvam, Arunpravin <Arunpravin.PaneerSelvam@xxxxxxx> 发送时间: 2022年11月24日 22:57 收件人: Christian König <ckoenig.leichtzumerken@xxxxxxxxx>; Wang, Yang(Kevin) <KevinYang.Wang@xxxxxxx>; Koenig, Christian <Christian.Koenig@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx 抄送: Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Zhang, Hawking <Hawking.Zhang@xxxxxxx> 主题: Re: [PATCH] drm/amdgpu: add amdgpu vram usage information into amdgpu_vram_mm On 11/24/2022 1:17 PM, Christian König wrote: > Am 24.11.22 um 08:45 schrieb Wang, Yang(Kevin): >> [AMD Official Use Only - General] >> >> -----Original Message----- >> From: Koenig, Christian <Christian.Koenig@xxxxxxx> >> Sent: Thursday, November 24, 2022 3:25 PM >> To: Wang, Yang(Kevin) <KevinYang.Wang@xxxxxxx>; >> amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Paneer Selvam, Arunpravin >> <Arunpravin.PaneerSelvam@xxxxxxx> >> Cc: Zhang, Hawking <Hawking.Zhang@xxxxxxx>; Deucher, Alexander >> <Alexander.Deucher@xxxxxxx> >> Subject: Re: [PATCH] drm/amdgpu: add amdgpu vram usage information >> into amdgpu_vram_mm >> >> Am 24.11.22 um 06:49 schrieb Yang Wang: >>> add vram usage information into dri debugfs amdgpu_vram_mm node. >>> >>> Background: >>> when amdgpu driver introduces drm buddy allocator, the kernel driver >>> (and developer) is difficult to track the vram usage information. >>> >>> Field: >>> 0xaaaaaaaa-0xbbbbbbbb: vram usaged range. >>> type: kernel, device, sg >>> usage: normal, vm, user >>> domain: C-CPU, G-GTT, V-VRAM, P-PRIV >>> @xxxxx: the address of "amdgpu_bo" object in kernel space. >>> 4096: vram range range. >>> >>> Example: >>> 0x00000003fea68000-0x00000003fea68fff: (type:kernel usage:vm >>> domain:--V- --V-) @000000001d33dfee 4096 bytes >>> 0x00000003fea69000-0x00000003fea69fff: (type:kernel usage:vm >>> domain:--V- --V-) @00000000a79155b5 4096 bytes >>> 0x00000003fea6b000-0x00000003fea6bfff: (type:kernel usage:vm >>> domain:--V- --V-) @0000000038ad633b 4096 bytes >>> 0x00000003fea6c000-0x00000003fea6cfff: (type:device usage:user >>> domain:--V- --V-) @00000000e302f90b 4096 bytes >>> 0x00000003fea6d000-0x00000003fea6dfff: (type:device usage:user >>> domain:--V- --V-) @00000000e664c172 4096 bytes >>> 0x00000003fea6e000-0x00000003fea6efff: (type:kernel usage:vm >>> domain:--V- --V-) @000000004528cb2f 4096 bytes >>> 0x00000003fea6f000-0x00000003fea6ffff: (type:kernel usage:vm >>> domain:--V- --V-) @00000000a446bdbf 4096 bytes >>> 0x00000003fea70000-0x00000003fea7ffff: (type:device usage:user >>> domain:--V- --V-) @0000000078fae42f 65536 bytes >>> 0x00000003fead8000-0x00000003feadbfff: (type:kernel usage:normal >>> domain:--V- --V-) @000000001327b7ff 16384 bytes >>> 0x00000003feadc000-0x00000003feadcfff: (type:kernel usage:normal >>> domain:--V- --V-) @000000001327b7ff 4096 bytes >>> 0x00000003feadd000-0x00000003feaddfff: (type:kernel usage:normal >>> domain:--V- --V-) @00000000b9706fc1 4096 bytes >>> 0x00000003feade000-0x00000003feadefff: (type:kernel usage:vm >>> domain:--V- --V-) @0000000071a25571 4096 bytes >>> >>> Note: >>> although some vram ranges can be merged in the example above, but >>> this can reflect the actual distribution of drm buddy allocator. >> Well completely NAK. This is way to much complexity for simple >> debugging. >> >> Question is what are your requirements here? E.g. what information do >> you want and why doesn't the buddy allocator already expose this? >> >> Regards, >> Christian. >> >> [Kevin]: >> >> For KMD debug. >> The DRM buddy interface doesn't provide an interface to query which >> ranges of ram(resource) are used. >> It is not easy to debug in KMD side if driver create BO fail at >> specific location. >> and from the view of KMD, the VRAM at some locations has special >> purposes. >> with this patch, we can know which range of vram are actually used. > > Well that's not a good reason to add this complexity. Debugging > doesn't justify that. > > Please work with Arun to add the necessary information to the buddy > allocator interface. > > Regards, > Christian. > Hi Kevin, I will check and list down some of the necessary information that we can add to the buddy allocator interface. Regards, Arun. [kevin]: Thanks, but some information is AMDGPU specific, so I hope we can add some flexible interfaces to adapt the different driver's extension. Best Regards, Kevin >> >> Best Regards, >> Kevin >>> Signed-off-by: Yang Wang <KevinYang.Wang@xxxxxxx> >>> --- >>> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 6 +- >>> drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 3 + >>> drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 130 >>> ++++++++++++++++++- >>> drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.h | 1 + >>> 4 files changed, 136 insertions(+), 4 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c >>> index 90eb07106609..117c754409b3 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c >>> @@ -53,7 +53,7 @@ >>> * >>> */ >>> >>> -static void amdgpu_bo_destroy(struct ttm_buffer_object *tbo) >>> +void amdgpu_bo_destroy(struct ttm_buffer_object *tbo) >>> { >>> struct amdgpu_bo *bo = ttm_to_amdgpu_bo(tbo); >>> >>> @@ -66,7 +66,7 @@ static void amdgpu_bo_destroy(struct >>> ttm_buffer_object *tbo) >>> kvfree(bo); >>> } >>> >>> -static void amdgpu_bo_user_destroy(struct ttm_buffer_object *tbo) >>> +void amdgpu_bo_user_destroy(struct ttm_buffer_object *tbo) >>> { >>> struct amdgpu_bo *bo = ttm_to_amdgpu_bo(tbo); >>> struct amdgpu_bo_user *ubo; >>> @@ -76,7 +76,7 @@ static void amdgpu_bo_user_destroy(struct >>> ttm_buffer_object *tbo) >>> amdgpu_bo_destroy(tbo); >>> } >>> >>> -static void amdgpu_bo_vm_destroy(struct ttm_buffer_object *tbo) >>> +void amdgpu_bo_vm_destroy(struct ttm_buffer_object *tbo) >>> { >>> struct amdgpu_device *adev = amdgpu_ttm_adev(tbo->bdev); >>> struct amdgpu_bo *bo = ttm_to_amdgpu_bo(tbo); diff --git >>> a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h >>> index 147b79c10cbb..3f6a687309a7 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h >>> @@ -332,6 +332,9 @@ int amdgpu_bo_restore_shadow(struct amdgpu_bo >>> *shadow, >>> struct dma_fence **fence); >>> uint32_t amdgpu_bo_get_preferred_domain(struct amdgpu_device >>> *adev, >>> uint32_t domain); >>> +void amdgpu_bo_destroy(struct ttm_buffer_object *tbo); void >>> +amdgpu_bo_user_destroy(struct ttm_buffer_object *tbo); void >>> +amdgpu_bo_vm_destroy(struct ttm_buffer_object *tbo); >>> >>> /* >>> * sub allocation >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c >>> index 27159f1d112e..165f4f1a8141 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c >>> @@ -23,9 +23,11 @@ >>> */ >>> >>> #include <linux/dma-mapping.h> >>> +#include <linux/interval_tree_generic.h> >>> #include <drm/ttm/ttm_range_manager.h> >>> >>> #include "amdgpu.h" >>> +#include "amdgpu_object.h" >>> #include "amdgpu_vm.h" >>> #include "amdgpu_res_cursor.h" >>> #include "amdgpu_atomfirmware.h" >>> @@ -38,6 +40,122 @@ struct amdgpu_vram_reservation { >>> struct list_head blocks; >>> }; >>> >>> +struct amdgpu_vram_node { >>> + struct rb_node node; >>> + u64 start; >>> + u64 last; >>> + u64 __subtree_last; >>> + struct ttm_buffer_object *tbo; }; >>> + >>> +#define START(node) ((node)->start) #define LAST(node) >>> +((node)->last) >>> + >>> +INTERVAL_TREE_DEFINE(struct amdgpu_vram_node, node, u64, >>> __subtree_last, >>> + START, LAST, static, amdgpu_vram_it) >>> + >>> +#undef START >>> +#undef LAST >>> + >>> +#define for_each_vram_mm_node(node, mgr) \ >>> + for (node = amdgpu_vram_it_iter_first(&(mgr)->root, 0, >>> U64_MAX); node; \ >>> + node = amdgpu_vram_it_iter_next(node, 0, U64_MAX)) >>> + >>> +static void amdgpu_vram_mm_add_block(struct drm_buddy_block *block, >>> +struct amdgpu_vram_mgr *mgr, struct ttm_buffer_object *tbo) { >>> + struct amdgpu_vram_node *node; >>> + >>> + node = kvzalloc(sizeof(*node), GFP_KERNEL); >>> + if (!node) >>> + return; >>> + >>> + node->start = amdgpu_vram_mgr_block_start(block); >>> + node->last = node->start + amdgpu_vram_mgr_block_size(block) - >>> +1; >>> + node->tbo = tbo; >>> + >>> + amdgpu_vram_it_insert(node, &mgr->root); } >>> + >>> +static void amdgpu_vram_mm_remove_block(struct drm_buddy_block >>> +*block, struct amdgpu_vram_mgr *mgr) { >>> + struct amdgpu_vram_node *node; >>> + u64 start, last; >>> + >>> + start = amdgpu_vram_mgr_block_start(block); >>> + last = start + amdgpu_vram_mgr_block_size(block) - 1; >>> + >>> + node = amdgpu_vram_it_iter_first(&mgr->root, start, last); >>> + if (node) { >>> + amdgpu_vram_it_remove(node, &mgr->root); >>> + kvfree(node); >>> + } >>> +} >>> + >>> +static inline const char* ttm_bo_type2str(enum ttm_bo_type type) { >>> + switch (type) { >>> + case ttm_bo_type_kernel: >>> + return "kernel"; >>> + case ttm_bo_type_device: >>> + return "device"; >>> + case ttm_bo_type_sg: >>> + return "sg"; >>> + default: >>> + return "unknow"; >>> + } >>> +} >>> + >>> +static inline const char* amdgpu_vram_domain_str(u32 domain, char >>> +*tmp) { >>> + int index = 0; >>> + >>> + tmp[index++] = domain & AMDGPU_GEM_DOMAIN_CPU ? 'C' : '-'; >>> + tmp[index++] = domain & AMDGPU_GEM_DOMAIN_GTT ? 'G' : '-'; >>> + tmp[index++] = domain & AMDGPU_GEM_DOMAIN_VRAM ? 'V' : '-'; >>> + tmp[index++] = domain & (AMDGPU_GEM_DOMAIN_GDS | >>> AMDGPU_GEM_DOMAIN_GWS | AMDGPU_GEM_DOMAIN_OA) ? >>> + 'P' : '-'; >>> + tmp[index++] = '\0'; >>> + >>> + return tmp; >>> +} >>> + >>> +static inline const char* amdgpu_vram_bo_usage(struct >>> +ttm_buffer_object *tbo) { >>> + if (tbo->destroy == &amdgpu_bo_destroy) >>> + return "normal"; >>> + else if (tbo->destroy == &amdgpu_bo_user_destroy) >>> + return "user"; >>> + else if (tbo->destroy == &amdgpu_bo_vm_destroy) >>> + return "vm"; >>> + else >>> + return "unknow"; >>> +} >>> + >>> +static void amdgpu_vram_mm_debug(struct amdgpu_vram_mgr *mgr, >>> +struct drm_printer *p) { >>> + struct amdgpu_vram_node *node; >>> + struct ttm_buffer_object *tbo; >>> + struct amdgpu_bo *abo; >>> + char tmp[5]; >>> + >>> + for_each_vram_mm_node(node, mgr) { >>> + tbo = node->tbo; >>> + abo = ttm_to_amdgpu_bo(tbo); >>> + drm_printf(p, "%#018llx-%#018llx:", node->start, >>> node->last); >>> + if (abo) >>> + drm_printf(p, " (type:%-5s usage:%-8s >>> domain:%s %s) @%p", >>> + ttm_bo_type2str(tbo->type), >>> + amdgpu_vram_bo_usage(tbo), >>> +amdgpu_vram_domain_str(abo->preferred_domains, tmp), >>> +amdgpu_vram_domain_str(abo->allowed_domains, tmp), >>> + abo); >>> + else >>> + drm_printf(p, " (reserved)"); >>> + drm_printf(p, " %llu bytes\n", >>> + node->last - node->start + 1); >>> + } >>> +} >>> + >>> static inline struct amdgpu_vram_mgr * >>> to_vram_mgr(struct ttm_resource_manager *man) >>> { >>> @@ -288,6 +406,7 @@ static void amdgpu_vram_mgr_do_reserve(struct >>> ttm_resource_manager *man) >>> dev_dbg(adev->dev, "Reservation 0x%llx - %lld, >>> Succeeded\n", >>> rsv->start, rsv->size); >>> >>> + amdgpu_vram_mm_add_block(block, mgr, NULL); >>> vis_usage = amdgpu_vram_mgr_vis_size(adev, block); >>> atomic64_add(vis_usage, &mgr->vis_usage); >>> spin_lock(&man->bdev->lru_lock); @@ -540,6 +659,8 @@ >>> static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man, >>> vres->base.start = max(vres->base.start, start); >>> >>> vis_usage += amdgpu_vram_mgr_vis_size(adev, block); >>> + >>> + amdgpu_vram_mm_add_block(block, mgr, tbo); >>> } >>> >>> if (amdgpu_is_vram_mgr_blocks_contiguous(&vres->blocks)) >>> @@ -583,8 +704,10 @@ static void amdgpu_vram_mgr_del(struct >>> ttm_resource_manager *man, >>> uint64_t vis_usage = 0; >>> >>> mutex_lock(&mgr->lock); >>> - list_for_each_entry(block, &vres->blocks, link) >>> + list_for_each_entry(block, &vres->blocks, link) { >>> vis_usage += amdgpu_vram_mgr_vis_size(adev, block); >>> + amdgpu_vram_mm_remove_block(block, mgr); >>> + } >>> >>> amdgpu_vram_mgr_do_reserve(man); >>> >>> @@ -747,6 +870,9 @@ static void amdgpu_vram_mgr_debug(struct >>> ttm_resource_manager *man, >>> drm_printf(printer, "reserved:\n"); >>> list_for_each_entry(block, &mgr->reserved_pages, link) >>> drm_buddy_block_print(mm, block, printer); >>> + drm_printf(printer, "vram usage:\n"); >>> + amdgpu_vram_mm_debug(mgr, printer); >>> + >>> mutex_unlock(&mgr->lock); >>> } >>> >>> @@ -769,6 +895,8 @@ int amdgpu_vram_mgr_init(struct amdgpu_device >>> *adev) >>> struct ttm_resource_manager *man = &mgr->manager; >>> int err; >>> >>> + mgr->root = RB_ROOT_CACHED; >>> + >>> ttm_resource_manager_init(man, &adev->mman.bdev, >>> adev->gmc.real_vram_size); >>> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.h >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.h >>> index 0e04e42cf809..a14c56e1e407 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.h >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.h >>> @@ -28,6 +28,7 @@ >>> >>> struct amdgpu_vram_mgr { >>> struct ttm_resource_manager manager; >>> + struct rb_root_cached root; >>> struct drm_buddy mm; >>> /* protects access to buffer objects */ >>> struct mutex lock; >
<<attachment: winmail.dat>>