Re: [PATCH v6 4/5] drm: add drm_memory_stats_is_zero

On 07/11/2024 14:17, Li, Yunxiang (Teddy) wrote:

From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxx>
Sent: Thursday, November 7, 2024 5:41
On 25/10/2024 18:41, Yunxiang Li wrote:
Add a helper to check whether the memory stats are zero; this will be used
to check for memory accounting errors.

Signed-off-by: Yunxiang Li <Yunxiang.Li@xxxxxxx>
---
   drivers/gpu/drm/drm_file.c | 9 +++++++++
   include/drm/drm_file.h     | 1 +
   2 files changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
index 714e42b051080..75ed701d80f74 100644
--- a/drivers/gpu/drm/drm_file.c
+++ b/drivers/gpu/drm/drm_file.c
@@ -859,6 +859,15 @@ static void print_size(struct drm_printer *p, const char *stat,
     drm_printf(p, "drm-%s-%s:\t%llu%s\n", stat, region, sz, units[u]);
   }

+int drm_memory_stats_is_zero(const struct drm_memory_stats *stats) {
+   return (stats->shared == 0 &&
+           stats->private == 0 &&
+           stats->resident == 0 &&
+           stats->purgeable == 0 &&
+           stats->active == 0);
+}

Could use mem_is_zero() for some value of source/binary compactness.

Yeah, the patch set started out with that when it was just a function in amdgpu, but Christian didn't like it.

Okay, I don't feel so strongly about the implementation details.

+EXPORT_SYMBOL(drm_memory_stats_is_zero);
+
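
For comparison, below is a userspace sketch of the mem_is_zero() variant suggested above, next to the field-by-field check from the patch. It models the five counters the patch tests as uint64_t fields and uses a hypothetical stand-in, mem_is_zero_sketch(), in place of the kernel's mem_is_zero(); the function names and the bool return type are illustrative only, not part of the patch. The static_assert records the no-padding assumption that a byte-wise comparison depends on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model of the counters the patch checks: five 64-bit fields. */
struct drm_memory_stats {
	uint64_t shared;
	uint64_t private;
	uint64_t resident;
	uint64_t purgeable;
	uint64_t active;
};

/* A byte-wise check is only meaningful if the struct has no padding bytes. */
static_assert(sizeof(struct drm_memory_stats) == 5 * sizeof(uint64_t),
	      "drm_memory_stats is assumed to have no padding");

/* Hypothetical stand-in for the kernel's mem_is_zero(): true if all n bytes are 0. */
static bool mem_is_zero_sketch(const void *s, size_t n)
{
	const unsigned char *p = s;

	for (size_t i = 0; i < n; i++)
		if (p[i])
			return false;
	return true;
}

/* Field-by-field version, same logic as the patch (returning bool here). */
static bool stats_is_zero_fields(const struct drm_memory_stats *stats)
{
	return stats->shared == 0 && stats->private == 0 &&
	       stats->resident == 0 && stats->purgeable == 0 &&
	       stats->active == 0;
}

/* Byte-wise version along the lines of the mem_is_zero() suggestion. */
static bool stats_is_zero_bytes(const struct drm_memory_stats *stats)
{
	return mem_is_zero_sketch(stats, sizeof(*stats));
}
```

The byte-wise form is shorter, but it silently depends on the struct layout; the field-by-field form keeps giving the right answer if padding or a member that is not a counter is ever added.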

I am not a huge fan of adding this as an interface, as the only caller appears to be a sanity check in amdgpu_vm_fini():

       if (!amdgpu_vm_stats_is_zero(vm))
               dev_err(adev->dev, "VM memory stats is non-zero when fini\n");

But I guess there is some value in sanity checking, since amdgpu does not have a notion of debug-only code (compiled at production and exercised via a test suite).

I do suggest demoting the dev_err to notice log level; that would suffice and be more accurate.

I think it's very important to have a check like this when we have a known invariant, especially in this case, where the stat tracking code is spread out everywhere and we have very little chance of catching a bug right when it happens. And since a failure of this check always means there is a bug, I don't see the harm in keeping it as an error.
It would indeed be a programming error if it happened, but from the point of view of a driver and the system log I think a warning is actually right.
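
The invariant argued for here (every stats update made while the VM is alive must be undone before it is torn down) can be illustrated with a small userspace model. All names below (struct stats_sketch, stats_add(), stats_sub(), vm_fini_check()) are hypothetical and only mirror the general shape of such accounting; the real amdgpu bookkeeping is more involved. The point it shows is that a missed or doubled update goes unnoticed where it happens and only surfaces at teardown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Userspace model of the per-VM counters (same fields the patch checks). */
struct stats_sketch {
	uint64_t shared;
	uint64_t private;
	uint64_t resident;
	uint64_t purgeable;
	uint64_t active;
};

/* Hypothetical hook: account a buffer of 'size' bytes when it is added to the VM. */
static void stats_add(struct stats_sketch *s, uint64_t size, bool shared)
{
	if (shared)
		s->shared += size;
	else
		s->private += size;
	s->resident += size;
}

/* Hypothetical hook: undo the accounting when the buffer is removed again.
 * The counters are unsigned, so an unmatched removal wraps around
 * instead of going negative, and nothing complains at this point. */
static void stats_sub(struct stats_sketch *s, uint64_t size, bool shared)
{
	if (shared)
		s->shared -= size;
	else
		s->private -= size;
	s->resident -= size;
}

static bool stats_is_zero(const struct stats_sketch *s)
{
	return s->shared == 0 && s->private == 0 && s->resident == 0 &&
	       s->purgeable == 0 && s->active == 0;
}

/* Teardown check: any add/sub imbalance left behind is reported here. */
static bool vm_fini_check(const struct stats_sketch *s)
{
	if (!stats_is_zero(s)) {
		fprintf(stderr, "VM memory stats is non-zero when fini\n");
		return false;
	}
	return true;
}
```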

Regards,

Tvrtko


Now that I think about it, I probably want to have the process & task name in here to aid in reproduction.

Teddy
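
Adding the owner to the message could look roughly like the sketch below. It assumes the VM keeps a copy of the creating process's name, task name and PID taken at creation time, since the task running the teardown may not be the one that created the VM. struct owner_info and format_stats_warning() are hypothetical; in the kernel the string would go through the driver's logging helper rather than snprintf().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record of the VM's owner, captured when the VM is created. */
struct owner_info {
	char process_name[16];
	char task_name[16];
	int pid;
};

/* Format the fini warning with the owner attached.
 * Returns what snprintf() returns: the length the full message needs,
 * so a result >= len means the message was truncated. */
static int format_stats_warning(char *buf, size_t len, const struct owner_info *o)
{
	return snprintf(buf, len,
			"VM memory stats is non-zero when fini (process %s, task %s, pid %d)",
			o->process_name, o->task_name, o->pid);
}
```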


