Re: [PATCH] drm/amdkfd: Fix the shift-out-of-bounds warning

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



[+Jon]

On 2024-01-11 01:05, Ma, Jun wrote:

Hi Felix,

On 1/10/2024 11:57 PM, Felix Kuehling wrote:
On 2024-01-10 04:39, Ma Jun wrote:
There is following shift-out-of-bounds warning if ecode=0.
"shift exponent 4294967295 is too large for 64-bit type 'long long unsigned int'"

Signed-off-by: Ma Jun <Jun.Ma2@xxxxxxx>
---
   include/uapi/linux/kfd_ioctl.h | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index 2aa88afe305b..129325b02a91 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -1004,7 +1004,7 @@ enum kfd_dbg_trap_exception_code {
   };
/* Mask generated by ecode in kfd_dbg_trap_exception_code */
-#define KFD_EC_MASK(ecode)	(1ULL << (ecode - 1))
+#define KFD_EC_MASK(ecode)	(BIT(ecode) - 1)
This is not the same thing. We want a bit mask with one bit set. And
ecode=1 should set bit 0. ecode=0 is not a valid code and doesn't have a
valid mask. You could use BIT((ecode) - 1), but I think that would give
you the same warning for ecode=0. I also don't see BIT defined anywhere
under include/uapi, so I think using this in the API header would break
the build in user mode.

Where are you seeing the warning about the bad shift exponent? Looks
like someone is using the KFD_EC_MASK macro incorrectly. Or if there is
a legitimate use of it with ecode=0, then the correct fix would be

This warning is caused by following code in function event_interrupt_wq_v10()
		
else if (source_id == SOC15_INTSRC_CP_BAD_OPCODE) {
	kfd_set_dbg_ev_from_interrupt(dev, pasid,
	KFD_DEBUG_DOORBELL_ID(context_id0),
	KFD_EC_MASK(KFD_DEBUG_CP_BAD_OP_ECODE(context_id0)),
	NULL,
	0);
}

This looks OK. The compiler must be warning about a potential problem here, not a definite one.

Question for Jon, how does the firmware encode the error code in the context ID? I see these macros:

#define KFD_DEBUG_CP_BAD_OP_ECODE_MASK          0x3fffc00
#define KFD_DEBUG_CP_BAD_OP_ECODE_SHIFT         10
#define KFD_DEBUG_CP_BAD_OP_ECODE(ctxid0) (((ctxid0) &                  \
                                KFD_DEBUG_CP_BAD_OP_ECODE_MASK)         \
                                >> KFD_DEBUG_CP_BAD_OP_ECODE_SHIFT)

It looks like we have 16 bits for the ECODE. That's enough to have a bit mask. Do we really need KFD_EC_MASK to convert an error number into a bitmask here?




#define KFD_EC_MASK(ecode)	((ecode) ? 1ULL << (ecode - 1) : 0ULL)
This can fix the warning.

In the code above, if ecode is 0, that would lead to calling kfd_set_dbg_ev_from_interrupt with a event mask of 0. Not sure if that even makes sense. Jon, so we need special handling of cases where the error code is 0 or out of range, so we can warn about buggy firmware rather than creating nonsensical events for the debugger?

Regards,
  Felix



Regards
Ma Jun
Regards,
    Felix


/* Masks for exception code type checks below */
   #define KFD_EC_MASK_QUEUE	(KFD_EC_MASK(EC_QUEUE_WAVE_ABORT) |	\



[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux