On Sat, 2021-03-06 at 03:55 +0000, Jacob, Anson wrote:
[AMD Public Use]
Thanks Lyude for testing the patch.
Are you referring to this issue [1]?
Is it reproducible after applying this patch as well?
Yes, I am. And if you're talking about the patch you originally asked me to try, then yes: I'm still able to reproduce it with that patch applied.
From: Lyude Paul <lyude@xxxxxxxxxx>
Sent: Friday, March 5, 2021 6:08 PM
To: Jacob, Anson <Anson.Jacob@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx <amd-gfx@xxxxxxxxxxxxxxxxxxxxx>
Cc: Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Kuehling, Felix <Felix.Kuehling@xxxxxxx>
Subject: Re: [PATCH] drm/amdkfd: Fix UBSAN shift-out-of-bounds warning

Tested-by: Lyude Paul <lyude@xxxxxxxxxx>

That just leaves the KASAN error from read_indirect_azalia_reg, thanks for the fix!

On Thu, 2021-03-04 at 15:08 -0500, Anson Jacob wrote:
> If get_num_sdma_queues or get_num_xgmi_sdma_queues is 0, we end up
> doing a shift operation where the number of bits shifted equals
> number of bits in the operand. This behaviour is undefined.
>
> Set num_sdma_queues or num_xgmi_sdma_queues to ULLONG_MAX, if the
> count is >= number of bits in the operand.
>
> Bug: https://nam11.safelinks.protection.outlook.com/?url="">
> Reported-by: Lyude Paul <lyude@xxxxxxxxxx>
> Signed-off-by: Anson Jacob <Anson.Jacob@xxxxxxx>
> Reviewed-by: Alex Deucher <alexander.deucher@xxxxxxx>
> Reviewed-by: Felix Kuehling <Felix.Kuehling@xxxxxxx>
> ---
>  .../drm/amd/amdkfd/kfd_device_queue_manager.c | 17 +++++++++++++++--
>  1 file changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index c37e9c4b1fb4..e7a3c496237f 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -1128,6 +1128,9 @@ static int set_sched_resources(struct
> device_queue_manager *dqm)
>
>  static int initialize_cpsch(struct device_queue_manager *dqm)
>  {
> +	uint64_t num_sdma_queues;
> +	uint64_t num_xgmi_sdma_queues;
> +
>  	pr_debug("num of pipes: %d\n", get_pipes_per_mec(dqm));
>
>  	mutex_init(&dqm->lock_hidden);
> @@ -1136,8 +1139,18 @@ static int initialize_cpsch(struct device_queue_manager
> *dqm)
>  	dqm->active_cp_queue_count = 0;
>  	dqm->gws_queue_count = 0;
>  	dqm->active_runlist = false;
> -	dqm->sdma_bitmap = ~0ULL >> (64 - get_num_sdma_queues(dqm));
> -	dqm->xgmi_sdma_bitmap = ~0ULL >> (64 - get_num_xgmi_sdma_queues(dqm));
> +
> +	num_sdma_queues = get_num_sdma_queues(dqm);
> +	if (num_sdma_queues >= BITS_PER_TYPE(dqm->sdma_bitmap))
> +		dqm->sdma_bitmap = ULLONG_MAX;
> +	else
> +		dqm->sdma_bitmap = (BIT_ULL(num_sdma_queues) - 1);
> +
> +	num_xgmi_sdma_queues = get_num_xgmi_sdma_queues(dqm);
> +	if (num_xgmi_sdma_queues >= BITS_PER_TYPE(dqm->xgmi_sdma_bitmap))
> +		dqm->xgmi_sdma_bitmap = ULLONG_MAX;
> +	else
> +		dqm->xgmi_sdma_bitmap = (BIT_ULL(num_xgmi_sdma_queues) - 1);
>
>  	INIT_WORK(&dqm->hw_exception_work, kfd_process_hw_exception);
>
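
For anyone following the thread, here is a rough userspace sketch of the idea behind the quoted patch. It is not the driver code: the names low_bits_mask() and check_low_bits_mask() are made up for illustration, and plain C stands in for the kernel's BIT_ULL() and BITS_PER_TYPE() helpers. The old expression, ~0ULL >> (64 - n), shifts a 64-bit value by 64 when n is 0, which C leaves undefined. Building the mask as (1 << n) - 1 and special-casing n >= 64 avoids a full-width shift in either direction.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the patched bitmap setup: return a mask with
 * the low `count` bits set, never shifting by the full operand width.
 */
uint64_t low_bits_mask(uint64_t count)
{
	const uint64_t width = sizeof(uint64_t) * CHAR_BIT; /* 64 on Linux targets */

	if (count >= width)
		return UINT64_MAX;           /* every bit set, no shift needed */

	return (UINT64_C(1) << count) - 1;   /* count == 0 gives an empty mask */
}

/* Hypothetical self-check covering the edge cases discussed in the patch. */
void check_low_bits_mask(void)
{
	assert(low_bits_mask(0) == 0);                  /* old code: shift by 64 */
	assert(low_bits_mask(1) == UINT64_C(0x1));
	assert(low_bits_mask(8) == UINT64_C(0xff));
	assert(low_bits_mask(63) == (UINT64_MAX >> 1));
	assert(low_bits_mask(64) == UINT64_MAX);        /* saturates */
	assert(low_bits_mask(100) == UINT64_MAX);       /* oversized count also saturates */
}
```

Calling check_low_bits_mask() from a small main() is enough to exercise it. For counts from 1 to 63 the result matches what the old right-shift form produced, so only the zero and oversized cases change behaviour.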
--
Sincerely,
   Lyude Paul (she/her)
   Software Engineer at Red Hat
Note: I deal with a lot of emails and have a lot of bugs on my plate. If you've asked me a question, are waiting for a review/merge on a patch, etc. and I haven't responded in a while, please feel free to send me another email to check on my status. I don't bite!