On Fri, Aug 12, 2022 at 05:10:51PM -0400, Felix Kuehling wrote: > On 2022-08-12 02:20, Dan Carpenter wrote: > > This code has two bugs. If kfd_topology_device_by_proximity_domain() > > failed on the first iteration through the loop then "cpu_link" is > > uninitialized and should not be dereferenced. > > > > The second bug is that we cannot dereference a list iterator when it > > points to the list head. In other words, if we exit the > > list_for_each_entry() loop exits without hitting a break then "cpu_link" > > is not a valid pointer and should not be dereferenced. > > > > Fix both of these problems by setting "cpu_link" to NULL when it is invalid > > and non-NULL when it is valid. That makes it easier to test for > > valid vs invalid. > > > > Fixes: 0f28cca87e9a ("drm/amdkfd: Extend KFD device topology to surface peer-to-peer links") > > Signed-off-by: Dan Carpenter <dan.carpenter@xxxxxxxxxx> > > --- > > I reported these in June but never heard back. > > I thought Ramesh implemented a fix for this: > https://lore.kernel.org/all/20220706183302.1719795-1-Ramesh.Errabolu@xxxxxxx/ > > You commented on a version of his patch: > https://lore.kernel.org/all/20220629161241.GM11460@kadam/ Oh, Sorry! I appologize for forgetting that. > > Did this get lost somehow? Anyway, your patch looks good to me and I'm going > to apply it to amd-staging-drm-next now. > > Reviewed-by: Felix Kuehling <Felix.Kuehling@xxxxxxx> Looking at Ramesh's patch now, he added a continue if kfd_topology_device_by_proximity_domain() failed. That is a behavior change, but it might also be a bug fix. My patch does not change the behavior except for eliminating the crash so I stand by my patch, but adding the continue might be a good thing to do as a separate step. I don't know the code well enough to say one way or the other. regards, dan carpenter