On Sat, Aug 12, 2017 at 12:56 AM, Felix Kuehling <Felix.Kuehling at amd.com> wrote:
> From: Jay Cornwall <Jay.Cornwall at amd.com>
>
> Gfx8 HW incorrectly clamps CP_HQD_EOP_CONTROL.EOP_SIZE, which can
> lead to scheduling deadlock due to SE EOP done counter overflow.
>
> Enforce a EOP queue size limit which prevents the CP from sending
> more than 0xFF events at a time.
>
> Signed-off-by: Jay Cornwall <Jay.Cornwall at amd.com>
> Reviewed-by: Felix Kuehling <Felix.Kuehling at amd.com>
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c
> index f4c8c23..98a930e 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c
> @@ -135,8 +135,15 @@ static int __update_mqd(struct mqd_manager *mm, void *mqd,
>  			3 << CP_HQD_IB_CONTROL__MIN_IB_AVAIL_SIZE__SHIFT |
>  			mtype << CP_HQD_IB_CONTROL__MTYPE__SHIFT;
>
> -	m->cp_hqd_eop_control |=
> -		ffs(q->eop_ring_buffer_size / sizeof(unsigned int)) - 1 - 1;
> +	/*
> +	 * HW does not clamp this field correctly. Maximum EOP queue size
> +	 * is constrained by per-SE EOP done signal count, which is 8-bit.
> +	 * Limit is 0xFF EOP entries (= 0x7F8 dwords). CP will not submit
> +	 * more than (EOP entry count - 1) so a queue size of 0x800 dwords
> +	 * is safe, giving a maximum field value of 0xA.
> +	 */
> +	m->cp_hqd_eop_control |= min(0xA,
> +		ffs(q->eop_ring_buffer_size / sizeof(unsigned int)) - 1 - 1);
>  	m->cp_hqd_eop_base_addr_lo =
>  			lower_32_bits(q->eop_ring_buffer_address >> 8);
>  	m->cp_hqd_eop_base_addr_hi =
> --
> 2.7.4
>

This patch is:
Acked-by: Oded Gabbay <oded.gabbay at gmail.com>
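
For anyone checking the arithmetic in the new comment, here is a minimal userspace sketch of the field encoding and the clamp. It is not the kernel code: the helper names and the two macros are hypothetical, the 8-dwords-per-entry figure is only inferred from the comment's "0xFF EOP entries (= 0x7F8 dwords)", and it assumes the ring size is a nonzero power of two in bytes, which is what the ffs() expression in the patch appears to rely on.

```c
#include <stddef.h>
#include <strings.h>   /* ffs() from POSIX, standing in for the kernel's ffs() */

/* Hypothetical constants for illustration; not kernel definitions. */
#define EOP_ENTRY_DWORDS   8u    /* inferred: 0x7F8 dwords / 0xFF entries */
#define EOP_SIZE_FIELD_MAX 0xA   /* cap applied by the patch */

/*
 * Field encoding as used in the patch: log2(size in dwords) - 1,
 * clamped to 0xA. Assumes ring_bytes is a nonzero power of two.
 */
static int eop_size_field(unsigned int ring_bytes)
{
    int field = ffs((int)(ring_bytes / sizeof(unsigned int))) - 1 - 1;

    return field < EOP_SIZE_FIELD_MAX ? field : EOP_SIZE_FIELD_MAX;
}

/* Inverse mapping: queue size in dwords implied by a field value. */
static unsigned int eop_field_to_dwords(int field)
{
    return 1u << (field + 1);
}
```

Under these assumptions an 8 KiB ring (0x800 dwords) already yields 0xA, and anything larger is clamped to the same value. 0x800 dwords is 0x100 entries, and since the CP submits at most (entry count - 1), that gives the 0xFF limit described in the comment.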