On Thu, Apr 8, 2021 at 6:28 AM Christian König
<ckoenig.leichtzumerken@xxxxxxxxx> wrote:
>
> Am 08.04.21 um 09:13 schrieb Christian König:
> > Am 07.04.21 um 21:04 schrieb Alex Deucher:
> >> On Wed, Apr 7, 2021 at 3:23 AM Dave Airlie <airlied@xxxxxxxxx> wrote:
> >>> On Wed, 7 Apr 2021 at 06:54, Alex Deucher <alexdeucher@xxxxxxxxx>
> >>> wrote:
> >>>> On Fri, Apr 2, 2021 at 12:22 PM Christian König
> >>>> <ckoenig.leichtzumerken@xxxxxxxxx> wrote:
> >>>>> Hey Alex,
> >>>>>
> >>>>> the TTM and scheduler changes should already be in the drm-misc-next
> >>>>> branch (not 100% sure about the TTM patch, need to double check
> >>>>> next week).
> >>>>>
> >>>> The TTM change is not in drm-misc yet.
> >>>>
> >>>>> Could that cause problems when both are merged into drm-next?
> >>>> Dave, Daniel, how do you want to handle this? The duplicated patch
> >>>> is this one:
> >>>> https://cgit.freedesktop.org/drm/drm-misc/commit/?id=ac4eb83ab255de9c31184df51fd1534ba36fd212
> >>>>
> >>>> amdgpu has changes which depend on it. The same patch is included
> >>>> in this PR.
> >>> Ouch not sure how best to sync up here, maybe get misc-next into my
> >>> tree then rebase your tree on top of it?
> >> I can do that.
> >
> > Please let me double check later today that we have everything we need
> > in drm-misc-next.
>
> There were two patches for TTM (one from Felix and one from Oak) which
> still needed to be pushed to drm-misc-next. I've done that just a minute
> ago.
>

They were included in this PR.

>
> Then we have this patch which fixes a bug in code removed on
> drm-misc-next. I think it should be dropped when amd-staging-drm-next is
> based on drm-next/drm-misc-next.
>
> Author: xinhui pan <xinhui.pan@xxxxxxx>
> Date:   Wed Feb 24 11:28:08 2021 +0800
>
>     drm/ttm: Do not add non-system domain BO into swap list
>

Ok.

>
> I've also found the following patch which is problematic as well:
>
> commit c8a921d49443025e10794342d4433b3f29616409
> Author: Jack Zhang <Jack.Zhang1@xxxxxxx>
> Date:   Mon Mar 8 12:41:27 2021 +0800
>
>     drm/amd/amdgpu implement tdr advanced mode
>
>     [Why]
>     Previous tdr design treats the first job in job_timeout as the bad job.
>     But sometimes a later bad compute job can block a good gfx job and
>     cause an unexpected gfx job timeout because gfx and compute ring share
>     internal GC HW mutually.
>
>     [How]
>     This patch implements an advanced tdr mode.It involves an additinal
>     synchronous pre-resubmit step(Step0 Resubmit) before normal resubmit
>     step in order to find the real bad job.
>
>     1. At Step0 Resubmit stage, it synchronously submits and pends for the
>     first job being signaled. If it gets timeout, we identify it as guilty
>     and do hw reset. After that, we would do the normal resubmit step to
>     resubmit left jobs.
>
>     2. For whole gpu reset(vram lost), do resubmit as the old way.
>
>     Signed-off-by: Jack Zhang <Jack.Zhang1@xxxxxxx>
>     Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@xxxxxxx>
>
> That one is modifying both amdgpu as well as the scheduler code. IIRC I
> actually requested that the patch be split into two, but that was
> somehow not done.
>
> How should we proceed here? Should I separate the patch, push the
> changes to drm-misc-next and then we merge with drm-next and rebase
> amd-staging-drm-next on top of that?
>
> That's most likely the cleanest approach as far as I can see.

That's fine with me. We could have included them in my PR. Now we
have to wait for drm-misc-next to be merged again before we can merge
the amdgpu code. Is anyone planning to do another drm-misc merge at
this point?

Alex

>
> Thanks,
> Christian.
>
> >
> > Regards,
> > Christian.
> >
> >>
> >> Alex
> >>
> >>
> >>> Dave.
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel