Re: [RFC] mm/hmm: pass mmu_notifier_range to sync_cpu_device_pagetables

Jason Gunthorpe <jgg@xxxxxxxxxxxx> · Wed, 3 Jul 2019 15:08:41 +0000

On Wed, Jul 03, 2019 at 02:27:22AM +0000, Kuehling, Felix wrote:
> On 2019-07-02 6:59 p.m., Jason Gunthorpe wrote:
> > On Wed, Jul 03, 2019 at 12:49:12AM +0200, Christoph Hellwig wrote:
> >> On Tue, Jul 02, 2019 at 07:53:23PM +0000, Jason Gunthorpe wrote:
> >>>> I'm sending this out now since we are updating many of the HMM APIs
> >>>> and I think it will be useful.
> >>> This make so much sense, I'd like to apply this in hmm.git, is there
> >>> any objection?
> >> As this creates a somewhat hairy conflict for amdgpu, wouldn't it be
> >> a better idea to wait a bit and apply it first thing for next merge
> >> window?
> > My thinking is that AMD GPU already has a monster conflict from this:
> >
> >   int hmm_range_register(struct hmm_range *range,
> > -                      struct mm_struct *mm,
> > +                      struct hmm_mirror *mirror,
> >                         unsigned long start,
> >                         unsigned long end,
> >                         unsigned page_shift);
> >
> > So, depending on how that is resolved we might want to do both API
> > changes at once.
> 
> I just sent out a fix for the hmm_mirror API change.

I think if you follow my suggestion to apply a prep patch to AMD GPU
to make the conflict resolution simple, we should defer this patch
until next kernel for the reasons CH gave.

> > Or we may have to revert the above change at this late date.
> >
> > Waiting for AMDGPU team to discuss what process they want to use.
> 
> Yeah, I'm wondering what the process is myself. With HMM and driver 
> development happening on different branches these kinds of API changes 
> are painful. There seems to be a built-in assumption in the current 
> process, that code flows mostly in one direction amd-staging-drm-next -> 
> drm-next -> linux-next -> linux. That assumption is broken with HMM code 
> evolving rapidly in both amdgpu and mm.

It looks to me like AMD GPU uses a pull request model. So a goal as a
tree runner should be to work with the other trees (ie hmm.git, etc)
to minimize conflicts between the PR you will send and the PR other
trees will send.

Do not focus on linux-next, that is just an 'early warning system'
that conflicts are on the horizon, we knew about this one :) (well,
mostly, I was surprised how big it was, my bad)

So we must stay in co-ordination with patches in-flight on the list
and make the right decision, depending on the situation. Communication
here is key :)

We have lots of strategies available to deal with these situations.

> If we want to continue developing HMM driver changes in
> amd-staging-drm-next, we'll need to synchronize with hmm.git more 
> frequently, both ways.

It can't really go both ways. hmm.git has to be only the hmm topic,
otherwise it doesn't really work.

> I believe part of the problem is, that there is a fairly long
> lead-time from getting changes from amd-staging-drm-next into
> linux-next, as they are held for one release cycle in drm-next.
> Pushing HMM-related changes through drm-fixes may offer a kind of
> shortcut. Philip and my latest fixup is just bypassing drm-next
> completely and going straight into linux-next, though.

I'm not so familiar with the DRM work flow to give you advice on this.

Jason