Re: [RFC] Heterogeneous memory management (mirror process address space on a device mmu).

On Tue, May 06, 2014 at 09:32:16AM -0700, Linus Torvalds wrote:
> On Tue, May 6, 2014 at 9:18 AM, Jerome Glisse <j.glisse@xxxxxxxxx> wrote:
> >
> > I do understand that. I was pointing out that if I move to the tlb
> > flush path, which I am fine with, I will still need to sleep there.
> 
> No can do. The TLB flushing itself is called with a spinlock held, and
> we need to continue to do that.
> 
> Why do you really need to sleep? Because that sounds bogus.
> 
> What you *should* do is send the flush message, and not wait for any
> reply. You can then possibly wait for the result later on: we already
> have this multi-stage TLB flush model (using the "mmu_gather"
> structure) that has three phases:
> 
>  - create mmu_gather (allocate space for batching etc). This can sleep.
>  - do the actual flushing (possibly multiple times). This is the
> "synchronous with the VM" part and cannot sleep.
>  - tear down the mmu_gather data structures and actually free the
> pages we batched. This can sleep.
> 
> and what I think a GPU flush has to do is to do the actual flushes
> when asked to (because that's what it will need to do to work with a
> real TLB eventually), but if there's some crazy asynchronous
> acknowledge thing from hardware, it's possible to perhaps wait for
> that in the final phase (*before* we free the pages we gathered).
> 
> Now, such an asynchronous model had better not mark page tables dirty
> after we flushed (we'd lose that information), but quite frankly,
> anything that is remote enough to need some async flush thing cannot
> sanely be close enough to be closely tied to the actual real page
> tables, so I don't think we need to care.
> 
> Anyway, I really think that the existing mmu_gather model *should*
> work fine for this all. It may be a bit inconvenient for crazy
> hardware, but the important part is that it definitely should work for
> any future hardware that actually gets this right.
> 
> It does likely involve adding some kind of device list to "struct
> mm_struct", and I'm sure there's some extension to "struct mmu_gather"
> too, but _conceptually_ it should all be reasonably non-invasive.
> 
> Knock wood.
> 
>             Linus

Also, just to be sure, are my changes to the radix tree otherwise
acceptable, at least in principle? As explained in my long email,
when migrating a file-backed page to device memory we want to make
sure that no one else tries to use those pages.

The way I have done it is described in my long email, but in a nutshell:
it uses a special swap entry inside the radix tree, the filesystem code
knows about that entry, and it calls into HMM to migrate the page back to
system memory if needed. Writeback uses a temporary bounce page (ie once
writeback is done the gpu page can be remapped writeable and the bounce
page forgotten).

Cheers,
Jérôme

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
