Re: [RFC] Heterogeneous memory management (mirror process address space on a device mmu).

Jerome Glisse <j.glisse@xxxxxxxxx> · Tue, 6 May 2014 11:33:17 -0400

On Tue, May 06, 2014 at 08:18:34AM -0700, Linus Torvalds wrote:
> On Tue, May 6, 2014 at 8:00 AM, Jerome Glisse <j.glisse@xxxxxxxxx> wrote:
> >
> > So question becomes how to implement process address space mirroring
> > without pinning memory and track cpu page table update knowing that
> > device page table update is unbound can not be atomic from cpu point
> > of view.
> 
> Perhaps as a fake TLB and interacting with the TLB shootdown? And
> making sure that everything is atomic?
> 
> Some of these devices are going to actually *share* the real page
> tables. Not "cache" them. Actually use the page tables directly.
> That's where all these on-die APU things are going, where the device
> really ends up being something much more like ASMP (asymmetric
> multi-processing) than a traditional external device.
> 
> So we *will* have to extend our notion of TLB shootdown to have not
> just a mask of possible active CPU's, but possible active devices. No
> question about that.

Well no, as i said and explain in my mail APU and IOMMUv2 is a one sided
coin and you can not use the device memory with such solution. So yes
there is interest from many player to mirror the cpu page table by other
means than by having the IOMMU walk the cpu page table (this include
AMD).

> 
> But doing this with sleeping in some stupid VM notifier is completely
> out of the question, because it *CANNOT EVEN WORK* for that eventual
> real goal of sharing the physical page tables where the device can do
> things like atomic dirty/accessed bit settings etc. It can only work
> for crappy sh*t that does the half-way thing. It's completely racy wrt
> the actual page table updates. That kind of page table sharing needs
> true atomicity for exactly the same reason we need it for our current
> SMP. So it needs to have all the same page table locking rules etc.
> Not that shitty notifier callback.
> 
> As I said, the VM notifiers were misdesigned to begin with. They are
> an abomination. We're not going to extend on that and make it worse.
> We are *certainly* not going to make them blocking and screwing our
> core VM that way. And that's doubly and triply true when it cannot
> work for the generic case _anyway_.
> 
>               Linus

So how can i solve the issue at hand. A device that has its own page
table and can not mirror the cpu page table, nor can the device page
table be updated atomicly from the cpu. Yes such device will exist
and the IOMMUv2 walking the cpu page table is not capable of supporting
GPU memory which is a big big big needed feature. Compare 20Gb/s vs
300Gb/s of GPU memory.

I understand that we do not want to sleep when updating process cpu
page table but note that only process that use the gpu would have to
sleep. So only process that can actually benefit from the using GPU
will suffer the consequences.

That said it also play a role with page reclamation hence why i am
proposing to have a separate lru for page involve with a GPU.

So having the hardware walking the cpu page table is out of the
question.

Cheers,
Jérôme

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>