Re: [RFC] Heterogeneous memory management (mirror process address space on a device mmu).

On Thu, May 08, 2014 at 07:47:04PM +0300, sagi grimberg wrote:
> On 5/7/2014 5:33 AM, Davidlohr Bueso wrote:
> >On Tue, 2014-05-06 at 12:29 +0200, Peter Zijlstra wrote:
> >>So you forgot to CC Linus, Linus has expressed some dislike for
> >>preemptible mmu_notifiers in the recent past:
> >>
> >>   https://lkml.org/lkml/2013/9/30/385
> >I'm glad this came up again.
> >
> >So I've been running benchmarks (mostly aim7, which nicely exercises our
> >locks) comparing my recent v4 for rwsem optimistic spinning against
> >previous implementation ideas for the anon-vma lock, mostly:
> >
> >- rwsem (currently)
> >- rwlock_t
> >- qrwlock_t
> >- rwsem+optspin
> >
> >Of course, *any* change provides significant improvement in throughput
> >for several workloads, by avoiding to block -- there are more
> >performance numbers in the different patches. This is fairly obvious.
> >
> >What is perhaps not so obvious is that rwsem+optimistic spinning beats
> >all others, including the improved qrwlock from Waiman and Peter. This
> >is mostly because of the idea of cancelable MCS, which was mimic'ed from
> >mutexes. The delta in most cases is around +10-15%, which is non
> >trivial.
> 
> These are great news David!
> 
> >I mention this because from a performance PoV, we'll stop caring so much
> >about the type of lock we require in the notifier related code. So while
> >this is not conclusive, I'm not as opposed to keeping the locks blocking
> >as I once was. Now this might still imply things like poor design
> >choices, but that's neither here nor there.
> 
> So is the rwsem+opt strategy the way to go Given it keeps everyone happy?
> We will be more than satisfied with it as it will allow us to
> guarantee device
> MMU update.
> 
> >/me sees Sagi smiling ;)
> 
> :)

So I started doing things with the TLB flush, but I must say things look ugly.
I need a new page flag (goodbye 32-bit platforms), and I need my own LRU and
page reclamation for any page in use by a device. I also need to hook into
try_to_unmap or migrate (I will do the former). I am trying to be smart by
scheduling a worker on another CPU before sending the IPI, so that while the
IPI is in progress another CPU can hopefully schedule the invalidation on the
GPU, and the wait for the GPU after the IPI will be quick.

So all in all this is looking ugly, and it does not change the fact that I
sleep (well, need to be able to sleep). It just moves the sleeping to another
part.

Maybe I should stress that with the mmu_notifier version it only sleeps for
processes that are using the GPU. Those processes use userspace APIs like
OpenCL, which do not play well with fork, i.e. read: do not use fork if
you are using such an API.

So in my case, if a process has mm->hmm set, that means there is a GPU
using that address space, and that process is unlikely to run the kind of
massive workload that people try to optimize the anon_vma lock for.

My point is that with rwsem+optspin it could try spinning if mm->hmm
is NULL, which keeps the massive fork workload fast, or it could sleep
directly if mm->hmm is set.
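
To make that concrete, here is a rough userspace sketch of the decision only.
It is not kernel code and not the actual rwsem implementation: struct
mm_sketch, struct hmm_sketch, enum lock_policy and lock_policy_for() are
made-up names for illustration, and the only thing taken from the discussion
above is the rule "mm->hmm NULL means spin, mm->hmm set means sleep".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-mm hmm state; not a kernel structure. */
struct hmm_sketch {
    int device_id;
};

/* Hypothetical stand-in for the address space descriptor. */
struct mm_sketch {
    struct hmm_sketch *hmm; /* non-NULL when a device mirrors this mm */
};

enum lock_policy {
    LOCK_SPIN_THEN_SLEEP, /* try optimistic spinning first */
    LOCK_SLEEP_DIRECTLY   /* skip spinning, block right away */
};

/*
 * Pick how a contended lock acquisition should wait.
 * No device attached: spinning keeps fork-heavy workloads fast.
 * Device attached: the holder may sleep on the device, so spinning
 * would only burn CPU; go to sleep directly.
 */
static enum lock_policy lock_policy_for(const struct mm_sketch *mm)
{
    assert(mm != NULL);
    return mm->hmm ? LOCK_SLEEP_DIRECTLY : LOCK_SPIN_THEN_SLEEP;
}
```

The point of the sketch is that the check is a single pointer test, so a
process that never touches hmm would not pay anything beyond that test.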

This way my additions do not damage anyone's workload. Only workloads that
use hmm would likely see lock contention on fork, but those workloads
should not fork in the first place, and if they do they should pay a
price.

I will finish up the hackish TLB version of hmm so people can judge how
ugly it is (in my view) and send it here as soon as I can.

But I think it's clear that with rwsem+optspin we can make all workloads
happy and fast.

Cheers,
Jérôme Glisse

> 
> Sagi.
