On Mon, 2013-04-15 at 09:49 +1000, Dave Chinner wrote:
> > I've got to say from a physics, rather than mm perspective, this sounds
> > to be a really badly framed problem. We seek to eliminate complexity by
> > simplification. What this often means is that even though the theory
> > allows us to solve a problem in an arbitrary frame, there's usually a
> > nice one where it looks a lot simpler (that's what the whole game of
> > eigenvector mathematics and group characters is all about).
> >
> > Saying we need to consider remote in-use memory as high numa and manage
> > it from a local node looks a lot like saying we need to consider a
> > problem in an arbitrary frame rather than looking for the simplest one.
> > The fact of the matter is that network remote memory has latency orders
> > of magnitude above local ... the effect is so distinct, it's not even
> > worth calling it NUMA. It does seem then that the correct frame to
> > consider this in is local + remote separately with a hierarchical
> > management (the massive difference in latencies makes this a simple
> > observation from perturbation theory). Amazingly this is what current
> > clustering tools tend to do, so I don't really see there's much here to
> > add to the current practice.
>
> Everyone who wants to talk about this topic should google "vNUMA"
> and read the research papers from a few years ago. It gives pretty
> good insight in the practicality of treating the RAM in a cluster as
> a single virtual NUMA machine with a large distance factor.

Um yes, insert comment about crazy Australians. vNUMA was doomed to failure from the beginning, I think, because they tried to maintain coherency across the systems. The paper contains a nicely understated expression of disappointment that the resulting system was so slow. I'm sure, as an ex-SGI person, you'd agree with me that high numa across a network is possible ... but only with a boatload of hardware acceleration like the Altix had.

> And then there's the crazy guys that have been trying to implement
> DLM (distributed large memory) using kernel based MPI communication
> for cache coherency protocols at page fault level....

I have to confess to being one of those crazy people, way back when I was at Bell Labs in the 90s ... it was mostly a curiosity until it found a use in distributed databases.

But the question still stands: the current vogue for clustering is locally managed resources coupled into a resource hierarchy, precisely to get away from the entanglement factors that caused the problems vNUMA saw ... what I don't get from this topic is what it would add to the current state of the art, or, more truthfully, what I get is that it seems to be advocating going backwards.

James