Re: [PATCH v1 00/12] MEMORY_DEVICE_COHERENT for CPU-accessible coherent device memory

Jason Gunthorpe <jgg@xxxxxxxxxx> · Tue, 12 Oct 2021 15:56:29 -0300

On Tue, Oct 12, 2021 at 11:39:57AM -0700, Andrew Morton wrote:
> On Tue, 12 Oct 2021 12:12:35 -0500 Alex Sierra <alex.sierra@xxxxxxx> wrote:
> 
> > This patch series introduces MEMORY_DEVICE_COHERENT, a type of memory
> > owned by a device that can be mapped into CPU page tables like
> > MEMORY_DEVICE_GENERIC and can also be migrated like MEMORY_DEVICE_PRIVATE.
> > With MEMORY_DEVICE_COHERENT, we isolate the new memory type from other
> > subsystems as far as possible, though there are some small changes to
> > other subsystems such as filesystem DAX, to handle the new memory type
> > appropriately.
> > 
> > We use ZONE_DEVICE for this instead of NUMA so that the amdgpu
> > allocator can manage it without conflicting with core mm for non-unified
> > memory use cases.
> > 
> > How it works: The system BIOS advertises the GPU device memory (aka VRAM)
> > as SPM (special purpose memory) in the UEFI system address map.
> > The amdgpu driver registers the memory with devmap as
> > MEMORY_DEVICE_COHERENT using devm_memremap_pages.
> > 
> > The initial user for this hardware page migration capability will be
> > the Frontier supercomputer project.
> 
> To what other uses will this infrastructure be put?
> 
> Because I must ask: if this feature is for one single computer which
> presumably has a custom kernel, why add it to mainline Linux?

Well, it certainly isn't just "one single computer". Overall I know of
about, hmm, ~10 *datacenters* worth of installations that are using
similar technology underpinnings.

"Frontier" is the code name for a specific installation but as the
technology is proven out there will be many copies made of that same
approach.

The previous program "Summit" was done with NVIDIA GPUs and PowerPC
CPUs and also included a very similar capability. I think this is a
good sign that this coherently attached accelerator will continue to
be a theme in computing going foward. IIRC this was done using out of
tree kernel patches and NUMA localities.

Specifically with CXL now being standardized and on a path to ubiquity
I think we will see an explosion in deployments of coherently attached
accelerator memory. This is the high end trickling down to wider
usage.

I strongly think many CXL accelerators are going to want to manage
their on-accelerator memory in this way as it makes universal sense to
want to carefully manage memory access locality to optimize for
performance.

Jason