[SCSI/FS/MM] LSF10/MM topic proposal: PAM: Page-accessible memory

Dan Magenheimer <dan.magenheimer@xxxxxxxxxx> · Fri, 2 Jul 2010 16:17:05 -0700 (PDT)

Currently the kernel uses and manages two data storage
abstractions, RAM and device (i.e. disk).  RAM, as the name
implies, is randomly accessible at a byte level, very fast,
and as intended by Von Neumann, is interchangeably used by the
kernel for kernel code or kernel data or user code or user data.
Device storage is generally relatively slow, batched and
asynchronous, modeled to the kernel like rotating media.

In recent years, a number of new storage technologies --
both hardware- and software-based -- have appeared in the
middle between "true RAM" and "disk" including:

- hypervisor RAM
- compressed RAM
- SSDs
- phase change RAM
- far-far NUMA RAM
- (others?)

Each has unique performance and/or byte-accessibility and/or
reliability idiosyncrasies that hinder it from being treated
as "true RAM".  But each is also too fast and too expensive
to be treated as a "disk".  As a result, there have been
many attempts to shoehorn these odd memory types, along with
their idiosyncrasies, into various parts of the kernel
to serve various specific needs.  The result has not been
particularly aesthetic or maintainable.  Nor has this fractured
approach come close to achieving the new technologies'
full capabilities, thus pigeonholing their use and stunting
their potential growth.

To address (pun intended) these new "memory types", I propose
the addition of a new kernel memory abstraction which we will
call PAM, for page-accessible memory.  (Don't laugh at the
seeming audacity, overwhelming complexity, and low cost-benefit
of such an addition yet... please read on.)

As its name implies, PAM is accessed only by the page, not by
the byte (where pagesize must be specified but need not be 4K).
Like a device, data in PAM must be copied/DMA'ed into RAM for
the data to be directly used and/or byte-addressed by the kernel
or by userland.  Because many of the new memory types are dynamic
in nature, the kernel does not know a priori the size of PAM,
so the kernel addresses each page with a non-linear
object-oriented "handle" and accesses the data through a generic
synchronous API of get_page, put_page, and flush_page.  The
idiosyncrasies of each new memory type are then entirely hidden
in "PAM drivers" behind the API.

There are at least two types of PAM: ephemeral PAM (EPAM) and
persistent PAM (PPAM).  A put to EPAM is always successful, but
a get of the same page may fail; so EPAM is not guaranteed
to hold all of the pages put to it.  A put to PPAM may fail but,
once a put is successful, a get of the same page will always
be successful.  A PAM driver supporting EPAM and/or PPAM must
ensure certain additional coherency and concurrency semantics
that are beyond the scope of this brief discussion.  There
also may be other useful types of PAM.

There is existence proof that this kind of API has value.  The
proposed cleancache and frontswap patchsets demonstrate how
EPAM can be used as an "overflow" for page cache and how PPAM
can be used as a "fronting store" for swap devices; a shim
to Xen's Transcendent Memory ("tmem") demonstrates one PAM driver,
and Nitin Gupta's zcache work plans to demonstrate another.  Both
presume a synchronous API and that the pages of data put into
PAM are infrequently accessed by the kernel (and not at all
by userland); these semantics are critical to the cleancache and
frontswap patchsets but other semantics might possibly be
specified by flag parameters provided to a more generic PAM API.

While I firmly believe that cleancache/frontswap/tmem/zcache
can stand on their own merit and should be accepted into the
kernel,  I wonder if the generic PAM concept might serve nicely
as an API to other new RAM-like fast storage, such as SSDs and
phase change RAM.  I would like to discuss PAM concepts with
experts in these areas.  And I wonder if there might be other
kernel data storage needs obvious to kernel MM/FS experts
that might utilize such an API.  If so, maybe it is finally time
to free the kernel from the chains of Von Neumann and open
the kernel doors to other new types of RAM?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxxx  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href