Re: [Lsf-pc] [LSF/MM TOPIC] Persistent Memory

Mel Gorman <mgorman@xxxxxxx> · Wed, 8 Jan 2014 15:42:59 +0000

On Fri, Dec 20, 2013 at 10:05:02AM -0700, Matthew Wilcox wrote:
> 
> I should like to discuss the current situation with Linux support for
> persistent memory.  While I expect the current discussion to be long
> over by March, I am certain that there will be topics around persistent
> memory that have not been settled at that point.
> 
> I believe this will mostly be of crossover interest between filesystem
> and MM people, and of lesser interest to storage people (since we're
> basically avoiding their code).
> 
> Subtopics might include
>  - Using persistent memory for FS metadata
>    (The XIP code provides persistent memory to userspace.  The filesystem
>     still uses BIOs to fetch its metadata)
>  - Supporting PMD/PGD mappings for userspace
>    (Not only does the filesystem have to avoid fragmentation to make this
>     happen, the VM code has to permit these giant mappings)

The filesystem would also have to correctly align the data on disk. All
this implies that the underlying device is byte-addressable, has access
speeds similar to RAM, and is directly accessible from userspace without
the kernel being involved. Without those conditions, I find it hard to
believe that TLB pressure dominates access cost. Then again, I have no
experience with the devices or their intended use case, so I would not
mind an education.
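
To make the alignment condition and the TLB argument concrete, here is a
minimal userspace sketch. It assumes x86-64 sizes (4 KiB base pages, 2 MiB
covered by one PMD entry); other architectures differ. The names
BASE_PAGE_SIZE, PMD_MAP_SIZE, is_aligned(), pmd_mappable() and
entries_needed() are hypothetical and are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed x86-64 sizes; these are illustrative, not taken from the thread. */
#define BASE_PAGE_SIZE (UINT64_C(1) << 12) /* 4 KiB base page */
#define PMD_MAP_SIZE   (UINT64_C(1) << 21) /* 2 MiB per PMD-level mapping */

/* True if x is a multiple of align; align must be a power of two. */
static bool is_aligned(uint64_t x, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (x & (align - 1)) == 0;
}

/*
 * Hypothetical check: one PMD-sized mapping can cover an extent only if
 * the virtual address and the device address are both 2 MiB aligned and
 * the extent is at least 2 MiB long.
 */
static bool pmd_mappable(uint64_t vaddr, uint64_t dev_addr, uint64_t len)
{
    return is_aligned(vaddr, PMD_MAP_SIZE) &&
           is_aligned(dev_addr, PMD_MAP_SIZE) &&
           len >= PMD_MAP_SIZE;
}

/* Number of translations needed to cover len bytes at a given page size. */
static uint64_t entries_needed(uint64_t len, uint64_t page_size)
{
    return (len + page_size - 1) / page_size;
}
```

Under these assumptions, covering 1 GiB takes 262144 base-page
translations but only 512 PMD-sized ones, which is the gap that matters
only if the access itself is about as cheap as a RAM access.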

However, if you really wanted the device to be accessible like this then
the shortest solution (and I want to punch myself for even suggesting
this) is to extend hugetlbfs to directly access these devices. It's
almost certainly a bad direction to take though; there would need to be a
good justification for it. Anything in this direction is pushing usage of
persistent devices to userspace, with the kernel just providing an
interface. Maybe that is desirable, maybe not.

>  - Persistent page cache
>    (Another way to take advantage of persistent memory would be to place it
>     in the page cache.  But we don't have struct pages for it!  What to do?)

I don't think the struct pages are really the problem here. Minimally you could
bodge it by creating a pgdat structure and allocating the struct pages for it
similar to how RAM is initialised. However, it completely sucks as a solution
because it causes all sorts of cache management problems, particularly page
aging inversion problems when treated as memory like this.  The resulting
API for userspace would hurt like hell.  Think of NUMA problems, but much
much worse. Don't do this. The only reason I mention it is because so many
people seem to think it's a great solution at first glance.

Even considering the solution raises the question of "why". Sure, page cache
would be persistent across reboots but the information is readily available
on disk and if the data is read-mostly then who cares. If it's read/write,
making it persistent across a reboot will not improve overall performance. I
can see the need for some data to be persisted across a reboot (application
checkpoint, suspend/resume, crash data, something like bcache even if
sufficiently motivated) but none of that requires page cache support as such.

I'll throw my hands up and say that my lack of familiarity with the
expected use cases handicaps me.  We can twist the VM into all sorts of
circles but it'd be nice to know more about *why* we are doing something
before worrying about the how. Maybe I'm the only VM person that suffers
from this particular problem, in which case I would appreciate being
pointed in a sensible direction some time before LSF/MM.

-- 
Mel Gorman
SUSE Labs