Re: [PATCH v10 00/21] Support ext4 on NV-DIMMs

On Wed, 27 Aug 2014 17:12:50 -0400 Matthew Wilcox <willy@xxxxxxxxxxxxxxx> wrote:

> On Wed, Aug 27, 2014 at 01:06:13PM -0700, Andrew Morton wrote:
> > On Tue, 26 Aug 2014 23:45:20 -0400 Matthew Wilcox <matthew.r.wilcox@xxxxxxxxx> wrote:
> > 
> > > One of the primary uses for NV-DIMMs is to expose them as a block device
> > > and use a filesystem to store files on the NV-DIMM.  While that works,
> > > it currently wastes memory and CPU time buffering the files in the page
> > > cache.  We have support in ext2 for bypassing the page cache, but it
> > > has some races which are unfixable in the current design.  This series
> > > of patches rewrites the underlying support and adds support for direct
> > > access to ext4.
> > 
> > Sat down to read all this but I'm finding it rather unwieldy - it's
> > just a great blob of code.  Is there some overall
> > what-it-does-and-how-it-does-it roadmap?
> 
> The overall goal is to map persistent memory / NV-DIMMs directly to
> userspace.  We have that functionality in the XIP code, but the way
> it's structured is unsuitable for filesystems like ext4 & XFS, and
> it has some pretty ugly races.

When thinking about looking at the patchset I wonder things like how
does mmap work, in what situations does a page get COWed, how do we
handle partial pages at EOF, etc.  I guess that's all part of the
filemap_xip legacy, the details of which I've totally forgotten.

> Patches 1 & 3 are simply bug-fixes.  They should go in regardless of
> the merits of anything else in this series.
> 
> Patch 2 changes the API for the direct_access block_device_operation so
> it can report more than a single page at a time.  As the series evolved,
> this work also included moving support for partitioning into the VFS
> where it belongs, handling various error cases in the VFS and so on.
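For readers who don't have the old XIP code paged in, the shape of that API
change is roughly the following; the parameter names are illustrative rather
than quoted from the series:

/* Before: the driver could report only a single page per call. */
int (*direct_access)(struct block_device *bdev, sector_t sector,
		     void **kaddr, unsigned long *pfn);

/* After (roughly): the caller asks for up to 'size' bytes and the driver
 * returns how many contiguous bytes are directly addressable at
 * *kaddr / *pfn, or a negative errno.  A VFS wrapper applies the
 * partition offset and sanity-checks the request before calling this,
 * which is the "partitioning into the VFS" work mentioned above. */
long (*direct_access)(struct block_device *bdev, sector_t sector,
		      void **kaddr, unsigned long *pfn, long size);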
> 
> Patch 4 is an optimisation.  It's poor form to make userspace take two
> faults for the same dereference.
> 
> Patch 5 gives us a VFS flag for the DAX property, which lets us get rid of
> the get_xip_mem() method later on.
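The flag test is roughly of the following form; the name IS_DAX(inode) comes
from the patch title quoted at the end of this mail, but the bit value shown
here is only illustrative:

#define S_DAX		8192	/* bit in inode->i_flags; value illustrative */
#define IS_DAX(inode)	((inode)->i_flags & S_DAX)

/* Filesystems set S_DAX on an inode when the direct-access path applies,
 * and generic code tests IS_DAX(inode) instead of looking for a non-NULL
 * get_xip_mem() address_space operation. */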
> 
> Patch 6 is also prep work; Al Viro liked it enough that it's now in
> his tree.
> 
> The new DAX code is then dribbled in over patches 7-11, split up by
> functional area.  At each stage, the ext2-xip code is converted over to
> the new DAX code.
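As a sketch of what "converted over" looks like on the mmap side (the
function and vm_ops names below are invented for the example; IS_DAX(),
file_inode(), generic_file_mmap() and file_accessed() are existing kernel
interfaces):

#include <linux/fs.h>
#include <linux/mm.h>

/* Illustrative only: a real implementation's fault handlers would call
 * into the fs/dax.c helpers introduced by patches 7-11. */
static const struct vm_operations_struct example_dax_vm_ops;

static int example_file_mmap(struct file *file, struct vm_area_struct *vma)
{
	if (!IS_DAX(file_inode(file)))
		return generic_file_mmap(file, vma);	/* page-cache path */

	file_accessed(file);
	vma->vm_ops = &example_dax_vm_ops;		/* direct-access faults */
	return 0;
}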
> 
> Patches 12-18 delete the remnants of the old XIP code, and fix the things
> in ext2 that Jan didn't like when he reviewed them for ext4 :-)
> 
> Patches 19 & 20 are the work to make ext4 use DAX.
> 
> Patch 21 is some final cleanup of references to the old XIP code, renaming
> it all to DAX.

hrm.

> > Some explanation of why one would use ext4 instead of, say,
> > suitably-modified ramfs/tmpfs/rd/etc?
> 
> ramfs and tmpfs really rely on the page cache.  They're not exactly
> built for permanence either.  brd also relies on the page cache, and
> there's a clear desire to use a filesystem instead of a block device
> for all the usual reasons of access permissions, grow/shrink, etc.
> 
> Some people might want to use XFS instead of ext4.  We're starting with
> ext4, but we've been keeping an eye on what other filesystems might want
> to use.  btrfs isn't going to use the DAX code, but some of the other
> pieces will probably come in handy.
> 
> There are also at least three people working on their own filesystems
> specially designed for persistent memory.  I wish them all the best
> ... but I'd like to get this infrastructure into place.

This is the sort of thing which first-timers (this one at least) like
to see in [0/n].

> > Performance testing results?
> 
> I haven't been running any performance tests.  What sort of performance
> tests would be interesting for you to see?

fs benchmarks?  `dd' would be a good start ;)

I assume (because I wasn't told!) that there are two objectives here:

1) reduce memory consumption by not maintaining pagecache and
2) reduce CPU cost by avoiding the double-copies.

These things are pretty easily quantified.  And really they must be
quantified as part of the developer testing, because if you find
they've worsened then holy cow, what went wrong.
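A minimal way to get those two numbers is a userspace micro-benchmark that
reads a large file once through read() (which copies through the kernel) and
once through mmap() (which, with DAX, should fault straight onto the
NV-DIMM); watching free/meminfo around the runs covers the pagecache half.
The program below is plain POSIX and nothing in it is specific to this
series:

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

static double now(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(int argc, char **argv)
{
	struct stat st;
	double t0, t1;
	ssize_t n;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return 1;
	}

	int fd = open(argv[1], O_RDONLY);
	if (fd < 0 || fstat(fd, &st) < 0) {
		perror(argv[1]);
		return 1;
	}
	size_t len = st.st_size;

	/* Pass 1: read() into a private buffer (data is copied to us). */
	char *buf = malloc(1 << 20);
	size_t total = 0;
	t0 = now();
	while ((n = read(fd, buf, 1 << 20)) > 0)
		total += n;
	t1 = now();
	printf("read(): %zu bytes in %.3f s\n", total, t1 - t0);

	/* Pass 2: mmap() and touch one byte per page (no explicit copy). */
	unsigned char *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	unsigned long sum = 0;
	t0 = now();
	for (size_t off = 0; off < len; off += 4096)
		sum += map[off];
	t1 = now();
	printf("mmap(): touched %zu bytes in %.3f s (sum %lu)\n",
	       len, t1 - t0, sum);

	munmap(map, len);
	free(buf);
	close(fd);
	return 0;
}

Run it against the same file on a DAX mount and on an ordinary page-cache
mount of the same device; the mmap pass is where the difference should show
up, and `free' before and after each run shows the pagecache saving.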

> > Carsten Otte wrote filemap_xip.c and may be a useful reviewer of this
> > work.
> 
> I cc'd him on some earlier versions and didn't hear anything back.  It felt
> rude to keep plying him with 20+ patches every month.

OK.

> > All the patch subjects violate Documentation/SubmittingPatches
> > section 15 ;)
> 
> errr ... which bit?  I used git format-patch to create them.

None of the patch titles identify the subsystem(s) which they're
hitting.  eg, "Introduce IS_DAX(inode)" is an ext2 patch, but nobody
would know that from browsing the titles.