On Wed, 27 Aug 2014 17:12:50 -0400 Matthew Wilcox <willy@xxxxxxxxxxxxxxx> wrote:

> On Wed, Aug 27, 2014 at 01:06:13PM -0700, Andrew Morton wrote:
> > On Tue, 26 Aug 2014 23:45:20 -0400 Matthew Wilcox <matthew.r.wilcox@xxxxxxxxx> wrote:
> >
> > > One of the primary uses for NV-DIMMs is to expose them as a block device
> > > and use a filesystem to store files on the NV-DIMM. While that works,
> > > it currently wastes memory and CPU time buffering the files in the page
> > > cache. We have support in ext2 for bypassing the page cache, but it
> > > has some races which are unfixable in the current design. This series
> > > of patches rewrite the underlying support, and add support for direct
> > > access to ext4.
> >
> > Sat down to read all this but I'm finding it rather unwieldy - it's
> > just a great blob of code. Is there some overall
> > what-it-does-and-how-it-does-it roadmap?
>
> The overall goal is to map persistent memory / NV-DIMMs directly to
> userspace. We have that functionality in the XIP code, but the way
> it's structured is unsuitable for filesystems like ext4 & XFS, and
> it has some pretty ugly races.

When thinking about looking at the patchset I wonder things like how
does mmap work, in what situations does a page get COWed, how do we
handle partial pages at EOF, etc. I guess that's all part of the
filemap_xip legacy, the details of which I've totally forgotten.

> Patches 1 & 3 are simply bug-fixes. They should go in regardless of
> the merits of anything else in this series.
>
> Patch 2 changes the API for the direct_access block_device_operation so
> it can report more than a single page at a time. As the series evolved,
> this work also included moving support for partitioning into the VFS
> where it belongs, handling various error cases in the VFS and so on.
>
> Patch 4 is an optimisation. It's poor form to make userspace take two
> faults for the same dereference.
>
> Patch 5 gives us a VFS flag for the DAX property, which lets us get rid of
> the get_xip_mem() method later on.
>
> Patch 6 is also prep work; Al Viro liked it enough that it's now in
> his tree.
>
> The new DAX code is then dribbled in over patches 7-11, split up by
> functional area. At each stage, the ext2-xip code is converted over to
> the new DAX code.
>
> Patches 12-18 delete the remnants of the old XIP code, and fix the things
> in ext2 that Jan didn't like when he reviewed them for ext4 :-)
>
> Patches 19 & 20 are the work to make ext4 use DAX.
>
> Patch 21 is some final cleanup of references to the old XIP code, renaming
> it all to DAX.

hrm.

> > Some explanation of why one would use ext4 instead of, say,
> > suitably-modified ramfs/tmpfs/rd/etc?
>
> ramfs and tmpfs really rely on the page cache. They're not exactly
> built for permanence either. brd also relies on the page cache, and
> there's a clear desire to use a filesystem instead of a block device
> for all the usual reasons of access permissions, grow/shrink, etc.
>
> Some people might want to use XFS instead of ext4. We're starting with
> ext4, but we've been keeping an eye on what other filesystems might want
> to use. btrfs isn't going to use the DAX code, but some of the other
> pieces will probably come in handy.
>
> There are also at least three people working on their own filesystems
> specially designed for persistent memory. I wish them all the best
> ... but I'd like to get this infrastructure into place.

This is the sort of thing which first-timers (this one at least) like to
see in [0/n].

> > Performance testing results?
>
> I haven't been running any performance tests. What sort of performance
> tests would be interesting for you to see?

fs benchmarks? `dd' would be a good start ;)

I assume (because I wasn't told!) that there are two objectives here:
1) reduce memory consumption by not maintaining pagecache and 2) reduce
CPU cost by avoiding the double-copies.
These things are pretty easily quantified. And really they must be
quantified as part of the developer testing, because if you find they've
worsened then holy cow, what went wrong.

> > Carsten Otte wrote filemap_xip.c and may be a useful reviewer of this
> > work.
>
> I cc'd him on some earlier versions and didn't hear anything back. It felt
> rude to keep plying him with 20+ patches every month.

OK.

> > All the patch subjects violate Documentation/SubmittingPatches
> > section 15 ;)
>
> errr ... which bit? I used git format-patch to create them.

None of the patch titles identify the subsystem(s) which they're
hitting. eg, "Introduce IS_DAX(inode)" is an ext2 patch, but nobody
would know that from browsing the titles.

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html