Re: direct_access, pinning and truncation

[... figuring out how g_u_p() references can prevent freeing and
re-using the underlying mapped pmem addresses given the lack of struct
pages for the mapping]

> I see three solutions here:
> 
> 1. If get_user_pages() is called, copy from PMEM into DRAM, and provide
> the caller with the struct pages of the DRAM.  Modify DAX to handle some
> file pages being in the page cache, and make sure that we know whether
> the PMEM or DRAM is up to date.  This has the obvious downside that
> get_user_pages() becomes slow.

And you'd have to serialize those transitions and fs stores to the pmem
regions.  And now storing to dram-fronted pmem goes through all the
dirtying and writeback machinery.  This sounds like a nightmare to me,
to be honest.

> 2. Modify filesystems that support DAX to handle pinning blocks.
> Some filesystems (that support COW and snapshots) already support
> reference-counting individual blocks.  We may be able to do better by
> using a tree of pinned extents or something.  This makes it much harder
> to modify a filesystem to support DAX, and I don't see patches adding
> this capability to ext2 being warmly welcomed.

This seems... doable?  Recording the referenced pmem in free lists in the
fs is fine as long as the pmem isn't modified until the references are
released, right?

Maybe in the allocator you skip otherwise free blocks if they intersect
with the runtime structure (rbtree of extents, presumably) that is
taking the place of reference counts in struct page.  There aren't
*that* many allocator entry points.  I guess you'd need to avoid other
modifications of free space like trimming :/.  It still seems reasonably
doable?

And hey, lord knows we love to implement rbtrees of extents in file
systems!  (btrfs: struct extent_state, ext4: struct extent_status)
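To make the allocator idea concrete, here's a minimal user-space sketch.
It assumes a toy flat free-space bitmap, and every name in it
(pin_extent(), block_is_pinned(), alloc_block(), struct pinned_extent)
is hypothetical.  A linear array stands in for the rbtree of extents a
real fs would use.  The only point is that a block freed on disk but
still covered by a pin gets skipped:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_BLOCKS  64   /* toy filesystem size, in blocks */
#define MAX_PINNED 16   /* toy limit on outstanding pins */

/* A pinned run of blocks [start, start + len); hypothetical type. */
struct pinned_extent {
	uint64_t start;
	uint64_t len;
};

/* Stand-in for the per-fs rbtree of pinned extents: a small array. */
static struct pinned_extent pinned[MAX_PINNED];
static size_t nr_pinned;

/* Toy free-space map: true means the block is free on disk. */
static bool free_map[NR_BLOCKS];

/* Record that [start, start + len) is still referenced via g_u_p(). */
static bool pin_extent(uint64_t start, uint64_t len)
{
	assert(len > 0);
	if (nr_pinned == MAX_PINNED)
		return false;
	pinned[nr_pinned++] = (struct pinned_extent){ start, len };
	return true;
}

static bool block_is_pinned(uint64_t blk)
{
	for (size_t i = 0; i < nr_pinned; i++)
		if (blk >= pinned[i].start &&
		    blk < pinned[i].start + pinned[i].len)
			return true;
	return false;
}

/*
 * Allocator entry point: hand out the first block that is free on disk
 * and not covered by any pinned extent.  Returns -1 if none qualifies.
 */
static int64_t alloc_block(void)
{
	for (uint64_t blk = 0; blk < NR_BLOCKS; blk++) {
		if (!free_map[blk] || block_is_pinned(blk))
			continue;
		free_map[blk] = false;
		return (int64_t)blk;
	}
	return -1;
}
```

E.g. after a truncate frees blocks 0-3 while they're still pinned,
alloc_block() starts handing out block 4.  Trimming and any other
free-space consumers would need the same block_is_pinned() check.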

The tricky part would be maintaining that structure behind g_u_p() and
put_page() calls.  It would probably take a richer interface that gives
callers something more than just raw page pointers.
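Roughly what I mean by a richer interface, as a hypothetical sketch
only (pmem_pin_get(), pmem_pin_put(), registry_covers() and both structs
are made up, and a linked list stands in for the rbtree).  Callers get a
handle that remembers the extent, so releasing it tells the fs exactly
which blocks are no longer referenced, which a bare put_page() on a
page pointer without a struct page can't do:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical handle returned to g_u_p()-style callers instead of
 * bare struct page pointers: it remembers exactly what was pinned. */
struct pmem_pin {
	uint64_t start;          /* first pinned fs block */
	uint64_t len;            /* number of pinned blocks */
	struct pmem_pin *next;   /* link in the per-fs pin list */
};

/* Per-fs set of outstanding pins; a real fs would use an rbtree. */
struct pin_registry {
	struct pmem_pin *head;
	size_t count;
};

/* Pin [start, start + len) and record it so the allocator can see it. */
static struct pmem_pin *pmem_pin_get(struct pin_registry *reg,
				     uint64_t start, uint64_t len)
{
	struct pmem_pin *pin;

	assert(len > 0);
	pin = malloc(sizeof(*pin));
	if (!pin)
		return NULL;
	pin->start = start;
	pin->len = len;
	pin->next = reg->head;
	reg->head = pin;
	reg->count++;
	return pin;
}

/* Release a pin: unlike put_page(), the fs learns which extent is free. */
static void pmem_pin_put(struct pin_registry *reg, struct pmem_pin *pin)
{
	for (struct pmem_pin **pp = &reg->head; *pp; pp = &(*pp)->next) {
		if (*pp == pin) {
			*pp = pin->next;
			reg->count--;
			free(pin);
			return;
		}
	}
	assert(!"releasing a pin that was never taken");
}

/* True if any outstanding pin covers fs block blk. */
static bool registry_covers(const struct pin_registry *reg, uint64_t blk)
{
	for (const struct pmem_pin *p = reg->head; p; p = p->next)
		if (blk >= p->start && blk < p->start + p->len)
			return true;
	return false;
}
```

The allocator would then consult registry_covers() (or an rbtree
lookup) before reusing a block.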

> 3. Make truncate() block if it hits a pinned page.  There's really no
> good reason to truncate a file that has pinned pages; it's either a bug
> or you're trying to be nasty to someone.  We actually already have code
> for this; inode_dio_wait() / inode_dio_done().  But page pinning isn't
> just for O_DIRECT I/Os and other transient users like crypto, it's also
> for long-lived things like RDMA, where we could potentially block for
> an indefinite time.

I have no concrete examples, but I agree that it sounds like the sort of
thing that would bite us in the ass if we miss some use case :/.
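For comparison, #3 boils down to a per-inode pin count that truncate
waits on, loosely in the spirit of the inode_dio_wait() /
inode_dio_done() pairing quoted above.  This is a user-space sketch with
pthreads; all names are hypothetical.  The timeout is only there to
illustrate the RDMA worry: without it, truncate sleeps for as long as
anyone holds a pin.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical per-inode pin accounting. */
struct inode_pins {
	pthread_mutex_t lock;
	pthread_cond_t  drained;  /* broadcast when count drops to zero */
	unsigned long   count;    /* outstanding pins on this file */
};

#define INODE_PINS_INIT \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0 }

static void inode_pin_get(struct inode_pins *p)
{
	pthread_mutex_lock(&p->lock);
	p->count++;
	pthread_mutex_unlock(&p->lock);
}

static void inode_pin_put(struct inode_pins *p)
{
	pthread_mutex_lock(&p->lock);
	assert(p->count > 0);
	if (--p->count == 0)
		pthread_cond_broadcast(&p->drained);
	pthread_mutex_unlock(&p->lock);
}

/*
 * Called by truncate before freeing blocks.  Returns 0 once no pins
 * remain, or ETIMEDOUT if pins are still held after timeout_ms.
 */
static int inode_pins_wait(struct inode_pins *p, long timeout_ms)
{
	struct timespec deadline;
	int err = 0;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec  += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&p->lock);
	/* Loop handles spurious wakeups; stop on timeout. */
	while (p->count > 0 && err == 0)
		err = pthread_cond_timedwait(&p->drained, &p->lock, &deadline);
	if (p->count == 0)
		err = 0;
	pthread_mutex_unlock(&p->lock);
	return err;
}
```

The open question is what truncate should do on ETIMEDOUT, which is
exactly the long-lived pin case.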

I guess my initial vote is for trying a less-than-perfect prototype of
#2 to see just how hairy the rough outline gets.

- z