Re: direct_access, pinning and truncation

On Fri, Oct 10, 2014 at 03:08:05PM +0200, Jan Kara wrote:
> > One of the things on my todo list is making O_DIRECT work to a
> > memory-mapped direct_access file.  Right now, it simply doesn't work
> > because there's no struct page for the memory, so get_user_pages() fails.
> > Boaz has posted a patch to create struct pages for direct_access files,
> > which is certainly one way of solving the immediate problem, but it
> > ignores the deeper problem.
>   Maybe we can set some terminology - direct IO has two 'endpoints' (I
> don't want to talk about source / target because that swaps when talking
> about reads / writes). One endpoint is a 'buffer' and another endpoint is a
> 'storage'. Now 'buffer' may be a memory mapped file on some filesystem.
> In your case what isn't working is when 'buffer' is an mmapped file on a DAX
> filesystem.

Good terminology :-)

> > 2. Modify filesystems that support DAX to handle pinning blocks.
> > Some filesystems (that support COW and snapshots) already support
> > reference-counting individual blocks.  We may be able to do better by
> > using a tree of pinned extents or something.  This makes it much harder
> > to modify a filesystem to support DAX, and I don't see patches adding
> > this capability to ext2 being warmly welcomed.
> > 
> > 3. Make truncate() block if it hits a pinned page.  There's really no
> > good reason to truncate a file that has pinned pages; it's either a bug
> > or you're trying to be nasty to someone.  We actually already have code
> > for this; inode_dio_wait() / inode_dio_done().  But page pinning isn't
> > just for O_DIRECT I/Os and other transient users like crypto, it's also
> > for long-lived things like RDMA, where we could potentially block for
> > an indefinite time.
>   What option 3 seems to implicitly assume is that there are 'struct
> pages' to pin. So do you expect to add struct page to PFNs which were a
> target of get_user_pages()? And then check whether PFN is pinned (has
> corresponding struct page) in the truncate code?

I'm assuming that we come up with *some* way to solve the missing struct
page problem.  Whether it's restructuring splice, O_DIRECT and RDMA to do
without struct pages, whether it's dynamically allocating struct pages,
whether it's statically allocating struct pages, whether it's coming up
with some other data structure that takes the place of struct page for
DAX ... doesn't matter for this part of the conversation.

> Note that inode_dio_wait() isn't really what you're looking for. That waits for
> DIO pending against 'storage'. Currently we don't track in any way (except
> for elevated page reference counts) that 'buffer' is an endpoint of direct
> IO.

Ah, I wasn't clear ... I was proposing incrementing i_dio_count on the
buffer's inode when get_user_pages() was called.
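
To make the proposal concrete, here is a minimal userspace sketch of the
counting scheme.  It is a model only, not kernel code: struct model_inode
and the model_* helpers are hypothetical names, dio_count stands in for
i_dio_count, and where the kernel's inode_dio_wait() would sleep, the
model just reports that truncate cannot proceed yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; dio_count stands in for the inode's i_dio_count. */
struct model_inode {
	int dio_count;	/* pins currently held on this file's memory */
	size_t size;	/* file size in bytes */
};

/* Hypothetical hook: taken where get_user_pages() pins the 'buffer'. */
static void model_pin(struct model_inode *inode)
{
	inode->dio_count++;
}

/* Hypothetical hook: dropped when the pinned memory is released. */
static void model_unpin(struct model_inode *inode)
{
	assert(inode->dio_count > 0);
	inode->dio_count--;
}

/*
 * Truncate refuses to run while any pin is outstanding.  The kernel
 * would block here instead; the model returns false so the caller can
 * see that it has to wait.
 */
static bool model_truncate(struct model_inode *inode, size_t new_size)
{
	if (inode->dio_count > 0)
		return false;
	inode->size = new_size;
	return true;
}
```

A caller that pins the inode, attempts a truncate, and then unpins will
see the truncate fail first and succeed afterwards.  The sketch also
shows the drawback noted earlier in the thread: a long-lived pin (RDMA,
say) keeps truncate waiting for as long as it is held.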

> Thinking about options over and over again, I think trying something like
> 2) might be good. I'd still attach struct page to pinned PFNs to avoid some
> troubles but you could delay freeing of fs blocks if they are pinned by
> get_user_pages(). You could just hook into a path where filesystem frees
> blocks - e.g. ext4 already does this anyway in ext4_mb_free_metadata()
> since we free blocks in in-memory bitmaps only after the current
> transaction is committed (changes in in-memory bitmaps happen from
> ext4_journal_commit_callback(), which calls ext4_free_data_callback()). So
> ext4 already handles the situation where in-memory bitmaps are different
> from on disk ones and what you need is no different.

If this is something that (some) filesystems already do, then I feel
much happier about this idea!
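
The delayed-free idea can be sketched as a small userspace model.  This
is an illustration under stated assumptions, not ext4's code: struct
model_fs, the block states and the model_* helpers are all hypothetical.
A block freed while pinned is parked in a pending state and only returns
to the free pool once its last pin is dropped.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NBLOCKS 8

/* BLK_FREE is 0 so a zero-initialised model starts with all blocks free. */
enum model_state { BLK_FREE, BLK_IN_USE, BLK_FREE_PENDING };

/* Userspace model only; not a real filesystem structure. */
struct model_fs {
	enum model_state state[MODEL_NBLOCKS];
	int pins[MODEL_NBLOCKS];	/* pins held via get_user_pages() */
};

static void model_alloc(struct model_fs *fs, int blk)
{
	assert(fs->state[blk] == BLK_FREE);
	fs->state[blk] = BLK_IN_USE;
}

static void model_pin(struct model_fs *fs, int blk)
{
	assert(fs->state[blk] == BLK_IN_USE);
	fs->pins[blk]++;
}

/* Truncate path: free at once if unpinned, otherwise defer the free. */
static void model_free(struct model_fs *fs, int blk)
{
	assert(fs->state[blk] == BLK_IN_USE);
	fs->state[blk] = fs->pins[blk] > 0 ? BLK_FREE_PENDING : BLK_FREE;
}

/* Dropping the last pin completes a deferred free, if one is pending. */
static void model_unpin(struct model_fs *fs, int blk)
{
	assert(fs->pins[blk] > 0);
	if (--fs->pins[blk] == 0 && fs->state[blk] == BLK_FREE_PENDING)
		fs->state[blk] = BLK_FREE;
}

/* The allocator may only hand out blocks that are really free. */
static bool model_can_allocate(const struct model_fs *fs, int blk)
{
	return fs->state[blk] == BLK_FREE;
}
```

With this shape truncate never blocks.  The cost moves to the allocator,
which has to skip pending blocks until the pin holder lets go.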



