Re: [PATCH 2/2] dax: fix bdev NULL pointer dereferences

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue 02-02-16 10:34:56, Ross Zwisler wrote:
> On Tue, Feb 02, 2016 at 09:10:24AM -0800, Dan Williams wrote:
> > On Tue, Feb 2, 2016 at 8:46 AM, Jan Kara <jack@xxxxxxx> wrote:
> > > On Tue 02-02-16 08:33:56, Dan Williams wrote:
> > >> On Tue, Feb 2, 2016 at 3:17 AM, Jan Kara <jack@xxxxxxx> wrote:
> > >> [..]
> > >> > I see, thanks for explanation. So I'm OK with changing what is stored in
> > >> > the radix tree to accommodate this use case but my reservation that we IHMO
> > >> > have other more pressing things to fix remains...
> > >>
> > >> We don't need pfns in the radix to support XFS RT configurations.
> > >> Just call get_blocks() again and use the sector, or am I missing
> > >> something?
> > >
> > > You are correct. But if you decide to pay the cost of additional
> > > get_block() call, you only need the dirty tag in the radix tree and nothing
> > > else. So my understanding was that the whole point of games with radix tree
> > > is avoiding this extra get_block() calls for fsync().
> > >
> > 
> > DAX-fsync() is already a potentially expensive operation to cover data
> > durability guarantees for DAX-unaware applications.  A DAX-aware
> > application is going to skip fsync, and the get_blocks() cost, to do
> > cache management itself.
> > 
> > Willy pointed out some other potential benefits, assuming a suitable
> > replacement for the protections afforded by the block-device
> > percpu_ref counter can be found.  However, optimizing for the
> > DAX-unaware-application case seems the wrong motivation to me.
> 
> Oh, no, the primary issue with calling get_block() in the fsync path isn't
> performance.  It's that we don't have any idea what get_block() function to
> call.
> 
> The fault handler calls all come from the filesystem directly, so they can
> easily give us an appropriate get_block() function pointer.  But the
> dax_writeback_mapping_range() calls come from the generic code in
> mm/filemap.c, and don't know what get_block() to pass in.
> 
> During one iteration I had the calls to dax_writeback_mapping_range()
> happening in the filesystem fsync code so that it could pass in get_block(),
> but Dave Chinner pointed out that this misses other paths in the filesystem
> that need to have things flushed via a call to filemap_write_and_wait_range().

Let's clear this up a bit: The problem with using ->fsync() method is that
it doesn't get called for sync(2). We could use ->sync_fs() to flush caches
in case of sync(2) (that's what's happening for normal storage) but the
problem with PMEM is that "flush all cached data" operation effectively
means iterate through all modified pages and we didn't want to implement
this for DAX fsync code.

So we have decided to do cache flushing for DAX at a different point - mark
inodes which may have writes cached as dirty and use writeback code for the
cache flushing. But looking at it now, we have actually chosen a wrong
place to do the flushing in the writeback path - note that sync(2) writes
data via __writeback_single_inode() -> do_writepages() and thus doesn't
even get to filemap_write_and_wait().

So revisiting the decision I see two options:

1) Move the DAX flushing code from filemap_write_and_wait() into
->writepages() fs callback. There the filesystem can provide all the
information it needs including bdev, get_block callback, or whatever.

2) Back out even further and implement own tracking and iteration of inodes
to write.

So far I still think 2) is not worth the complexity (although it would
bring DAX code closer to how things behave with standard storage) so I
would go for 1).

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux