On Wed 23-03-16 11:52:58, Ross Zwisler wrote: > On Mon, Mar 21, 2016 at 02:22:50PM +0100, Jan Kara wrote: > > Currently we forbid page_cache_tree_insert() to replace exceptional radix > > tree entries for DAX inodes. However to make DAX faults race free we will > > lock radix tree entries and when hole is created, we need to replace > > such locked radix tree entry with a hole page. So modify > > page_cache_tree_insert() to allow that. > > Perhaps this is addressed later in the series, but at first glance this seems > unsafe to me - what happens of we had tasks waiting on the locked entry? Are > they woken up when we replace the locked entry with a page cache entry? > > If they are woken up, will they be alright with the fact that they were > sleeping on a lock for an exceptional entry, and now that lock no longer > exists? The radix tree entry will now be filled with a zero page hole, and so > they can't acquire the lock they were sleeping for - instead if they wanted to > lock that page they'd need to lock the zero page page lock, right? Correct. What you describes can happen and the DAX entry locking code counts with that. I guess I could comment on that in the changelog. Honza > > > Signed-off-by: Jan Kara <jack@xxxxxxx> > > --- > > include/linux/dax.h | 6 ++++++ > > mm/filemap.c | 18 +++++++++++------- > > 2 files changed, 17 insertions(+), 7 deletions(-) > > > > diff --git a/include/linux/dax.h b/include/linux/dax.h > > index 7c45ac7ea1d1..4b63923e1f8d 100644 > > --- a/include/linux/dax.h > > +++ b/include/linux/dax.h > > @@ -3,8 +3,14 @@ > > > > #include <linux/fs.h> > > #include <linux/mm.h> > > +#include <linux/radix-tree.h> > > #include <asm/pgtable.h> > > > > +/* > > + * Since exceptional entries do not use indirect bit, we reuse it as a lock bit > > + */ > > +#define DAX_ENTRY_LOCK RADIX_TREE_INDIRECT_PTR > > + > > ssize_t dax_do_io(struct kiocb *, struct inode *, struct iov_iter *, loff_t, > > get_block_t, dio_iodone_t, int flags); > > int dax_clear_sectors(struct block_device *bdev, sector_t _sector, long _size); > > diff --git a/mm/filemap.c b/mm/filemap.c > > index 7c00f105845e..fbebedaf719e 100644 > > --- a/mm/filemap.c > > +++ b/mm/filemap.c > > @@ -597,14 +597,18 @@ static int page_cache_tree_insert(struct address_space *mapping, > > if (!radix_tree_exceptional_entry(p)) > > return -EEXIST; > > > > - if (WARN_ON(dax_mapping(mapping))) > > - return -EINVAL; > > - > > - if (shadowp) > > - *shadowp = p; > > mapping->nrexceptional--; > > - if (node) > > - workingset_node_shadows_dec(node); > > + if (!dax_mapping(mapping)) { > > + if (shadowp) > > + *shadowp = p; > > + if (node) > > + workingset_node_shadows_dec(node); > > + } else { > > + /* DAX can replace empty locked entry with a hole */ > > + WARN_ON_ONCE(p != > > + (void *)(RADIX_TREE_EXCEPTIONAL_ENTRY | > > + DAX_ENTRY_LOCK)); > > + } > > } > > radix_tree_replace_slot(slot, page); > > mapping->nrpages++; > > -- > > 2.6.2 > > -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html