On Wed, 2015-11-18 at 16:09 +0100, Jan Kara wrote:
> Hum, I don't get this. truncate_inode_pages_final() gets called when inode
> has no more users. So there are no mappings of the inode. So how could
> truncate_pagecache() possibly make a difference?

True. I confirmed with more focused testing that the change to
truncate_inode_pages_final() is not necessary. After invalidate_inodes()
does unmap_mapping_range(), we are protected by future calls to
get_block() and blk_queue_enter() failing when there are attempts to
re-establish a mapping after the block device has been torn down.

Here's a revised patch. Note that the call to truncate_pagecache() is
replaced with a call to unmap_mapping_range(), since it is fine to
access zero pages that might still be in the page cache.

8<----
Subject: mm, dax: unmap dax mappings at bdev or fs shutdown
From: Dan Williams <dan.j.williams@xxxxxxxxx>

Currently dax mappings leak past / survive block_device shutdown. While
page cache pages are permitted to be read/written after the block_device
is torn down, this is not acceptable in the dax case, as all media
access must end when the device is disabled. The pfn backing a dax
mapping is permitted to be invalidated after bdev shutdown, and this is
indeed the case with brd.

When a dax-capable block_device driver calls del_gendisk() in its
shutdown path, del_gendisk() needs to ensure that all DAX pfns are
unmapped. This is different from the pagecache-backed case, where the
disk is protected by the queue being torn down, which ends I/O to the
device. Since dax bypasses the page cache, we need to unconditionally
unmap the inode.

Cc: <stable@xxxxxxxxxxxxxxx>
Cc: Jan Kara <jack@xxxxxxxx>
Cc: Dave Chinner <david@xxxxxxxxxxxxx>
Cc: Matthew Wilcox <willy@xxxxxxxxxxxxxxx>
Cc: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
[honza: drop changes to truncate_inode_pages_final]
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
---
 fs/inode.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/fs/inode.c b/fs/inode.c
index 1be5f9003eb3..dcb31d2c15e6 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -579,6 +579,18 @@ static void dispose_list(struct list_head *head)
 	}
 }
 
+static void unmap_list(struct list_head *head)
+{
+	struct inode *inode, *_i;
+
+	list_for_each_entry_safe(inode, _i, head, i_lru) {
+		list_del_init(&inode->i_lru);
+		unmap_mapping_range(&inode->i_data, 0, 0, 1);
+		iput(inode);
+		cond_resched();
+	}
+}
+
 /**
  * evict_inodes - evict all evictable inodes for a superblock
  * @sb: superblock to operate on
@@ -642,6 +654,7 @@ int invalidate_inodes(struct super_block *sb, bool kill_dirty)
 	int busy = 0;
 	struct inode *inode, *next;
 	LIST_HEAD(dispose);
+	LIST_HEAD(unmap);
 
 	spin_lock(&sb->s_inode_list_lock);
 	list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) {
@@ -655,6 +668,19 @@ int invalidate_inodes(struct super_block *sb, bool kill_dirty)
 			busy = 1;
 			continue;
 		}
+		if (IS_DAX(inode) && atomic_read(&inode->i_count)) {
+			/*
+			 * dax mappings can't live past this invalidation event
+			 * as there is no page cache present to allow the data
+			 * to remain accessible.
+			 */
+			__iget(inode);
+			inode_lru_list_del(inode);
+			spin_unlock(&inode->i_lock);
+			list_add(&inode->i_lru, &unmap);
+			busy = 1;
+			continue;
+		}
 		if (atomic_read(&inode->i_count)) {
 			spin_unlock(&inode->i_lock);
 			busy = 1;
@@ -669,6 +695,7 @@ int invalidate_inodes(struct super_block *sb, bool kill_dirty)
 	spin_unlock(&sb->s_inode_list_lock);
 
 	dispose_list(&dispose);
+	unmap_list(&unmap);
 
 	return busy;
 }
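The ordering in invalidate_inodes() and unmap_list() is the key part of the change: busy DAX inodes are pinned with __iget() while s_inode_list_lock is held, and the calls that may sleep (unmap_mapping_range(), iput()) run only after that spinlock is dropped. What follows is a minimal userspace model of that two-phase pattern, not kernel code. struct model_inode, model_invalidate(), model_unmap_list() and the list_lock_held flag are hypothetical names, and the spinlock is reduced to a flag that an assert() checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace model (NOT kernel types or APIs); each field
 * only stands in for the kernel object named in its comment.
 */
struct model_inode {
	bool is_dax;                    /* IS_DAX(inode) */
	int refcount;                   /* inode->i_count */
	bool mapped;                    /* a live DAX mapping exists */
	struct model_inode *next_unmap; /* private list link, like i_lru */
};

/* Stands in for sb->s_inode_list_lock; a flag, not a real lock. */
static bool list_lock_held;

/*
 * Phase 2, modeled on unmap_list(): must run with the list lock
 * dropped, because the real unmap_mapping_range() and iput() may sleep.
 */
static void model_unmap_list(struct model_inode *head)
{
	assert(!list_lock_held);
	while (head) {
		struct model_inode *next = head->next_unmap;

		head->next_unmap = NULL; /* like list_del_init() */
		head->mapped = false;    /* like unmap_mapping_range() */
		head->refcount--;        /* like iput() */
		head = next;
	}
}

/*
 * Phase 1, modeled on invalidate_inodes(): walk every inode with the
 * lock "held". In-use DAX inodes get an extra reference and move to a
 * private list; any in-use inode makes the result busy.
 */
static int model_invalidate(struct model_inode *inodes, size_t n)
{
	struct model_inode *unmap = NULL;
	int busy = 0;

	list_lock_held = true;
	for (size_t i = 0; i < n; i++) {
		struct model_inode *inode = &inodes[i];

		if (inode->refcount == 0)
			continue; /* real code disposes of it; not modeled */
		if (inode->is_dax) {
			inode->refcount++; /* like __iget() */
			inode->next_unmap = unmap;
			unmap = inode;
		}
		busy = 1;
	}
	list_lock_held = false;

	model_unmap_list(unmap);
	return busy;
}
```

Given one in-use DAX inode, one in-use non-DAX inode and one unused DAX inode, model_invalidate() returns 1, clears the in-use DAX inode's mapping and leaves its reference count where it started, while the non-DAX inode stays mapped, mirroring the commit message's point that only page-cache-backed data may remain accessible after teardown.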