Re: btrfs regression since 4.X kernel NULL pointer dereference

On Fri, Sep 11, 2015 at 02:55:17PM -0400, Jeff Mahoney wrote:
> On 8/25/15 5:00 AM, Christoph Hellwig wrote:
> > I think this is btrfs using a struct block_device that doesn't
> > have a valid queue pointer in its gendisk for ->s_bdev.  And there
> > are some fishy looking ->s_bdev assignments in the code which I
> > suspect are related to it:
> > 
> > fs/btrfs/dev-replace.c: if (fs_info->sb->s_bdev == src_device->bdev)
> > fs/btrfs/dev-replace.c: fs_info->sb->s_bdev = tgt_device->bdev;
> > fs/btrfs/volumes.c: if (device->bdev == root->fs_info->sb->s_bdev)
> > fs/btrfs/volumes.c: root->fs_info->sb->s_bdev = next_device->bdev;
> > fs/btrfs/volumes.c: if (tgtdev->bdev == fs_info->sb->s_bdev)
> > fs/btrfs/volumes.c: fs_info->sb->s_bdev = next_device->bdev;
> 
> The report at https://bugzilla.kernel.org/show_bug.cgi?id=100911
> tracks it down a bit further and it's bdev->bd_disk == NULL instead of
> the queue in the gendisk. I don't think that the s_bdev stuff is
> related, though I'd certainly love to see that bit go away.
> 
> If we're calling blk_get_backing_dev_info, that means we're already
> using an inode that has blockdev_superblock and the btrfs superblock
> isn't even involved.
> 
> We're getting there because btrfs_evict_inode ->
> btrfs_wait_ordered_range -> btrfs_fdatawrite_range ->
> filemap_fdatawrite_range gets called with inode->i_mapping.  That
> mapping gets passed down through __filemap_fdatawrite_range to
> wbc_attach_fdatawrite_inode where the inode passed is mapping->host --
> which will be the block device inode rather than the btrfs device node
> inode.  That inode is the one ultimately checked in inode_to_bdi.
> 
> So it looks like we're causing writeback on an unrelated block device
> that was opened using a device node hosted on btrfs, which is
> obviously wrong.
> 
> I don't think snapshot removal is even a requirement to trigger this.
> I expect it's possible to trigger with two device nodes for the same
> block device where one is getting closed and cleaned up while the
> eviction of the other happens.  The device nodes wouldn't even need to
> be on the same fs.
> 
> Other file systems use &inode->i_data in eviction.  Is it that simple
> here?

Oh, ok I'm following now.  This really should explain it.  Jeff
mentioned that he's working on a patch to skip the wait_ordered_range
dance based on i_mode.  Thanks Jeff!

-chris