On Tue, Feb 2, 2016 at 3:36 PM, Jared Hulbert <jaredeh@xxxxxxxxx> wrote:
> On Tue, Feb 2, 2016 at 3:19 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
>>
>> On Tue, Feb 02, 2016 at 04:11:42PM -0700, Ross Zwisler wrote:
>>
>> > However, for raw block devices and for XFS with a real-time device, the
>> > value in inode->i_sb->s_bdev is not correct. With the code as it is
>> > currently written, an fsync or msync to a DAX enabled raw block device will
>> > cause a NULL pointer dereference kernel BUG. For this to work correctly we
>> > need to ask the block device or filesystem what struct block_device is
>> > appropriate for our inode.
>> >
>> > To that end, add a get_bdev(struct inode *) entry point to struct
>> > super_operations. If this function pointer is non-NULL, this notifies DAX
>> > that it needs to use it to look up the correct block_device. If
>> > i_sb->get_bdev() is NULL DAX will default to inode->i_sb->s_bdev.
>>
>> Umm... It assumes that bdev will stay pinned for as long as inode is
>> referenced, presumably? If so, that needs to be documented (and verified
>> for existing fs instances). In principle, multi-disk fs might want to
>> support things like "silently move the inodes backed by that disk to other
>> ones"...
>
> Dan, This is exactly the kind of thing I'm talking about WRT the
> weirder device models and directly calling bdev_direct_access().
> Filesystems don't have the monogamous relationship with a device that
> is implicitly assumed in DAX, you have to ask the filesystem what the
> relationship is and is migrating to, and allow the filesystem to
> update DAX when the relationship is changing.

That's precisely what ->get_bdev() does. When the inode->i_sb->s_bdev
lookup is invalid, use ->get_bdev().

> As we start to see many
> DIMMs and 10s of TiB pmem systems this is going to be an even bigger deal
> as load balancing, wear leveling, and fault tolerance concerns are
> inevitably driven by the filesystem.

No, there are no plans on the horizon for an fs to manage these
media-specific concerns for persistent memory.
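
For reference, the lookup rule quoted from the patch description (use
->get_bdev() when the filesystem provides it, otherwise fall back to
inode->i_sb->s_bdev) can be sketched roughly as below. This is a userspace
mock, not the patch itself: the struct definitions are cut-down stand-ins
for the kernel types, and dax_inode_bdev(), example_get_bdev() and the
i_example_bdev field are hypothetical names made up for illustration. It
also says nothing about the lifetime/pinning question Al raised.

```c
#include <stddef.h>

/* Cut-down userspace stand-ins for the kernel types (assumption). */
struct block_device { const char *name; };
struct inode;

struct super_operations {
	/* Entry point described in the patch; NULL when the fs has no override. */
	struct block_device *(*get_bdev)(struct inode *inode);
};

struct super_block {
	const struct super_operations *s_op;
	struct block_device *s_bdev;	/* default device for the superblock */
};

struct inode {
	struct super_block *i_sb;
	/* Hypothetical per-inode device, e.g. a real-time device. */
	struct block_device *i_example_bdev;
};

/* Hypothetical fs callback: answer with the device backing this inode. */
static struct block_device *example_get_bdev(struct inode *inode)
{
	return inode->i_example_bdev;
}

/*
 * Hypothetical helper: ask the filesystem if it implements ->get_bdev(),
 * otherwise default to inode->i_sb->s_bdev.
 */
static struct block_device *dax_inode_bdev(struct inode *inode)
{
	const struct super_operations *ops = inode->i_sb->s_op;

	if (ops && ops->get_bdev)
		return ops->get_bdev(inode);
	return inode->i_sb->s_bdev;
}
```

A filesystem with a single backing device would leave ->get_bdev NULL and
get the s_bdev default; only the odd cases need to fill it in.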