On 08/05/2015 06:01 PM, Dave Chinner wrote:
> On Wed, Aug 05, 2015 at 04:19:08PM -0400, Jeff Moyer wrote:
>> Hi, Matthew,
>>
>> Linda Knippers noticed that commit (bbab37ddc20b) breaks mkfs.xfs:
>>
>> # mkfs -t xfs -f /dev/pmem0
>> meta-data=/dev/pmem0             isize=256    agcount=4, agsize=524288 blks
>>          =                       sectsz=512   attr=2, projid32bit=1
>>          =                       crc=0        finobt=0
>> data     =                       bsize=4096   blocks=2097152, imaxpct=25
>>          =                       sunit=0      swidth=0 blks
>> naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
>> log      =internal log           bsize=4096   blocks=2560, version=2
>>          =                       sectsz=512   sunit=0 blks, lazy-count=1
>> realtime =none                   extsz=4096   blocks=0, rtextents=0
>> mkfs.xfs: read failed: Numerical result out of range
>>
>> I sat down with Linda to look into it, and the problem is that mkfs.xfs
>> sets the blocksize of the device to 512 (via BLKBSZSET), and then reads
>> from the last sector of the device. This results in dax_io trying to do
>> a page-sized I/O at 512 bytes from the end of the device.
>
> Right - we have to be able to do IO to that last sector, so this is
> a sanity check to tell if the block dev is large enough. The XFS
> kernel code does the same end-of-device sector read when the
> filesystem is mounted, too.
>
>> bdev_direct_access, receiving this bogus pos/size combo, returns
>> -ERANGE:
>>
>>         if ((sector + DIV_ROUND_UP(size, 512)) >
>>                                 part_nr_sects_read(bdev->bd_part))
>>                 return -ERANGE;
>>
>> Given that file systems supporting dax refuse to mount with a blocksize
>> != page size, I'm guessing this is sort of expected behavior. However,
>> we really shouldn't be breaking direct I/O on pmem devices.
>
> If the device is advertising 512 byte sector size support, then this
> needs to work, especially as DAX is completely transparent on the
> block device. Remember that DAX through a filesystem works on
> filesystem data block size boundaries, so a 512 byte sector/4k block
> size filesystem will be able to use DAX for mmapped files just fine.
>
>> So, what do you want to do?
>> We could make the pmem device's logical
>> block size fixed at the system page size. Or, we could modify the dax
>> code to work with blocksize < pagesize. Or, we could continue using the
>> direct I/O codepath for direct block device access. What do you think?
>
> I don't know how the pmem device sets up its limits. Can you post
> the output of:
>
> /sys/block/pmem0/queue/logical_block_size
512
> /sys/block/pmem0/queue/physical_block_size
512
> /sys/block/pmem0/queue/hw_sector_size
512
> /sys/block/pmem0/queue/minimum_io_size
512
> /sys/block/pmem0/queue/optimal_io_size
0

Let me know if you need anything else.

-- 
ljk

>
> As these all affect how mkfs.xfs configures the filesystem being
> made and so influences the size and alignment of the IO it does....
>
> Cheers,
>
> Dave.