On Fri, Oct 18, 2019 at 09:10:34AM -0700, Dan Williams wrote:
> Hi,
>
> In the course of tracking down a v5.3 regression with filesystem-dax
> unable to generate huge page faults on any filesystem, I found that I
> can't generate huge faults on v5.2 with xfs, but ext4 works. That
> result indicates that the block device is properly physically aligned,
> but the allocator is generating misaligned extents.
>
> The test fallocates a 1GB file and then looks for a 2MB aligned
> extent. However, fiemap reports:
>
>     for (i = 0; i < map->fm_mapped_extents; i++) {
>         ext = &map->fm_extents[i];
>         fprintf(stderr, "[%ld]: l: %llx p: %llx len: %llx flags: %x\n",
>                 i, ext->fe_logical, ext->fe_physical,
>                 ext->fe_length, ext->fe_flags);
>     }
>
> [0]: l: 0 p: 208000 len: 1fdf8000 flags: 800
> [1]: l: 1fdf8000 p: c000 len: 170000 flags: 800
> [2]: l: 1ff68000 p: 2000c000 len: 1ff70000 flags: 800
> [3]: l: 3fed8000 p: 4000c000 len: 128000 flags: 801
>
> ...where l == ->fe_logical and p == ->fe_physical.
>
> I'm still searching for the kernel where this behavior changed, but in
> the meantime wanted to report this in case its something
> straightforward in the allocator. The mkfs.xfs invocation in this case
> was:
>
>     mkfs.xfs -f -d su=2m,sw=1 -m reflink=0 /dev/pmem0

As we talked about on irc while I waited for a slooow imap server, I
think this is caused by fallocate asking for a larger allocation than
the AG size.  The allocator of course declines this, and bmap code is
too fast to drop the alignment hints.  IIRC Brian and Carlos and Dave
were working on something in this area[1] but I don't think there's been
any progress in a month(?)

Then Dan said agsize=131072, which means 512M AGs, so a 1G fallocate
will never generate an aligned allocation... but a 256M one seems to
work fine on my test vm.

--D

[1] https://lore.kernel.org/linux-xfs/20190912143223.24194-1-bfoster@xxxxxxxxxx/