On Fri, Oct 18, 2019 at 10:16 AM Darrick J. Wong <darrick.wong@xxxxxxxxxx> wrote:
>
> On Fri, Oct 18, 2019 at 09:10:34AM -0700, Dan Williams wrote:
> > Hi,
> >
> > In the course of tracking down a v5.3 regression with filesystem-dax
> > unable to generate huge page faults on any filesystem, I found that I
> > can't generate huge faults on v5.2 with xfs, but ext4 works. That
> > result indicates that the block device is properly physically aligned,
> > but the allocator is generating misaligned extents.
> >
> > The test fallocates a 1GB file and then looks for a 2MB aligned
> > extent. However, fiemap reports:
> >
> >     for (i = 0; i < map->fm_mapped_extents; i++) {
> >             ext = &map->fm_extents[i];
> >             fprintf(stderr, "[%ld]: l: %llx p: %llx len: %llx flags: %x\n",
> >                     i, ext->fe_logical, ext->fe_physical,
> >                     ext->fe_length, ext->fe_flags);
> >     }
> >
> > [0]: l: 0 p: 208000 len: 1fdf8000 flags: 800
> > [1]: l: 1fdf8000 p: c000 len: 170000 flags: 800
> > [2]: l: 1ff68000 p: 2000c000 len: 1ff70000 flags: 800
> > [3]: l: 3fed8000 p: 4000c000 len: 128000 flags: 801
> >
> > ...where l == ->fe_logical and p == ->fe_physical.
> >
> > I'm still searching for the kernel where this behavior changed, but in
> > the meantime wanted to report this in case its something
> > straightforward in the allocator. The mkfs.xfs invocation in this case
> > was:
> >
> >     mkfs.xfs -f -d su=2m,sw=1 -m reflink=0 /dev/pmem0
>
> As we talked about on irc while I waited for a slooow imap server, I
> think this is caused by fallocate asking for a larger allocation than
> the AG size. The allocator of course declines this, and bmap code is
> too fast to drop the alignment hints. IIRC Brian and Carlos and Dave
> were working on something in this area[1] but I don't think there's been
> any progress in a month(?)
>
> Then Dan said agsize=131072, which means 512M AGs, so a 1G fallocate
> will never generate an aligned allocation... but a 256M one seems to
> work fine on my test vm.
>

Thanks Darrick.
Reducing the fallocate size does get me physical alignment, but some extents are still misaligned relative to their logical offset. Adding agcount=2 cleans that up for me.
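
For anyone following along, here is a rough sketch of the two checks discussed above: the AG size arithmetic, and whether an extent can back a 2MB huge fault at all. This is not the actual test program. The struct and function names are made up for illustration, the 4096-byte block size is an assumption (it is what makes agsize=131072 come out to 512M), and the "congruent modulo 2MB" rule is my reading of what the logical vs. physical alignment requirement amounts to.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define HPAGE_SIZE (2ULL << 20)         /* 2MiB huge page size (assumed PMD size) */
#define HPAGE_MASK (HPAGE_SIZE - 1)

/* Hypothetical stand-in for the three struct fiemap_extent fields printed above. */
struct extent {
        uint64_t logical;       /* fe_logical */
        uint64_t physical;      /* fe_physical */
        uint64_t length;        /* fe_length */
};

/* The four extents from the fiemap output quoted above. */
const struct extent reported[] = {
        { 0x0,        0x208000,   0x1fdf8000 },
        { 0x1fdf8000, 0xc000,     0x170000   },
        { 0x1ff68000, 0x2000c000, 0x1ff70000 },
        { 0x3fed8000, 0x4000c000, 0x128000   },
};

/* AG size in bytes; agsize is given in filesystem blocks. */
uint64_t ag_size_bytes(uint64_t agsize_blocks, uint64_t block_size)
{
        return agsize_blocks * block_size;
}

/* Does the extent start on a 2MiB physical boundary? */
int extent_phys_aligned(const struct extent *e)
{
        assert(e != NULL);
        return (e->physical & HPAGE_MASK) == 0;
}

/*
 * Can the extent back at least one huge fault?  The logical and physical
 * offsets must be congruent modulo 2MiB, and the extent must cover at
 * least one whole 2MiB-aligned logical range.
 */
int extent_hugepage_capable(const struct extent *e)
{
        uint64_t start, end;

        assert(e != NULL);
        if ((e->logical & HPAGE_MASK) != (e->physical & HPAGE_MASK))
                return 0;
        start = (e->logical + HPAGE_MASK) & ~HPAGE_MASK;        /* round up */
        end = (e->logical + e->length) & ~HPAGE_MASK;           /* round down */
        return end > start;
}
```

Run against the four extents above, both checks fail for every extent, and with 4096-byte blocks the 1G request is twice the 512M AG size.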