On Thu, Apr 20, 2017 at 09:29:04AM -0400, Brian Foster wrote: > Directory block readahead uses a complex iteration mechanism to map > between high-level directory blocks and underlying physical extents. > This mechanism attempts to traverse the higher-level dir blocks in a > manner that handles multi-fsb directory blocks and simultaneously > maintains a reference to the corresponding physical blocks. > > This logic doesn't handle certain (discontiguous) physical extent > layouts correctly with multi-fsb directory blocks. For example, > consider the case of a 4k FSB filesystem with a 2 FSB (8k) directory > block size and a directory with the following extent layout: > > EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL > 0: [0..7]: 88..95 0 (88..95) 8 > 1: [8..15]: 80..87 0 (80..87) 8 > 2: [16..39]: 168..191 0 (168..191) 24 > 3: [40..63]: 5242952..5242975 1 (72..95) 24 > > Directory block 0 spans physical extents 0 and 1, dirblk 1 lies > entirely within extent 2 and dirblk 2 spans extents 2 and 3. Because > extent 2 is larger than the directory block size, the readahead code > erroneously assumes the block is contiguous and issues a readahead > based on the physical mapping of the first fsb of the dirblk. This > results in read verifier failure and a spurious corruption or crc > failure, depending on the filesystem format. > > Further, the subsequent readahead code responsible for walking > through the physical table doesn't correctly advance the physical > block reference for dirblk 2. Instead of advancing two physical > filesystem blocks, the first iteration of the loop advances 1 block > (correctly), but the subsequent iteration advances 2 more physical > blocks because the next physical extent (extent 3, above) happens to > cover more than dirblk 2. At this point, the higher-level directory > block walking is completely off the rails of the actual physical > layout of the directory for the respective mapping table. > > Update the contiguous dirblock logic to consider the current offset > in the physical extent to avoid issuing directory readahead to > unrelated blocks. Also, update the mapping table advancing code to > consider the current offset within the current dirblock to avoid > advancing the mapping reference too far beyond the dirblock. > > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx> > --- > > This so far survives a couple xfstests runs with default and 2xfsb > directory block sizes. I have another run in progress with max sized > directory blocks. This applies on top of Eric's original overrun fix. Looks ok, will test... Reviewed-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx> --D > > Brian > > fs/xfs/xfs_dir2_readdir.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/fs/xfs/xfs_dir2_readdir.c b/fs/xfs/xfs_dir2_readdir.c > index c45de72..20b7a5c 100644 > --- a/fs/xfs/xfs_dir2_readdir.c > +++ b/fs/xfs/xfs_dir2_readdir.c > @@ -405,7 +405,8 @@ xfs_dir2_leaf_readbuf( > * Read-ahead a contiguous directory block. > */ > if (i > mip->ra_current && > - map[mip->ra_index].br_blockcount >= geo->fsbcount) { > + (map[mip->ra_index].br_blockcount - mip->ra_offset) >= > + geo->fsbcount) { > xfs_dir3_data_readahead(dp, > map[mip->ra_index].br_startoff + mip->ra_offset, > XFS_FSB_TO_DADDR(dp->i_mount, > @@ -438,7 +439,7 @@ xfs_dir2_leaf_readbuf( > * The rest of this extent but not more than a dir > * block. > */ > - length = min_t(int, geo->fsbcount, > + length = min_t(int, geo->fsbcount - j, > map[mip->ra_index].br_blockcount - > mip->ra_offset); > mip->ra_offset += length; > -- > 2.7.4 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html