On Tue, Apr 18, 2017 at 05:29:27PM -0700, Darrick J. Wong wrote: > On Tue, Apr 18, 2017 at 07:14:03PM -0500, Eric Sandeen wrote: > > On 4/18/17 7:09 PM, Darrick J. Wong wrote: > > > On Tue, Apr 18, 2017 at 12:55:44PM -0400, Brian Foster wrote: > > >> On Mon, Apr 17, 2017 at 05:12:36PM -0500, Eric Sandeen wrote: > > >>> On 4/17/17 3:57 PM, Brian Foster wrote: > > >>>> On Thu, Apr 13, 2017 at 01:45:43PM -0500, Eric Sandeen wrote: > > >> ... > > >>>> > > >>>> This fix seems fine to me, but I'm wondering if this code may have > > >>>> issues with other kinds of misalignment between the directory blocks and > > >>>> underlying bmap extents as well. For example, what happens if we end up > > >>>> with something like the following on an 8k dir fsb fs? > > >>>> > > >>>> 0:[0,xxx,3,0] > > >>>> 1:[3,xxx,1,0] > > >>>> > > >>>> ... or ... > > >>>> > > >>>> 0:[0,xxx,3,0] > > >>>> 1:[3,xxx,3,0] > > >>> > > >>> Well, as far as that goes it won't be an issue; for 8k dir block sizes > > >>> we will allocate an extent map with room for 10 extents, and we'll go > > >>> well beyond the above extents which cross directory block boundaries. > > >>> > > >>>> ... > > >>>> N:[...] > > >>>> > > >>>> Am I following correctly that we may end up assuming the wrong mapping > > >>>> for the second dir fsb and/or possibly skipping blocks? > > >>> > > >>> As far as I can tell, this code is only managing the read-ahead state > > >>> by looking at these cached extents. We keep track of our position within > > >>> that allocated array of mappings - this bug just stepped off the end > > >>> while doing so. > > >>> > > >>> Stopping at the correct point should keep all of the state consistent > > >>> and correct. > > >>> > > >>> But yeah, it's kind of hairy & hard to read, IMHO. > > >>> > > >>> Also as far as I can tell, we handle such discontiguities correctly, > > >>> other than the bug I found. If you see something that looks suspicious, > > >>> I'm sure I could tweak my test case to craft a specific situation if > > >>> there's something you'd like to see tested... > > >>> > > >> > > >> Background: Eric and I chatted a bit on irc to rectify that what I'm > > >> calling out above is a different issue from what is fixed by this patch. > > >> > > >> Eric, > > >> > > >> I managed to construct a directory that looks like this: > > >> > > >> EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL > > >> 0: [0..7]: 88..95 0 (88..95) 8 > > >> 1: [8..15]: 80..87 0 (80..87) 8 > > >> 2: [16..39]: 168..191 0 (168..191) 24 > > >> 3: [40..63]: 5242952..5242975 1 (72..95) 24 > > >> > > >> The fs has 8k directory fsbs. Dir fsb offset 0 spans extents 0 and 1, > > >> offset 1 (which corresponds to the 512b range 16-31 above) is covered > > >> completely by extent 2 and dir offset 2 (range 32-47) spans extents 2 > > >> and 3. An ls of this directory produces this: > > >> > > >> XFS (dm-3): Metadata corruption detected at xfs_dir3_data_reada_verify+0x42/0x80 [xfs], xfs_dir3_data_reada block 0x500058 > > >> XFS (dm-3): Unmount and run xfs_repair > > >> XFS (dm-3): First 64 bytes of corrupted metadata buffer: > > >> ffffbcb901c44000: 00 00 00 00 00 00 00 64 0f 78 78 78 78 78 78 78 .......d.xxxxxxx > > >> ffffbcb901c44010: 78 78 78 78 2e 38 38 36 01 00 00 00 00 00 10 00 xxxx.886........ > > >> ffffbcb901c44020: 00 00 00 00 00 00 00 64 0f 78 78 78 78 78 78 78 .......d.xxxxxxx > > >> ffffbcb901c44030: 78 78 78 78 2e 38 38 37 01 00 00 00 00 00 10 20 xxxx.887....... > > >> > > >> ... which is yelling about block 184 (dir fsb 2). The fs is otherwise > > >> clean according to xfs_repair. > > >> > > >> I _think_ something like the appended diff deals with it, but this is > > >> lightly tested only and could definitely use more eyes. > > >> > > >> Brian > > >> > > >> --- 8< --- > > >> > > >> diff --git a/fs/xfs/xfs_dir2_readdir.c b/fs/xfs/xfs_dir2_readdir.c > > >> index ad9396e..9fa379d 100644 > > >> --- a/fs/xfs/xfs_dir2_readdir.c > > >> +++ b/fs/xfs/xfs_dir2_readdir.c > > >> @@ -404,7 +404,8 @@ xfs_dir2_leaf_readbuf( > > >> * Read-ahead a contiguous directory block. > > >> */ > > >> if (i > mip->ra_current && > > >> - map[mip->ra_index].br_blockcount >= geo->fsbcount) { > > >> + (map[mip->ra_index].br_blockcount - mip->ra_offset) >= > > >> + geo->fsbcount) { > > >> xfs_dir3_data_readahead(dp, > > >> map[mip->ra_index].br_startoff + mip->ra_offset, > > >> XFS_FSB_TO_DADDR(dp->i_mount, > > >> @@ -432,7 +433,7 @@ xfs_dir2_leaf_readbuf( > > >> * The rest of this extent but not more than a dir > > >> * block. > > >> */ > > >> - length = min_t(int, geo->fsbcount, > > >> + length = min_t(int, geo->fsbcount - j, > > > > > > Looks ok to me to make Eric's bugfix complete. > > > > Brian, thanks for crafting the image to expose this. :) > > > > I've been otherwise occupied today, sorry - technically this fixes a separate > > issue, yes? So 2 patches, 2 bugfixes AFAICT? > > Ok. I don't mind taking two patches since you're right, the first patch > saves us from running off the end of the shadow bmap array, and the > second one fixes bookkeepping problems when a dir extent starts midway > through a large dirblock. > Yeah, these are definitely separate issues and Eric's original patch here is probably higher priority on its own. But I'm not clear on the plan here.. do we want this additional patch or are we planning just to rewrite the readahead bits (or do we want to do both)? I guess having a bugfix patch in the tree is useful from a stable perspective... hm? Brian > --D > > > > > Thanks, > > -Eric > > > > > I will, however, post a cleanup patch to remove the persistent shadow > > > bmap and have readahead issued directly off the inode fork contents. > > > > > > --D > > > > > >> map[mip->ra_index].br_blockcount - > > >> mip->ra_offset); > > >> mip->ra_offset += length; > > >> -- > > >> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > > >> the body of a message to majordomo@xxxxxxxxxxxxxxx > > >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > > the body of a message to majordomo@xxxxxxxxxxxxxxx > > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html