On Wed, Apr 19, 2017 at 09:08:54AM -0700, Darrick J. Wong wrote: > On Wed, Apr 19, 2017 at 09:40:20AM -0500, Eric Sandeen wrote: > > On 4/19/17 7:32 AM, Brian Foster wrote: > > > On Tue, Apr 18, 2017 at 05:29:27PM -0700, Darrick J. Wong wrote: > > >> On Tue, Apr 18, 2017 at 07:14:03PM -0500, Eric Sandeen wrote: > > >>> On 4/18/17 7:09 PM, Darrick J. Wong wrote: > > >>>> On Tue, Apr 18, 2017 at 12:55:44PM -0400, Brian Foster wrote: > > >>>>> On Mon, Apr 17, 2017 at 05:12:36PM -0500, Eric Sandeen wrote: > > >>>>>> On 4/17/17 3:57 PM, Brian Foster wrote: > > >>>>>>> On Thu, Apr 13, 2017 at 01:45:43PM -0500, Eric Sandeen wrote: > > >>>>> ... > > >>>>>>> > > >>>>>>> This fix seems fine to me, but I'm wondering if this code may have > > >>>>>>> issues with other kinds of misalignment between the directory blocks and > > >>>>>>> underlying bmap extents as well. For example, what happens if we end up > > >>>>>>> with something like the following on an 8k dir fsb fs? > > >>>>>>> > > >>>>>>> 0:[0,xxx,3,0] > > >>>>>>> 1:[3,xxx,1,0] > > >>>>>>> > > >>>>>>> ... or ... > > >>>>>>> > > >>>>>>> 0:[0,xxx,3,0] > > >>>>>>> 1:[3,xxx,3,0] > > >>>>>> > > >>>>>> Well, as far as that goes it won't be an issue; for 8k dir block sizes > > >>>>>> we will allocate an extent map with room for 10 extents, and we'll go > > >>>>>> well beyond the above extents which cross directory block boundaries. > > >>>>>> > > >>>>>>> ... > > >>>>>>> N:[...] > > >>>>>>> > > >>>>>>> Am I following correctly that we may end up assuming the wrong mapping > > >>>>>>> for the second dir fsb and/or possibly skipping blocks? > > >>>>>> > > >>>>>> As far as I can tell, this code is only managing the read-ahead state > > >>>>>> by looking at these cached extents. We keep track of our position within > > >>>>>> that allocated array of mappings - this bug just stepped off the end > > >>>>>> while doing so. > > >>>>>> > > >>>>>> Stopping at the correct point should keep all of the state consistent > > >>>>>> and correct. > > >>>>>> > > >>>>>> But yeah, it's kind of hairy & hard to read, IMHO. > > >>>>>> > > >>>>>> Also as far as I can tell, we handle such discontiguities correctly, > > >>>>>> other than the bug I found. If you see something that looks suspicious, > > >>>>>> I'm sure I could tweak my test case to craft a specific situation if > > >>>>>> there's something you'd like to see tested... > > >>>>>> > > >>>>> > > >>>>> Background: Eric and I chatted a bit on irc to rectify that what I'm > > >>>>> calling out above is a different issue from what is fixed by this patch. > > >>>>> > > >>>>> Eric, > > >>>>> > > >>>>> I managed to construct a directory that looks like this: > > >>>>> > > >>>>> EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL > > >>>>> 0: [0..7]: 88..95 0 (88..95) 8 > > >>>>> 1: [8..15]: 80..87 0 (80..87) 8 > > >>>>> 2: [16..39]: 168..191 0 (168..191) 24 > > >>>>> 3: [40..63]: 5242952..5242975 1 (72..95) 24 > > >>>>> > > >>>>> The fs has 8k directory fsbs. Dir fsb offset 0 spans extents 0 and 1, > > >>>>> offset 1 (which corresponds to the 512b range 16-31 above) is covered > > >>>>> completely by extent 2 and dir offset 2 (range 32-47) spans extents 2 > > >>>>> and 3. An ls of this directory produces this: > > >>>>> > > >>>>> XFS (dm-3): Metadata corruption detected at xfs_dir3_data_reada_verify+0x42/0x80 [xfs], xfs_dir3_data_reada block 0x500058 > > >>>>> XFS (dm-3): Unmount and run xfs_repair > > >>>>> XFS (dm-3): First 64 bytes of corrupted metadata buffer: > > >>>>> ffffbcb901c44000: 00 00 00 00 00 00 00 64 0f 78 78 78 78 78 78 78 .......d.xxxxxxx > > >>>>> ffffbcb901c44010: 78 78 78 78 2e 38 38 36 01 00 00 00 00 00 10 00 xxxx.886........ > > >>>>> ffffbcb901c44020: 00 00 00 00 00 00 00 64 0f 78 78 78 78 78 78 78 .......d.xxxxxxx > > >>>>> ffffbcb901c44030: 78 78 78 78 2e 38 38 37 01 00 00 00 00 00 10 20 xxxx.887....... > > >>>>> > > >>>>> ... which is yelling about block 184 (dir fsb 2). The fs is otherwise > > >>>>> clean according to xfs_repair. > > >>>>> > > >>>>> I _think_ something like the appended diff deals with it, but this is > > >>>>> lightly tested only and could definitely use more eyes. > > >>>>> > > >>>>> Brian > > >>>>> > > >>>>> --- 8< --- > > >>>>> > > >>>>> diff --git a/fs/xfs/xfs_dir2_readdir.c b/fs/xfs/xfs_dir2_readdir.c > > >>>>> index ad9396e..9fa379d 100644 > > >>>>> --- a/fs/xfs/xfs_dir2_readdir.c > > >>>>> +++ b/fs/xfs/xfs_dir2_readdir.c > > >>>>> @@ -404,7 +404,8 @@ xfs_dir2_leaf_readbuf( > > >>>>> * Read-ahead a contiguous directory block. > > >>>>> */ > > >>>>> if (i > mip->ra_current && > > >>>>> - map[mip->ra_index].br_blockcount >= geo->fsbcount) { > > >>>>> + (map[mip->ra_index].br_blockcount - mip->ra_offset) >= > > >>>>> + geo->fsbcount) { > > >>>>> xfs_dir3_data_readahead(dp, > > >>>>> map[mip->ra_index].br_startoff + mip->ra_offset, > > >>>>> XFS_FSB_TO_DADDR(dp->i_mount, > > >>>>> @@ -432,7 +433,7 @@ xfs_dir2_leaf_readbuf( > > >>>>> * The rest of this extent but not more than a dir > > >>>>> * block. > > >>>>> */ > > >>>>> - length = min_t(int, geo->fsbcount, > > >>>>> + length = min_t(int, geo->fsbcount - j, > > >>>> > > >>>> Looks ok to me to make Eric's bugfix complete. > > >>> > > >>> Brian, thanks for crafting the image to expose this. :) > > >>> > > >>> I've been otherwise occupied today, sorry - technically this fixes a separate > > >>> issue, yes? So 2 patches, 2 bugfixes AFAICT? > > >> > > >> Ok. I don't mind taking two patches since you're right, the first patch > > >> saves us from running off the end of the shadow bmap array, and the > > >> second one fixes bookkeepping problems when a dir extent starts midway > > >> through a large dirblock. > > >> > > > > > > Yeah, these are definitely separate issues and Eric's original patch > > > here is probably higher priority on its own. > > > > > > But I'm not clear on the plan here.. do we want this additional patch or > > > are we planning just to rewrite the readahead bits (or do we want to do > > > both)? I guess having a bugfix patch in the tree is useful from a stable > > > perspective... hm? > > > > > > Brian > > > > Yeah. > > > > There's 20 years of the old code in the field with these bugs - I think 2 > > straightforward fixes have value even if we might want to rewrite it in the > > future. > > Assuming 4.11 is released on Sunday, I was figuring 4.12 for the two > fixes + backporting to older kernels, and explore the rewrite for 4.13. > Ok, sounds good to me. I'll get this one posted once it survives a bit more testing. Thanks. Brian > --D > > > > > -Eric > > > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > > the body of a message to majordomo@xxxxxxxxxxxxxxx > > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html