On Wed, Nov 07, 2018 at 01:59:39PM -0600, Eric Sandeen wrote: > On 11/7/18 1:29 PM, Brian Foster wrote: > > On Wed, Nov 07, 2018 at 11:49:31AM -0600, Eric Sandeen wrote: > >> I'd like to propose an addition to our current metadata verifier error reporting that I believe will allow us to more quickly identify, and more efficiently classify, metadata validation errors when they occur. > ... > > >> 3) It would allow us to ensure that the hexdump actually contains the > >> region where the corruption was discovered. > >> > > > > Yep, though I still think the "first 64 bytes" or whatever of a > > particular buffer is useful to help establish sanity (i.e., is the magic > > sane? is the whole buffer zeroed out? etc.). > > Yeah I think we should always print the beginning as well. > > ... > > > > Did we consider some kind of error context structure in the past? I > > don't recall if there were unrelated issues with that, but that seems > > slightly more elegant than an xfs_buf field. Otherwise, the high level > > approach seems reasonable. > > Darrick made the same suggestion, and it's a good one thanks. > > ... > > >> We /could/ even go so far as to macrify tests like this, and do: > >> > >> XFS_VALIDATE_EQ(agf, agf_magicnum, XFS_AGF_MAGIC); > >> if (!XFS_AGF_GOOD_VERSION(be32_to_cpu(agf->agf_versionnum))) > >> XFS_VERIFIER_FAIL_RETURN(struct xfs_agf, agf_versionnum); > >> XFS_VALIDATE_EQ(agf, agf_versionnum, XFS_AGF_VERSION); > >> ... > >> > > > > This one makes my eyes cross a bit. ;) Though I think it's because now I > > have to wonder a bit about what each macro does. I wouldn't be against > > something like this if we could condense it into something more generic > > and straightforward. For example: > > > > XFS_VERIFY(agf->agf_magicnum == XFS_AGF_MAGIC, > > offsetof(struct xfs_agf, agf_magicnum)); > > XFS_VERIFY(XFS_AGF_GOOD_VERSION(be32_to_cpu(agf->agf_versionnum)), > > offsetof(struct xfs_agf, agf_versionnum)); > > ... > > > > I guess you'd need to stick the buffer or whatever stores the bad offset > > value in there somewhere as well, but seeing the explicit logic makes > > this easier to read than following it into a macro. Just my .02. > > hah, yeah, that's much better. Thanks. <nod> To reiterate a conversation we had on IRC: I tried the "one giant macro to test, record error state, and return" strategy for xfs_scrub and was shot down hiding control flow in something that looks like a "regular" function call. TBH I wouldn't necessarily be able to tell that XFS_VERIFY() actually *returns* from the function just by looking at the call sites. So what I propose instead is something sort of like what I did for scrub. In some header file you have: static inline xfs_failaddr_t xfs_buf_record_error( struct xfs_buf *bp, unsigned int offset, xfs_failaddr_t fa, int error) { bp->b_bad_offset = offset; bp->b_error = error; return fa; } and then in xfs_alloc.c you have: static xfs_failaddr_t xfs_agf_verify( struct xfs_buf *bp) { struct xfs_agf *agfp = XFS_BUF_TO_AGF(bp); #define XFS_AGF_CORRUPT(bp, field) \ xfs_buf_record_error((bp), offsetof(struct xfs_agf, (field)), \ __this_address, -EFSCORRUPTED) if (agf->agf_magicnum != cpu_to_be32(XFS_AGF_MAGIC)) return XFS_AGF_CORRUPT(bp, agf_magicnum); if (!XFS_AGF_GOOD_VERSION(be32_to_cpu(agf->agf_versionnum))) return XFS_AGF_CORRUPT(bp, agf_versionnum); ... #undef XFS_AGFL_ERROR } --D > -Eric