Re: [PATCH] docs: record the metadump file format

Dave Chinner <david@xxxxxxxxxxxxx> · Wed, 26 Jul 2017 09:25:36 +1000

On Tue, Jul 25, 2017 at 09:36:12AM -0700, Darrick J. Wong wrote:
> [cc xfs]
> 
> On Tue, Jul 25, 2017 at 06:20:54PM +0200, Luis R. Rodriguez wrote:
> > On Thu, Jun 15, 2017 at 03:00:02PM -0700, Darrick J. Wong wrote:
> > > Document the metadump file format.
> > 
> > Thanks for all this! I have started wondering about all this and was
> > curious whether there are more docs about the format, or more
> > practical docs that would help one read the dumps and analyze them
> > through examples.
> > 
> > > --- /dev/null
> > > +++ b/design/XFS_Filesystem_Structure/metadump.asciidoc
> > > +== Dump Obfuscation
> > > +
> > > +Unless explicitly disabled, the +xfs_metadump+ tool obfuscates empty block
> > > +space and naming information to avoid leaking sensitive information into
> > > +the metadump file.  +xfs_metadump+ does not copy user data blocks.
> > > +
> > > +The obfuscation policy is as follows:
> > > +
> > > +* File and extended attribute names are both considered "names".
> > > +* Names longer than 8 characters are totally rewritten with a name that matches the hash of the old name.
> > > +* Names between 5 and 8 characters are partially rewritten to match the hash of the old name.
> > 
> > Any reason for this?
> 
> /me doesn't know.  Maybe it's too hard to generate a new name with the
> same hash?
>
> > > +* Names shorter than 5 characters are not obscured at all.
> > 
> > This does not seem like a good idea, do we have a record of why this was done
> > historically?

It was done because of mathematics. This is all "IIRC" off the top
of my head....

The hash we calculate is 4 bytes long, so we can't calculate a hash
collision from fewer than 5 bytes of input. i.e. 4 bytes + 1 byte
to cause collision is the shortest we can obfuscate, but one byte of
"name overwrite" isn't enough to find collisions if all 4 of the
first 4 bytes are randomly chosen.

Hence for names 5-8 bytes in length we are limited to 1 byte of
correction for each of the first 4 bytes that is randomised. Hence
it's not until filenames are longer than 8 bytes that we can
generate a truly random filename that causes a hash collision.
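
To make the arithmetic concrete, here is a rough sketch in Python. It
assumes the name hash behaves like "rotate left by 7 bits, then XOR in
the next byte", which is my reading of xfs_da_hashname(); the helper
names (rol32, da_hashname, collide) are made up for illustration and
this is not the metadump code. It only shows why five trailing bytes are
enough to steer a 32-bit hash to any target value: each fix-up byte
supplies 7 bits, and 4 * 7 = 28 is short of 32.

```python
import os

MASK32 = 0xFFFFFFFF


def rol32(value, shift):
    """Rotate a 32-bit value left by `shift` bits."""
    shift %= 32
    return ((value << shift) | (value >> (32 - shift))) & MASK32


def da_hashname(name):
    """Assumed per-byte form of the hash: rotate left 7, XOR in the byte."""
    h = 0
    for b in name:
        h = rol32(h, 7) ^ b
    return h


def collide(name, prefix=None):
    """Return a same-length byte string with the same hash as `name`.

    Hypothetical helper: everything except the last five bytes is free
    (random by default); the last five bytes are computed as a fix-up.
    """
    if len(name) < 5:
        raise ValueError("need at least 5 bytes to force a collision")
    if prefix is None:
        prefix = os.urandom(len(name) - 5)
    if len(prefix) != len(name) - 5:
        raise ValueError("prefix must be exactly len(name) - 5 bytes")

    target = da_hashname(name)
    # Five more bytes rotate the prefix hash by 5 * 7 = 35 bits, i.e. by 3.
    fix = rol32(da_hashname(prefix), 3) ^ target
    # Slice the 32-bit fix-up into 4 + 7 + 7 + 7 + 7 bits, one slice per byte.
    tail = bytes((fix >> shift) & 0x7F for shift in (28, 21, 14, 7, 0))
    return prefix + tail


if __name__ == "__main__":
    old = b"secretfile.txt"
    new = collide(old)
    print(old, new, hex(da_hashname(old)), hex(da_hashname(new)))
```

Note the sketch makes no attempt to keep the result a valid name: the
fix-up bytes can come out as NUL or '/', so a real tool would have to
adjust those while preserving the hash.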

> > > +* Names that cross a block boundary are not obscured at all.
> > 
> > Likewise.
> 
> iirc we basically copy things a block at a time, which makes it harder
> to deal with multi-fsblock dirblocks (???)

Discontiguous multi-block directories should be handled
transparently for xfs_db via libxfs buffers now. Maybe metadump
doesn't use these for dumping directory blocks? Also, we shouldn't
be splitting names and values across da block boundaries - we leave
free space in the block and allocate a new one if it doesn't fit....

Cheers,

Dave.

-- 
Dave Chinner
david@xxxxxxxxxxxxx