On Tue, Jul 25, 2017 at 09:36:12AM -0700, Darrick J. Wong wrote:
> [cc xfs]
>
> On Tue, Jul 25, 2017 at 06:20:54PM +0200, Luis R. Rodriguez wrote:
> > On Thu, Jun 15, 2017 at 03:00:02PM -0700, Darrick J. Wong wrote:
> > > Document the metadump file format.
> >
> > Thanks for all this! I have started wondering all this and was
> > curious if there are perhaps more docs about the format or more
> > practical docs which can help one go read the dumps and help
> > analyze through examples.
> >
> > > --- /dev/null
> > > +++ b/design/XFS_Filesystem_Structure/metadump.asciidoc
> > > +== Dump Obfuscation
> > > +
> > > +Unless explicitly disabled, the +xfs_metadump+ tool obfuscates empty block
> > > +space and naming information to avoid leaking sensitive information into
> > > +the metadump file. +xfs_metadump+ does not copy user data blocks.
> > > +
> > > +The obfuscation policy is as follows:
> > > +
> > > +* File and extended attribute names are both considered "names".
> > > +* Names longer than 8 characters are totally rewritten with a name that matches the hash of the old name.
> > > +* Names between 5 and 8 characters are partially rewritten to match the hash of the old name.
> >
> > Any reason for this?
>
> /me doesn't know. Maybe it's too hard to generate a new name with the
> same hash?
>
> > > +* Names shorter than 5 characters are not obscured at all.
> >
> > This does not seem like a good idea, do we have a record of why this was done
> > historically?

It was done because of mathematics. This is all "IIRC" off the top of my
head....

The hash we calculate is 4 bytes long, so we can't calculate a hash
collision from less than 5 bytes of input. i.e. 4 bytes + 1 byte to
cause collision is the shortest we can obfuscate, but one byte of
"name overwrite" isn't enough to find collisions if all 4 of the first
4 bytes are randomly chosen. Hence for names 5-8 bytes in length we
are limited to 1 byte of correction for each of the first 4 bytes
that is randomised.
Hence it's not until filenames are longer than 8 bytes that we can
generate a truly random filename that causes a hash collision.

> > > +* Names that cross a block boundary are not obscured at all.
> >
> > Likewise.
>
> iirc we basically copy things a block at a time, which makes it harder
> to deal with multi-fsblock dirblocks (???)

Discontiguous multi-block directories should be handled transparently
for xfs_db via libxfs buffers now. Maybe metadump doesn't use these
for dumping directory blocks?

Also, we shouldn't be splitting names and values across da block
boundaries - we leave free space in the block and allocate a new one
if it doesn't fit....

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx