Re: e2dis: a Jigdo-like tool for Ext2+ FS

Ivan Shmakov <ivan@xxxxxxxxxxxxxxxx> · Mon, 15 Aug 2011 18:10:28 +0700



>>>>> Lukas Czerner <lczerner@xxxxxxxxxx> writes:
>>>>> On Sat, 13 Aug 2011, Ivan Shmakov wrote:

 >> A couple of weeks ago I started working on a tool (tentatively
 >> named “Ext2 disassembler”) to walk through an Ext2+ filesystem (or
 >> an image of one) and produce the mapping of files' (inodes') relative
 >> block numbers to the image's (or “physical”) block numbers.

 > I have not seen your code, but that sounds like something that
 > debugfs (part of e2fsprogs) is already doing very well (and a lot
 > more).  This is exactly the "extN disassembler" you're talking about

	Not quite.  The meaning of “disassembler” here is that the image
	is torn into parts, which could later be assembled together to
	form exactly the same image (by an “image assembler” tool.)

	It's not implied that e2dis will ever produce some sort of
	human-readable output (as its primary result.)  For that,
	debugfs(8) should indeed suffice.

 > and with a little bit of scripting around it you should be able to dig
 > any information you desire from the file system, so I do not think
 > that a new application is needed.  But I might be wrong, just take a
 > look at it.

	Indeed, my first try was to use debugfs(8).  However, there are
	several issues with it:

	• I see no way to obtain the list of used inodes in debugfs(8)
	  (as of 1.41.12 debian 2); therefore, I had to resort to
	  trying the ‘stat’ command on every possible inode number;

	• also, debugfs(8) serializes the (binary) filesystem data into
	  ASCII, which the invoking tool then has to parse back; this is
	  computationally inefficient, especially for a filesystem
	  several GiB in size with tens of thousands of used inodes or
	  more;

	• moreover, I see no guarantee that the output of the debugfs(8)
	  ‘stat’ command won't ever change (nor do I see any formal
	  description of that output; its source code is the only
	  specification I could rely on); my guess is that the C API,
	  being documented, is going to be much more stable.

	That being said, most of the code I've written so far is
	concerned /not/ with the filesystems per se (i. e., libext2fs
	calls), but with data recording: representing the data in a
	compact way, interfacing SQLite, etc.  (The SHA-1 computation
	and GNU-style CLI will require some coding as well, thus making
	the Ext2+ FS-specific parts even smaller when compared to the
	overall code size.)

[…]

-- 
FSF associate member #7257
