Ext3 and undeletion - A way how it could work.

Bodo Thiesen <bothie@xxxxxx> · Sun, 1 Feb 2004 07:00:58 +0100

Hello.

After reading some mails from mailing list archives and other sources, 
ext3 seems to having troubles in deleting files w/o destroying the data 
neccessary to undelete them. Ok, I know what journaling means, and in any 
way, the ext3 fs driver needs to keep track of which blocks have actually 
been marked as being free. But now my question: Who say, that this 
information must be stored in the original inode data? The answer is: Is 
isn't neccessary at all. Ok, how then?

My prescription:

1.1. We need a fixed inode, I'll refer to that as the "deletion inode" 
     here.
1.2. We need three blocks, whichs block numbers could be stored in the 
     super block. I'll refer to them as "deletion blocks" here.

Now, lets assume, a user enters rm foo.img where foo.img is a very big 
file. The steps, the ext3fs driver currently does is (IIRC, at least):

2.1. Link the inode to a special list of "inodes to delete". -- no change 
     here.
2.2. Check, if any inode is being delete in the moment. If not start it 
     now (go on with 3.1.). -- no change here.

Actual deletion: (That action initiated in step 2.2.)

Here, the procedure will completely changed:

3.1. Copy the inode (the 128 bytes, not the conten of the file ;-) to the 
     space for the "deletion inode", marking the original inode as being 
     deleted + set dtime. This is one transaction.

-> If the "deletion inode" has a dtime != 0 on mount-time after a crash, 
   recovery starts here.
3.2. Mark a bunch of blocks at the end of the file as free in the block 
     bitmaps. Check which indirect block, double indirect block and 
     trible indirect block needs to be updated, and copy them each in one 
     of the "deletion blocks", but only if they are not already in one of 
     the "deletion blocks". Update the copies in the "deletion blocks" 
     leaving the original ind, dind and tind blocks alone. All this is 
     done in one transaction.

3.3. Repeat 3.2. as long as there are still blocks beyond the 12 blocks 
     boundary.

3.4. Free the last blocks of the inode, and set dtime to zero (to prevent 
     later recoveries from staring over again). Again, one transaction.

3.5. Check if another inode is to be deleted. If yes, go on at 3.1.

A question which I would expect from you is, why it is enough to have 
three blocks and why it's not neccessary to zero out all ind, dind and 
tind blocks. For the answer, I want to summarize, how big files are 
stored.

4.1. The first 12 block numbers are stored in the inode itself. This is a 
     good way to speed up access to small files.
4.2. The next 256 (512; 1024) block numbers are stored in a separate 
     block, which's blocknumber will be stored as 13. block number in the 
     inode.
4.3. The next 256^2 (512^2; 1024^2) block numbers are stored in 256 (512; 
     1024) separate blocks, which's block numbers are collected together 
     in a double indirect block, which's block number is stored as 14. 
     block number in the inode.
4.4. For the next 256^3 (512^3; 1024^3) blocks the same happens, but only 
     with another nexting level.

Ok, but how do we truncate? We take e.g. the last 256 blocks, and collect 
their block numbers. Independantly of which block is the first and which 
is the last to free in this step, there is one boundary. All blocks 
before will not freed in this cycle, all blocks beyound it will be freed 
in this cycle. When removing the 256 blocks at the end of the file, at 
least one ind block needs to be updated. There are two cases in this 
stage:

5.1. If all 256 blocks are managed by the same ind block, the whole ind 
     block would being zeroed out. In this case, we don't need to touch 
     the ind block as we simply remove it's entry in the dind block.

5.2. So let us assume that we delete some last blocks of one ind block 
     and some first blocks of another ind block. (This is the typical 
     case when freeing 256 blocks. The last n are managed by the last ind 
     block and 256-n blocks are managed by the previous ind block). In 
     this szenario, the latter ind block would be completely zeroed out 
     and thus is unused now, so we don't take further attention to it. 
     Only the first ind block needs to be updated. This is, what we need 
     the first "deletion block" for. But wait. The other ind block (this 
     what would get zeroed out now) must be unlinked in it's maintaining 
     dind block. This is, where this case and case 1 meet again.

Now, the goal is to mark an ind block in a dind block as unused. (But 
this is the same procedure like marking 256 blocks as free, but only by 
removing one block instead of 256 blocks from the list).

Note: The worst case (3 "deletion blocks" needed) will even be the worst 
case when freeing the last allocated blocks of a file containing holes. 
(Thus even, if that means deleting 256 blocks spread over hundreds of GB 
in the inode's address space.)

I hope this was comprehensible, and if you still have questions, feel 
free to ask me.

Regards, Bodo
-- 
And you are sure I should enter "format c:"?
Yes, "format c:" will help.

_______________________________________________

Ext3-users@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/ext3-users