Hello. After reading some mails from mailing list archives and other sources, ext3 seems to having troubles in deleting files w/o destroying the data neccessary to undelete them. Ok, I know what journaling means, and in any way, the ext3 fs driver needs to keep track of which blocks have actually been marked as being free. But now my question: Who say, that this information must be stored in the original inode data? The answer is: Is isn't neccessary at all. Ok, how then? My prescription: 1.1. We need a fixed inode, I'll refer to that as the "deletion inode" here. 1.2. We need three blocks, whichs block numbers could be stored in the super block. I'll refer to them as "deletion blocks" here. Now, lets assume, a user enters rm foo.img where foo.img is a very big file. The steps, the ext3fs driver currently does is (IIRC, at least): 2.1. Link the inode to a special list of "inodes to delete". -- no change here. 2.2. Check, if any inode is being delete in the moment. If not start it now (go on with 3.1.). -- no change here. Actual deletion: (That action initiated in step 2.2.) Here, the procedure will completely changed: 3.1. Copy the inode (the 128 bytes, not the conten of the file ;-) to the space for the "deletion inode", marking the original inode as being deleted + set dtime. This is one transaction. -> If the "deletion inode" has a dtime != 0 on mount-time after a crash, recovery starts here. 3.2. Mark a bunch of blocks at the end of the file as free in the block bitmaps. Check which indirect block, double indirect block and trible indirect block needs to be updated, and copy them each in one of the "deletion blocks", but only if they are not already in one of the "deletion blocks". Update the copies in the "deletion blocks" leaving the original ind, dind and tind blocks alone. All this is done in one transaction. 3.3. Repeat 3.2. as long as there are still blocks beyond the 12 blocks boundary. 3.4. Free the last blocks of the inode, and set dtime to zero (to prevent later recoveries from staring over again). Again, one transaction. 3.5. Check if another inode is to be deleted. If yes, go on at 3.1. A question which I would expect from you is, why it is enough to have three blocks and why it's not neccessary to zero out all ind, dind and tind blocks. For the answer, I want to summarize, how big files are stored. 4.1. The first 12 block numbers are stored in the inode itself. This is a good way to speed up access to small files. 4.2. The next 256 (512; 1024) block numbers are stored in a separate block, which's blocknumber will be stored as 13. block number in the inode. 4.3. The next 256^2 (512^2; 1024^2) block numbers are stored in 256 (512; 1024) separate blocks, which's block numbers are collected together in a double indirect block, which's block number is stored as 14. block number in the inode. 4.4. For the next 256^3 (512^3; 1024^3) blocks the same happens, but only with another nexting level. Ok, but how do we truncate? We take e.g. the last 256 blocks, and collect their block numbers. Independantly of which block is the first and which is the last to free in this step, there is one boundary. All blocks before will not freed in this cycle, all blocks beyound it will be freed in this cycle. When removing the 256 blocks at the end of the file, at least one ind block needs to be updated. There are two cases in this stage: 5.1. If all 256 blocks are managed by the same ind block, the whole ind block would being zeroed out. In this case, we don't need to touch the ind block as we simply remove it's entry in the dind block. 5.2. So let us assume that we delete some last blocks of one ind block and some first blocks of another ind block. (This is the typical case when freeing 256 blocks. The last n are managed by the last ind block and 256-n blocks are managed by the previous ind block). In this szenario, the latter ind block would be completely zeroed out and thus is unused now, so we don't take further attention to it. Only the first ind block needs to be updated. This is, what we need the first "deletion block" for. But wait. The other ind block (this what would get zeroed out now) must be unlinked in it's maintaining dind block. This is, where this case and case 1 meet again. Now, the goal is to mark an ind block in a dind block as unused. (But this is the same procedure like marking 256 blocks as free, but only by removing one block instead of 256 blocks from the list). Note: The worst case (3 "deletion blocks" needed) will even be the worst case when freeing the last allocated blocks of a file containing holes. (Thus even, if that means deleting 256 blocks spread over hundreds of GB in the inode's address space.) I hope this was comprehensible, and if you still have questions, feel free to ask me. Regards, Bodo -- And you are sure I should enter "format c:"? Yes, "format c:" will help. _______________________________________________ Ext3-users@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/ext3-users