Here is my "analysis" of what happens in reiser4 during a transaction's lifetime wrt. block allocation and deallocation.

THE EFFECTS (SEMANTICS) OF RELATED FUNCTIONS

reiser4_alloc_blocks_bitmap():
    allocates in WORKING BITMAP
reiser4_dealloc_blocks_bitmap(!BA_DEFER):
    deallocates from WORKING BITMAP
reiser4_dealloc_blocks_bitmap(BA_DEFER):
    stores to ->delete_set
reiser4_pre_commit_hook_bitmap():
    allocates all relocated nodes in COMMIT BITMAP
    deallocates ->delete_set from COMMIT BITMAP
reiser4_post_commit_hook():
    deallocates ->delete_set using !BA_DEFER (i.e. from WORKING BITMAP)

TIMELINE OF ALLOCATIONS FOR "USUAL" NODES, AND TIMELINE OF TRANSACTION COMMIT

- nodes are allocated using reiser4_alloc_blocks() and setting JNODE_RELOC, so WORKING BITMAP ensures that two nodes cannot get the same block;
- nodes are deallocated using reiser4_dealloc_blocks(BA_DEFER), so their deallocation is not immediately reflected in WORKING BITMAP;
  (the relocate set is written here)
- reiser4_pre_commit_hook_bitmap() uses 1) JNODE_RELOC flag and 2) ->delete_set to convey effective bitmap changes into COMMIT BITMAP;
  (the journal and overwrite set are written here)
- reiser4_post_commit_hook() uses ->delete_set to convey deallocations from step 2 to WORKING BITMAP.
  (the discard happens here)

TIMELINE OF ALLOCATIONS FOR WANDERED JOURNAL BLOCKS

- at commit time, blocks are allocated using reiser4_alloc_blocks(), so they are allocated in WORKING BITMAP and do not interfere with any "usual" blocks;
- after writing wandered blocks, they are deallocated using reiser4_dealloc_blocks(!BA_DEFER), i.e. from the WORKING BITMAP.

CONCLUSION

At possible transaction replay time, journal blocks are not allocated in any of the bitmaps. However, because the journal is read and replayed before a transaction has a chance to commit, this fact does not matter. What matters is that wandered journal blocks never hit COMMIT BITMAP.
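To sanity-check the semantics listed above, here is a toy userspace model of the two bitmaps. It is only a sketch and not reiser4 code: every name in it (struct toy_fs, toy_alloc(), toy_dealloc(), toy_pre_commit(), toy_post_commit(), NBLOCKS) is made up for illustration, the bitmaps are plain bool arrays, and a per-block "reloc" flag stands in for JNODE_RELOC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NBLOCKS 16              /* toy device size (hypothetical) */

struct toy_fs {
    bool working[NBLOCKS];      /* models WORKING BITMAP */
    bool commit[NBLOCKS];       /* models COMMIT BITMAP */
    bool reloc[NBLOCKS];        /* stands in for JNODE_RELOC */
    int delete_set[NBLOCKS];    /* models ->delete_set */
    size_t delete_count;
};

/* Models reiser4_alloc_blocks_bitmap(): touches WORKING BITMAP only.
 * Returns the first free block, or -1 if none is left. */
int toy_alloc(struct toy_fs *fs)
{
    for (int b = 0; b < NBLOCKS; b++) {
        if (!fs->working[b]) {
            fs->working[b] = true;
            return b;
        }
    }
    return -1;
}

/* Allocation of a "usual" node: same as above, plus the reloc mark. */
int toy_alloc_reloc(struct toy_fs *fs)
{
    int b = toy_alloc(fs);

    if (b >= 0)
        fs->reloc[b] = true;
    return b;
}

/* Models reiser4_dealloc_blocks_bitmap(): with defer (BA_DEFER) the block
 * only goes to the delete set; without it, WORKING BITMAP is cleared. */
void toy_dealloc(struct toy_fs *fs, int b, bool defer)
{
    assert(b >= 0 && b < NBLOCKS);
    if (defer) {
        assert(fs->delete_count < NBLOCKS);
        fs->delete_set[fs->delete_count++] = b;
    } else {
        fs->working[b] = false;
    }
}

/* Models reiser4_pre_commit_hook_bitmap(): relocated nodes are set and the
 * delete set is cleared in COMMIT BITMAP. */
void toy_pre_commit(struct toy_fs *fs)
{
    for (int b = 0; b < NBLOCKS; b++) {
        if (fs->reloc[b]) {
            fs->commit[b] = true;
            fs->reloc[b] = false;
        }
    }
    for (size_t i = 0; i < fs->delete_count; i++)
        fs->commit[fs->delete_set[i]] = false;
}

/* Models reiser4_post_commit_hook(): the delete set is released from
 * WORKING BITMAP via the non-deferred path, then emptied. */
void toy_post_commit(struct toy_fs *fs)
{
    for (size_t i = 0; i < fs->delete_count; i++)
        toy_dealloc(fs, fs->delete_set[i], false);
    fs->delete_count = 0;
}
```

In this model a wandered block is simply a toy_alloc() followed by toy_dealloc(fs, b, false), so it never gets the reloc mark and never enters the delete set. That is the property the conclusion relies on: such a block can only ever show up in the working array, never in the commit array.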
So, if I've got all this correct (which I highly doubt), the disk space leak that you pointed out does not exist. What exists is a rather different problem with my idea of "log every deallocated block". The current implementation logs every block regardless of whether BA_DEFER is set, so non-wandered blocks are logged twice: once by the deferred deallocation and once more when reiser4_post_commit_hook() deallocates ->delete_set using !BA_DEFER.

The options I see:

- We could just use ->delete_set, but then we would lose wandered blocks.
- We could log only !BA_DEFER requests, which would do the right thing (wandered blocks + deallocations from reiser4_post_commit_hook()), but the reasoning behind this decision would not be obvious to a casual code reader.
- We could log only wandered blocks (in addition to ->delete_set), but this is messy and requires us to merge the discard log with ->delete_set at discard time.
- We could log wandered blocks straight into ->delete_set and do something in reiser4_post_commit_hook() to separate these entries, but this is super messy.

I prefer the second option...

Edward, please proof-read all this.

-- 
Ivan Shapovalov / intelfx /