Hello,

I've recently got a bug report where a JBD2 assertion failed because transaction commit ran out of journal space. After closer inspection of the crash dump it seems the problem is that there were too many journal descriptor blocks (more than the max_transaction_size >> 5 + 32 we estimate in jbd2_log_space_left()) due to descriptor blocks with revoke records.

In fact the estimate on the number of descriptor blocks looks pretty arbitrary and many more descriptor blocks can be needed for revoke records. We need one revoke record for every metadata block freed. So in the worst case (1k blocksize, 64-bit journal feature enabled, checksumming enabled) we fit 125 revoke records in one descriptor block. In common cases it's about 500 revoke records per descriptor block (rough back-of-envelope numbers at the end of this mail). Now when we free large directories or large files with data journalling enabled, we can have *lots* of blocks to revoke - with extent mapped files easily millions in a single transaction - which can mean 10k descriptor blocks, clearly more than the estimate of 128 descriptor blocks per transaction ;)

Now users clearly don't hit this problem frequently, so this is not a common case, but it is still possible and a malicious user could use it to DoS the machine, so I think we need to get even the weird corner cases fixed. The question is how, because as sketched above the worst case is too bad to account for in the common case. I have considered three options:

1) Count the number of revoke records currently in the transaction and add the needed revoke descriptor blocks to the expected transaction size. This is easy enough but does not solve all the corner cases - a single handle can add a lot of revoke records, which may overflow the space we reserve for descriptor blocks.

2) Add an argument to jbd2_journal_start() telling it how many metadata blocks we are going to free, and account the necessary revoke descriptor blocks into the reserved credits. This could work; we would generally need to pass inode->i_blocks / blocksize as the estimate of metadata blocks to free (for inodes to which this applies) since we don't have a better estimate, but I guess that's bearable. It would require some changes on the ext4 side, but nothing too intrusive.

3) Use the fact that we need to revoke only blocks that are currently in the journal. Thus the number of revoke records we may really need to store is well bounded (by the journal size). What is a bit painful is tracking which blocks are journalled. We could use a variant of counting Bloom filters to store that information with low memory consumption (say 64k of memory in the common case) and high enough accuracy, but that will still be some work to write. On the plus side it would reduce the number of revoke records we have to store even in the common case.

Overall I'm probably leaning towards 2) but I'm happy to hear more opinions or ideas :)

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR
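
P.S. The back-of-envelope sketch behind the capacity numbers quoted above. It assumes the usual jbd2 revoke descriptor layout as I understand it (16-byte revoke header at the start of the block, 4-byte checksum tail when checksumming is enabled, 8-byte records with the 64-bit feature, 4-byte otherwise); revoke_records_per_block() is just an illustrative helper for this mail, not an existing function:

	/*
	 * Rough estimate of how many revoke records fit in one revoke
	 * descriptor block. Assumed layout: 16-byte revoke header,
	 * 4-byte checksum tail when checksumming is on, 8-byte records
	 * with the 64-bit feature (4-byte otherwise). Illustration only.
	 */
	#include <stdio.h>

	static unsigned int revoke_records_per_block(unsigned int blocksize,
						     int feature_64bit,
						     int csum)
	{
		unsigned int space = blocksize - 16;	/* revoke header */

		if (csum)
			space -= 4;			/* checksum tail */
		return space / (feature_64bit ? 8 : 4);
	}

	int main(void)
	{
		/* worst case: 1k blocks, 64-bit journal, checksums */
		printf("1k, 64-bit, csum: %u records/block\n",
		       revoke_records_per_block(1024, 1, 1));
		/* common case: 4k blocks, 64-bit journal, checksums */
		printf("4k, 64-bit, csum: %u records/block\n",
		       revoke_records_per_block(4096, 1, 1));
		return 0;
	}

Under these assumptions this prints 125 records/block for the 1k worst case and 509 for the common 4k case, which is where the "125" and "about 500" figures above come from.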