On Fri, Nov 08, 2024 at 11:33:58AM +0100, Jan Kara wrote:
> > 1048576 records - 95 seconds
> > 2097152 records - 580 seconds
>
> These are really high numbers of revoke records. Deleting a couple GB of
> metadata doesn't happen so easily. Are they from a real workload or just
> a stress test?

For context, the background of this is that this has been an out-of-tree
patch that's been around for a very long time, for use with Lustre
servers, where apparently this very large number of revoke records is a
real thing.

> If my interpretation is correct, then rhashtable is an unnecessarily
> huge hammer for this. Firstly, as the big hash is needed only during
> replay, there's no concurrent access to the data structure. Secondly,
> we just fill the data structure in the PASS_REVOKE scan and then use
> it. Thirdly, we know the number of elements we need to store in the
> table in advance (well, currently we don't, but it's trivial to modify
> PASS_SCAN to get that number).
>
> So rather than playing with rhashtable, I'd modify PASS_SCAN to sum up
> the number of revoke records we're going to process and then prepare a
> static hash of appropriate size for replay (we can just use the
> standard hashing fs/jbd2/revoke.c uses, just with a differently sized
> hash table allocated for replay, and point journal->j_revoke to it).
> And once recovery completes, jbd2_journal_clear_revoke() can free the
> table and point journal->j_revoke back to the original table. What do
> you think?

Hmm, that's a really nice idea; Andreas, what do you think?

					- Ted
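
A minimal user-space sketch of the shape Jan is describing, for
illustration only: count the revoke records during a pre-scan, allocate a
power-of-two chained hash sized to that count for replay, and free it once
recovery completes. The names below are made up; the real code would reuse
the hashing in fs/jbd2/revoke.c and swap journal->j_revoke rather than
build a standalone table.

/*
 * Illustrative sketch (not jbd2 code): size a replay-only revoke hash
 * from a count gathered in a pre-scan, use it during replay, then tear
 * it down again once recovery finishes.
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct revoke_entry {
	uint64_t blocknr;
	struct revoke_entry *next;	/* separate chaining */
};

struct revoke_table {
	unsigned int nbuckets;		/* power of two */
	struct revoke_entry **buckets;
};

static struct revoke_table *revoke_table_alloc(unsigned long nrecords)
{
	struct revoke_table *t = malloc(sizeof(*t));
	unsigned int n = 256;		/* small default used outside replay */

	if (!t)
		return NULL;
	/* Round up to a power of two at least as large as the record count. */
	while (n < nrecords)
		n <<= 1;
	t->nbuckets = n;
	t->buckets = calloc(n, sizeof(*t->buckets));
	if (!t->buckets) {
		free(t);
		return NULL;
	}
	return t;
}

static void revoke_table_insert(struct revoke_table *t, uint64_t blocknr)
{
	/* Simple multiplicative hash; the real code has its own hash. */
	unsigned int h = (blocknr * 2654435761u) & (t->nbuckets - 1);
	struct revoke_entry *e = malloc(sizeof(*e));

	if (!e)
		abort();
	e->blocknr = blocknr;
	e->next = t->buckets[h];
	t->buckets[h] = e;
}

static void revoke_table_free(struct revoke_table *t)
{
	for (unsigned int i = 0; i < t->nbuckets; i++) {
		struct revoke_entry *e = t->buckets[i];

		while (e) {
			struct revoke_entry *next = e->next;
			free(e);
			e = next;
		}
	}
	free(t->buckets);
	free(t);
}

int main(void)
{
	/* Pretend PASS_SCAN counted this many revoke records. */
	unsigned long nrecords = 2097152;
	struct revoke_table *replay = revoke_table_alloc(nrecords);

	if (!replay)
		return 1;
	printf("replay hash sized to %u buckets for %lu records\n",
	       replay->nbuckets, nrecords);

	/* PASS_REVOKE would insert each revoked block number here ... */
	revoke_table_insert(replay, 12345);

	/* ... and once recovery completes the table is torn down, analogous
	 * to jbd2_journal_clear_revoke() restoring the original j_revoke. */
	revoke_table_free(replay);
	return 0;
}

The point is just that the table is sized once from a known count and
lives only for the duration of replay, so none of rhashtable's resizing
or concurrency machinery is needed.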