Just finished a big update: the allocator rewrite is done and merged. It's a mandatory disk format upgrade; the first time you mount an existing filesystem with the new version, you'll see it initialize the freespace btree.

What's changed: we've got some new persistent data structures that replace code that used to periodically walk all the buckets in the filesystem, which were kept in an in-memory array - and now that we don't need to do that anymore, the in-memory bucket array is gone, too. Specifically, we've got:

- A new hash table for buckets awaiting journal commit before they can be reused, using cuckoo hashing (this one was rolled out a while ago)
- An extents-style freespace btree, to replace the code in the old allocator threads that periodically walked the arrays of buckets to build up freelists
- A btree of buckets that need discarding before being moved to the freespace btree
- A new LRU btree, for buckets containing cached data - replacing code in the allocator threads that would scan buckets and build up a heap of buckets to be reused

The old allocator threads are completely gone - and the code that replaces them is all transactional btree code, much of it trigger based, that's _way_ easier to debug and reason about. This fixes weird performance corner cases and scalability issues - in particular, the allocator threads were prone to using excessive CPU when the filesystem was nearly full.

Also, we've got a new and much improved discard implementation! Previously, we'd only issue discards shortly before reusing/writing to a bucket again - now, we issue discards right after buckets become empty.

Exciting stuff - this was the biggest and most invasive change in quite a while, and I'm pretty happy with how it turned out.

Next big change is going to be the addition of backpointers to fix copygc scanning, and a rebalance-work btree to fix rebalance thread scanning, and then we'll be pretty much set for major scalability work.
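
For anyone who hasn't run into cuckoo hashing before, here's a rough sketch of the idea behind the table of buckets awaiting journal commit. This is not the actual filesystem code: the struct and function names, the table size, the kick limit and the hash mixer are all made up for illustration. Each bucket has two candidate slots; an insert that finds its slot occupied evicts the occupant and moves it to its other slot, so a lookup never has to check more than two slots.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WAIT_TABLE_SIZE 64	/* hypothetical size, must be a power of two */
#define MAX_KICKS	32	/* hypothetical limit on evictions per insert */

struct wait_entry {
	uint64_t bucket;
	uint64_t journal_seq;	/* 0 means the slot is empty */
};

struct wait_table {
	struct wait_entry slots[WAIT_TABLE_SIZE];
};

/* Generic 64-bit mixer; seeded differently to get two hash functions. */
static uint64_t mix64(uint64_t x)
{
	x ^= x >> 33;
	x *= 0xff51afd7ed558ccdULL;
	x ^= x >> 33;
	x *= 0xc4ceb9fe1a85ec53ULL;
	x ^= x >> 33;
	return x;
}

static size_t slot_for(uint64_t bucket, int which)
{
	uint64_t seed = which ? 0x9e3779b97f4a7c15ULL : 0;

	return mix64(bucket ^ seed) & (WAIT_TABLE_SIZE - 1);
}

/* Returns the journal sequence the bucket is waiting on, or 0 if absent. */
static uint64_t wait_table_lookup(const struct wait_table *t, uint64_t bucket)
{
	for (int i = 0; i < 2; i++) {
		const struct wait_entry *e = &t->slots[slot_for(bucket, i)];

		if (e->journal_seq && e->bucket == bucket)
			return e->journal_seq;
	}
	return 0;
}

/*
 * Returns false if no slot was found within MAX_KICKS evictions; *homeless
 * then holds the entry left without a slot, which the caller would have to
 * reinsert after growing or rehashing the table.
 */
static bool wait_table_insert(struct wait_table *t, uint64_t bucket,
			      uint64_t journal_seq, struct wait_entry *homeless)
{
	struct wait_entry cur = { bucket, journal_seq };
	size_t pos;

	assert(journal_seq != 0);	/* 0 is reserved for "empty" */

	/* If the bucket is already present, just update it in place. */
	for (int i = 0; i < 2; i++) {
		struct wait_entry *e = &t->slots[slot_for(bucket, i)];

		if (e->journal_seq && e->bucket == bucket) {
			e->journal_seq = journal_seq;
			return true;
		}
	}

	pos = slot_for(cur.bucket, 0);
	for (int kick = 0; kick < MAX_KICKS; kick++) {
		struct wait_entry *e = &t->slots[pos];
		struct wait_entry evicted = *e;
		size_t first;

		*e = cur;
		if (!evicted.journal_seq)
			return true;	/* slot was empty, done */

		/* Move the evicted entry to its other candidate slot. */
		cur = evicted;
		first = slot_for(cur.bucket, 0);
		pos = pos == first ? slot_for(cur.bucket, 1) : first;
	}

	*homeless = cur;
	return false;
}

/* A bucket can be reused once the journal has flushed past its sequence. */
static bool bucket_needs_journal_wait(const struct wait_table *t,
				      uint64_t bucket, uint64_t flushed_seq)
{
	return wait_table_lookup(t, bucket) > flushed_seq;
}
```

The appeal for this use case is that the "is this bucket still waiting on the journal?" check is a fixed two-slot probe, with no chains to walk. The price is that inserts occasionally have to shuffle entries around or, in the worst case, trigger a resize.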

Other recent changes/improvements: a lot of assorted debuggability improvements.

- list_journal improvements: now, when going emergency read only, we finish writing everything we have pending to the journal - we just mark those entries as noflush writes, so they'll never be used by recovery, but list_journal can still see them. This means when we detect an inconsistency, we can see all the updates leading up to it in the journal (along with which transactions were doing them), making it much easier to work backwards to what went wrong. We've been doing a lot of debugging lately with just list_journal and grep - yay for grep debugging!
- A bunch of printbuf and to_text() method improvements, which make it easy to write good log messages when something goes wrong
- Started moving some internal state used for debugging from sysfs to debugfs, where we can be much more verbose (yay for grep debugging!)
- Fixed some snapshots bugs - figured out a major cause of the transaction path overflow bugs we've been seeing

And, big thanks to all the people who put up with and test my crappy code and help with finding all the bugs and beating it into shape :)