First, a short update on development:

- Experimental should be coming off in ~6 months. This will mean stronger backwards compatibility guarantees (no more forced on disk format upgrades) and more backports. The main criteria for taking the experimental label off will be going a cycle without any critical bug reports, and making sure we have all the critical on disk format changes we want (i.e. things that'll be a hassle to fix later).

- Major developments:
  - The next merge window pull request pushes practical scalability limits to ~50 PB. With the recent changes to backpointers fsck, we fsck'd a filesystem with 10 PB of data in an hour and a half.
  - Online self healing is now the default, and more and more codepaths are being converted so that we can fix errors at runtime that would otherwise require fsck to correct. The goal is that users should never have to explicitly run fsck - even in extreme disaster scenarios - and we're pretty far along.
  - Online fsck still has a ways to go, but the locking issues for the main check_allocations pass are nearly sorted, so we're past the hard technical hurdles. This is a high priority item.
  - Scrub will be landing in the next few weeks, or sooner.

- Stabilization: "user is not able to access filesystem" bug reports have essentially stopped; fsck is looking quite robust. Overall stability, going by bug reports and user feedback, is shaping up quickly. We do have reports of outstanding data corruption (nix package builders are providing excellent torture testing), and some severe performance bugs to address; these are the highest priority outstanding items.

  There are good things to report on the performance front: one tidbit is that users are reporting that in situations where btrfs falls over without nocow mode (databases, bittorrent), bcachefs does fine, and "bcachefs doesn't need nocow" is now the common advice to new users.

Now that that's out of the way - my plan this year isn't to talk about code, but rather about development process - and the need to get organized.

First, we need to talk about the historical track record of filesystem development. Historically, most filesystem efforts have failed - and we ought to see what lessons we can learn.

By my count, in the past few decades there have been 3 "next generation" filesystems that advanced the state of the art and actually delivered on their goals:

- XFS
- ZFS
- and most recently, APFS

Pointedly, none of these came from the open source community. While I would give ext3 half credit - it was a pragmatic approach that did deliver on its goals - the ext4 codebase also showcases some of the disadvantages of our "community based" approach. This isn't an open vs. closed source thing; Microsoft also failed with their ntfs replacement; filesystems are hard.

If there's a single overarching diagnosis to be made, I would say the issue is organizational: doing a filesystem right requires funding a team consistently for many years, with the right kind of long term focus. The most successful efforts came from the big Unix vendors, when that was still a thing, and now from Apple, who is known for being able to organize and support engineering teams.

All this is to say that I'd like for us to be able to set some long term priorities in the filesystem space, decide what we need to push for, and figure out how to get it done.
The Linux kernel world is not poorly funded, but efforts don't get funded without a plan, and historically our filesystem development has suffered from a short term "project manager" type focus - a lot of effort being spent on individual, highly niche features for customers with deep pockets, while bread and butter stuff gets neglected.

Here's my list of things we actually do need:

Process, tooling:
-----------------

- Firstly, a filesystem is not just the code itself. It's the tooling, the test infrastructure, the time spent working with users who are digging in and finding out what works and what doesn't: it is _whatever it takes to get the job done_.

  I've often heard talk from engineers who think of tooling as something "other people work on", or corporate types who don't want to work with users because "that's unpaid user support": but we don't get this done without a community, and that includes the _user_ community, and developers who aren't kernel developers. We need to be leveraging all the resources we have, and we need to be bringing the right attitude if we want to deliver the best work we can.

Some specific things that I see still lacking, within the bcachefs world and the filesystem world as a whole:

- Our testing automation still needs to be better. I've built developer focused testing automation, but it still needs work and I could use help.

- We badly need automated performance testing. I still see people at the bigcorps doing performance testing manually, and what automated performance testing there is lives in the basements of certain engineers. This needs to be a standard thing we're all using.

- Code coverage analysis needs to be a standard thing we're all looking at - it should be something we can trivially glance at when we're doing code review. (If anyone wants to help with this one, there's some trivial makefile work that needs to happen and my test infrastructure has the rest implemented.)

- bcachefs still needs real, methodical, automated error injection testing (I know XFS has this, props to them); we won't be able to consider fsck rock solid until this is done.

Technical milestones:
---------------------

bcachefs has achieved nearly all of the critical technical milestones I've laid out - or they're far enough along that we're past the "how well will this work" uncertainty. But here are my criteria for any major next gen filesystem:

- Check/repair:

  Contrary to what certain people have voiced about "filesystems that don't need fsck", fsck will _always_ be a critical component of any filesystem - shit happens, and we need to be able to recover, and check that the system is in a consistent state (else many bugs will go undiscovered). Data loss is flatly unacceptable in any filesystem suitable for real usage: I do not care what happened or how a filesystem was corrupted, if there is data still present it is our job to recover it and get the system back to a working state.

  Additionally, fsck is _the_ scalability limitation as systems continue to grow. Inherently so, as there are many global invariants/references to be checked. As mentioned, bcachefs is now well into the petabyte range, which should be good for a bit - for most users. Long term, we're going to need allocation groups so that we can efficiently shard the main fsck passes; allocation groups will get bcachefs into the exabyte range (rough sketch below).
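  Here's a minimal sketch of what I mean by sharding - hypothetical code, not bcachefs's actual structures or fsck passes - illustrating how splitting the key space into allocation groups lets the expensive checks run per group, with only a small cross-group pass at the end:

    /*
     * Hypothetical sketch - not actual bcachefs code or on disk format - of
     * how allocation groups shard the expensive fsck work: each group owns
     * a range of buckets, per-group invariants are checked independently
     * (and in parallel), and only cross-group references need a final
     * global pass.
     */
    #include <pthread.h>
    #include <stdio.h>

    #define NR_GROUPS 8

    struct alloc_group {
        unsigned      idx;
        unsigned long first_bucket;   /* key range owned by this group */
        unsigned long nr_buckets;
    };

    /* Check invariants that only involve keys within a single group. */
    static void *check_group(void *p)
    {
        struct alloc_group *g = p;

        /*
         * A real fsck pass would walk extents/backpointers whose targets
         * fall inside [first_bucket, first_bucket + nr_buckets) and repair
         * the group's own allocation info; here we just report the shard.
         */
        printf("group %u: checking buckets %lu-%lu\n",
               g->idx, g->first_bucket,
               g->first_bucket + g->nr_buckets - 1);
        return NULL;
    }

    int main(void)
    {
        struct alloc_group groups[NR_GROUPS];
        pthread_t threads[NR_GROUPS];
        unsigned i;

        for (i = 0; i < NR_GROUPS; i++) {
            groups[i] = (struct alloc_group) {
                .idx          = i,
                .first_bucket = i * 1024UL,
                .nr_buckets   = 1024,
            };
            pthread_create(&threads[i], NULL, check_group, &groups[i]);
        }

        for (i = 0; i < NR_GROUPS; i++)
            pthread_join(threads[i], NULL);

        /*
         * Only references crossing group boundaries remain, so the global
         * portion of fsck no longer scales with total filesystem size.
         */
        printf("cross-group reconciliation pass\n");
        return 0;
    }

  (Compile with -pthread; the point is just that the per-group work has no shared state, so it shards - and parallelizes - cleanly.)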
- Self healing, online fsck:

  Having the filesystem be offline for check/repair is also becoming a non-option, so anything that can be repaired at runtime must be - and we need to have mechanisms for getting the system up and running in RW mode and doing fsck online even when the filesystem is corrupt. (bcachefs has this covered, naturally.)

- Scalability:

  Besides just the size of the dataset itself, large systems with large numbers of drives and complex topologies need to be supported. These systems exist, and today the methods of managing those large numbers of drives are lacking; we can do better.

- Versioning, upgrade and downgrade flexibility, and compatibility w.r.t. on disk format:

  A common complaint from users is being stuck on an old on disk format version, without even being aware of it, and being subject to bugs/issues which have long since been fixed. We need a better story for on disk format changes.

  bcachefs also has this one covered; while in the experimental phase we've been making extensive use of our ability to roll out new on disk format versions automatically and seamlessly _and still support downgrades_.

- Real database fundamentals:

  Filesystems are databases, and if we steal as much as possible of the programming model from the database world it becomes drastically easier to roll out features and improvements; our code becomes more flexible and compatibility becomes much easier.

What do people want, and how do we get organized?
-------------------------------------------------

This part will be dependent on participation, naturally. It's all up to us, the engineers :)

I'm hoping to get more community involvement from developers this year. I want to see this thing be as big a success as it deserves to be, and I want users to have something they can depend on - peace of mind. And I want this filesystem to be a place where people can bring their best ideas and see them to fruition. There's still interesting research to be done and interesting problems to be solved.

Let's see what we can make happen...