Hi, I'd like to talk a bit about what I've sunk the past few years of my life into :)

For those who haven't heard, bcachefs started out as an extended version of bcache, and eventually grew into a full posix filesystem. It's a long, weird story. Today, it's a real filesystem with a small community of users and testers, and the main focus has been on making it production quality and rock solid - it's not a research project or a toy, it's meant to be used.

What's done:
 - pretty much all the normal posix fs functionality - xattrs, acls, fallocate, quotas
 - fsck
 - full data checksumming
 - compression
 - encryption
 - multiple devices (raid1 is done, minus exposing a way to re-replicate degraded data after a device failure)
 - caching (right now only writeback caching is exposed; a new, more flexible interface is being worked on for caching and other allocation policy stuff)

What's _not_ done:
 - persistent allocation information: we still have to walk all our metadata on every mount to see what disk space is in use (and for a few other relatively minor reasons). This is less of an issue than you'd think - bcachefs walks metadata _really_ fast, fast enough that nobody's complaining (even on multi terabyte filesystems; erasure coding is the most asked for feature, "faster mounts" never comes up). But of the remaining features to implement and things to deal with, this is going to be one of the most complex. One upside, though: because I've had to make walking metadata as fast as possible, bcachefs fsck is also really, really fast (it's run by default on every mount).

Planned features:
 - erasure coding (i.e. raid5/6)
 - snapshots

I also want to come up with a plan for eventually upstreaming this damned thing :) One of the reasons I haven't even talked about upstreaming before is that I _really_ haven't wanted to freeze the on disk format before I was ready. That's still a concern w.r.t. persistent allocation information and snapshots, but overall there have been fewer and fewer reasons for on disk format changes; things seem to be naturally stabilizing. And I know there are going to be plenty of other people at LSF with recent experience upstreaming new filesystems; right now I don't have any strong ideas of my own and welcome any input :)

Not sure what else I should talk about; I've been quiet for _way_ too long. I'd welcome any questions or suggestions.

One other cool thing I've been doing lately: I finally rigged up some pure btree performance/torture tests. I am _exceedingly_ proud of bcachefs's btree (bcache's btree code is at best a prototype or a toy compared to bcachefs's). The numbers are, I think, well worth showing off; I'd be curious if anyone knows how other competing btree implementations (xfs's?) do in comparison.
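For context on what the numbers below mean, here's a rough sketch of the kind of driver that produces them - to be clear, this is _not_ the bcachefs test code, just an illustration of "do N keyed operations, report nsec per iteration"; btree_op() is a placeholder you'd replace with a real insert/lookup against whatever btree you're measuring:

/*
 * Hedged sketch of a btree micro-benchmark driver, not the actual test
 * harness: times N operations on 64 bit keys/values and reports nsec per
 * iteration, the same shape as the results below.
 */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

static volatile uint64_t sink;	/* keep the loop from being optimized away */

/* Placeholder for the real operation: insert/lookup of a 64 bit key+value */
static void btree_op(uint64_t key, uint64_t val)
{
	sink ^= key + val;
}

static uint64_t ns_now(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t) ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}

int main(void)
{
	const uint64_t nr = 100 * 1000 * 1000;	/* 100M ops, as in the runs below */
	uint64_t start = ns_now(), i;

	for (i = 0; i < nr; i++)
		btree_op(i, i);	/* sequential keys; swap in a PRNG for the rand_* workloads */

	uint64_t elapsed = ns_now() - start;

	printf("seq_insert: %" PRIu64 "M in %" PRIu64 " sec, %" PRIu64 " nsec per iter\n",
	       nr / 1000000, elapsed / 1000000000ULL, elapsed / nr);
	return 0;
}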
These benchmarks are with 64 bit keys and 64 bit values:

sequentially create, iterate over, and delete 100M keys:
seq_insert:  100M with 1 threads in 104 sec,  998 nsec per iter, 978k per sec
seq_lookup:  100M with 1 threads in   1 sec,   10 nsec per iter, 90.8M per sec
seq_delete:  100M with 1 threads in  41 sec,  392 nsec per iter, 2.4M per sec

create 100M keys at random (64 bit random ints for the keys):
rand_insert: 100M with 1 threads in 227 sec, 2166 nsec per iter, 450k per sec
rand_insert: 100M with 6 threads in 106 sec, 6086 nsec per iter, 962k per sec

random lookups, over the 100M random keys we just created:
rand_lookup:  10M with 1 threads in  10 sec,  995 nsec per iter, 981k per sec
rand_lookup:  10M with 6 threads in   2 sec, 1223 nsec per iter, 4.6M per sec

mixed lookup/update, 75% lookup, 25% update:
rand_mixed:   10M with 1 threads in  16 sec, 1615 nsec per iter, 604k per sec
rand_mixed:   10M with 6 threads in   8 sec, 4614 nsec per iter, 1.2M per sec

This is on my ancient i7 gulftown, using a micron p320h (it's not a pure in memory test, we're actually writing out those random inserts!). Numbers are slightly better on my haswell :)
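And for the curious, here's roughly what the shape of the threaded mixed workload is - again a hedged sketch rather than the real harness, with btree_lookup()/btree_update() as placeholders and the thread/op counts picked to mirror the 6-thread, 10M-op runs above; each thread generates random 64 bit keys and does a lookup three times out of four, an update otherwise:

/*
 * Hedged sketch of the 75% lookup / 25% update workload across several
 * threads; the btree_* calls are stand-ins for the structure under test.
 * Build with -pthread.
 */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define NR_THREADS	6
#define OPS_PER_THREAD	(10 * 1000 * 1000 / NR_THREADS)

static volatile uint64_t sink;

static void btree_lookup(uint64_t key)			{ sink ^= key; }
static void btree_update(uint64_t key, uint64_t val)	{ sink ^= key + val; }

static void *mixed_worker(void *arg)
{
	unsigned seed = (unsigned long) arg;

	for (uint64_t i = 0; i < OPS_PER_THREAD; i++) {
		uint64_t key = ((uint64_t) rand_r(&seed) << 32) | rand_r(&seed);

		/* 75% lookups, 25% updates */
		if (rand_r(&seed) % 4)
			btree_lookup(key);
		else
			btree_update(key, i);
	}
	return NULL;
}

int main(void)
{
	pthread_t threads[NR_THREADS];

	for (long t = 0; t < NR_THREADS; t++)
		pthread_create(&threads[t], NULL, mixed_worker, (void *) (t + 1));
	for (long t = 0; t < NR_THREADS; t++)
		pthread_join(threads[t], NULL);

	printf("done: %d threads x %d ops\n", NR_THREADS, (int) OPS_PER_THREAD);
	return 0;
}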