[LSF/MM TOPIC] bcachefs - status update, upstreaming (!?)

Hi, I'd like to talk a bit about what I've sunk the past few years of my life
into :)

For those who haven't heard, bcachefs started out as an extended version of
bcache, and eventually grew into a full posix filesystem. It's a long weird
story.

Today, it really is a real filesystem, with a small community of users and
testers, and the main focus has been on making it production quality and rock
solid - it's not a research project or a toy, it's meant to be used.

What's done:
 - pretty much all the normal posix fs functionality - xattrs, acls, fallocate,
   quotas.
 - fsck
 - full data checksumming
 - compression
 - encryption
 - multiple devices (raid1 is done, minus exposing a way to rereplicate
   degraded data after device failure)
 - caching (right now only writeback caching is exposed; a new, more flexible
   interface is being worked on for caching and other allocation policy stuff)

What's _not_ done:
 - persistent allocation information; we still have to walk all our metadata on
   every mount to see what disk space is in use (and for a few other relatively
   minor reasons).

   This is less of an issue than you'd think: bcachefs walks metadata _really_
   fast, fast enough that nobody's complaining (even on multi-terabyte
   filesystems; erasure coding is the most asked-for feature, "faster mounts"
   never comes up). But of the remaining features to implement/things to deal
   with, this is going to be one of the most complex.

   One of the upsides, though: because I've had to make walking metadata as fast
   as possible, bcachefs fsck is also really, really fast (it's run by default
   on every mount). A rough sketch of that mount-time walk is below.
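
To give a sense of what that mount-time walk has to accomplish, here's a very
rough sketch - not the actual code, just the shape of the problem, with all the
types and helpers made up for illustration: rebuild in-memory per-bucket usage
counts by visiting every extent key, since nothing on disk records which
buckets are in use.

/*
 * Illustration only: reconstruct "which buckets are in use" by walking
 * every extent key at mount time.  Types and helpers are hypothetical,
 * not bcachefs internals.
 */
#include <stdint.h>

struct extent {
	uint64_t dev;		/* device index			 */
	uint64_t offset;	/* start sector on that device	 */
	uint64_t sectors;	/* length in sectors		 */
};

struct dev_usage {
	uint64_t bucket_size;	/* sectors per allocation bucket */
	uint64_t *sectors_used;	/* per-bucket usage counts	 */
};

/* Called for every extent key found while walking the btrees: */
static void mark_extent(struct dev_usage *devs, const struct extent *e)
{
	struct dev_usage *d = &devs[e->dev];
	uint64_t s = e->offset, end = e->offset + e->sectors;

	while (s < end) {
		/* end of the bucket containing sector s: */
		uint64_t b_end = (s / d->bucket_size + 1) * d->bucket_size;
		uint64_t n = (end < b_end ? end : b_end) - s;

		d->sectors_used[s / d->bucket_size] += n;
		s += n;
	}
}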

Planned features:
 - erasure coding (i.e. raid5/6)
 - snapshots

I also want to come up with a plan for eventually upstreaming this damned thing :)

One of the reasons I haven't even talked about upstreaming before is that I
_really_ haven't wanted to freeze the on-disk format before I was ready. This is
still a concern w.r.t. persistent allocation information and snapshots, but
overall there have been fewer and fewer reasons for on-disk format changes;
things seem to be naturally stabilizing.

And I know there are going to be plenty of other people at LSF with recent
experience upstreaming new filesystems; right now I don't have any strong ideas
of my own and would welcome any input :)

Not sure what else I should talk about; I've been quiet for _way_ too long. I'd
welcome any questions or suggestions.

One other cool thing I've been doing lately is that I finally rigged up some pure
btree performance/torture tests: I am _exceedingly_ proud of bcachefs's btree
(bcache's btree code is at best a prototype or a toy compared to bcachefs's).
The numbers are, I think, well worth showing off; I'd be curious if anyone knows
how other competing btree implementations (xfs's?) do in comparison:

These benchmarks are with 64-bit keys and 64-bit values: sequentially create,
iterate over, and delete 100M keys:

seq_insert:  100M with 1 threads in   104 sec,   998 nsec per iter,  978k per sec
seq_lookup:  100M with 1 threads in     1 sec,    10 nsec per iter, 90.8M per sec
seq_delete:  100M with 1 threads in    41 sec,   392 nsec per iter,  2.4M per sec

create 100M keys at random (64-bit random ints for the keys)

rand_insert: 100M with 1 threads in   227 sec,  2166 nsec per iter,  450k per sec
rand_insert: 100M with 6 threads in   106 sec,  6086 nsec per iter,  962k per sec

random lookups, over the 100M random keys we just created:

rand_lookup: 10M  with 1 threads in    10 sec,   995 nsec per iter,  981k per sec
rand_lookup: 10M  with 6 threads in     2 sec,  1223 nsec per iter,  4.6M per sec

mixed lookup/update: 75% lookup, 25% update:

rand_mixed:  10M  with 1 threads in    16 sec,  1615 nsec per iter,  604k per sec
rand_mixed:  10M  with 6 threads in     8 sec,  4614 nsec per iter,  1.2M per sec

This is on my ancient i7 Gulftown, using a Micron p320h (it's not a pure
in-memory test, we're actually writing out those random inserts!). Numbers are
slightly better on my Haswell :)
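
For anyone curious how those lines are produced, here's roughly what the
harness does - this is an illustrative sketch, not the actual test code;
do_insert() is a stand-in for whatever btree operation is being measured, and
the per-iteration number is average latency as seen by each thread:

/*
 * Illustrative benchmark harness, not the actual bcachefs test code:
 * do_insert() is a stand-in for the btree operation being measured.
 */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NR_OPS		100000000ULL
#define NR_THREADS	6ULL

static void do_insert(uint64_t key)
{
	(void) key;		/* btree insert under test goes here */
}

static void *worker(void *arg)
{
	unsigned seed = (unsigned) (uintptr_t) arg;
	uint64_t i;

	for (i = 0; i < NR_OPS / NR_THREADS; i++)
		/* ~64 bit random keys (rand_r() only gives 31 bits at a time) */
		do_insert((uint64_t) rand_r(&seed) << 32 | rand_r(&seed));
	return NULL;
}

int main(void)
{
	pthread_t threads[NR_THREADS];
	struct timespec start, end;
	double secs;
	unsigned i;

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (i = 0; i < NR_THREADS; i++)
		pthread_create(&threads[i], NULL, worker, (void *)(uintptr_t) i);
	for (i = 0; i < NR_THREADS; i++)
		pthread_join(threads[i], NULL);
	clock_gettime(CLOCK_MONOTONIC, &end);

	secs = (end.tv_sec - start.tv_sec) +
	       (end.tv_nsec - start.tv_nsec) / 1e9;

	/*
	 * "nsec per iter" is per-thread latency, so it rises with thread
	 * count even as total throughput improves:
	 */
	printf("rand_insert: %lluM with %llu threads in %5.0f sec, %5.0f nsec per iter, %.1fM per sec\n",
	       NR_OPS / 1000000, NR_THREADS, secs,
	       secs * NR_THREADS * 1e9 / NR_OPS, NR_OPS / secs / 1e6);
	return 0;
}

Swap in the real btree operations and vary NR_OPS/NR_THREADS to get the other
rows in the table above.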


