Hi Linus, here's the main bcachefs updates for 6.8. Cheers, Kent The following changes since commit 0d72ab35a925d66b044cb62b709e53141c3f0143: bcachefs: make RO snapshots actually RO (2024-01-01 11:47:07 -0500) are available in the Git repository at: https://evilpiepirate.org/git/bcachefs.git tags/bcachefs-2024-01-10 for you to fetch changes up to 169de41985f53320580f3d347534966ea83343ca: bcachefs: eytzinger0_find() search should be const (2024-01-05 23:24:46 -0500) ---------------------------------------------------------------- bcachefs updates for 6.8: - btree write buffer rewrite: instead of adding keys to the btree write buffer at transaction commit time, we know journal them with a different journal entry type and copy them from the journal to the write buffer just prior to journal write. This reduces the number of atomic operations on shared cachelines in the transaction commit path and is a signicant performance improvement on some workloads: multithreaded 4k random writes went from ~650k iops to ~850k iops. - Bring back optimistic spinning for six locks: the new implementation doesn't use osq locks; instead we add to the lock waitlist as normal, and then spin on the lock_acquired bit in the waitlist entry, _not_ the lock itself. - BCH_IOCTL_DEV_USAGE_V2, which allows for new data types - BCH_IOCTL_OFFLINE_FSCK, which runs the kernel implementation of fsck but without mounting: useful for transparently using the kernel version of fsck from 'bcachefs fsck' when the kernel version is a better match for the on disk filesystem. - BCH_IOCTL_ONLINE_FSCK: online fsck. Not all passes are supported yet, but the passes that are supported are fully featured - errors may be corrected as normal. The new ioctls use the new 'thread_with_file' abstraction for kicking off a kthread that's tied to a file descriptor returned to userspace via the ioctl. - btree_paths within a btree_trans are now dynamically growable, instead of being limited to 64. This is important for the check_directory_structure phase of fsck, and also fixes some issues we were having with btree path overflow in the reflink btree. - Trigger refactoring; prep work for the upcoming disk space accounting rewrite - Numerous bugfixes :) ---------------------------------------------------------------- Brian Foster (3): bcachefs: remove sb lock and flags update on explicit shutdown bcachefs: return from fsync on writeback error to avoid early shutdown bcachefs: clean up some dead fallocate code Daniel Hill (6): bcachefs: add a quieter bch2_read_super bcachefs: remove dead bch2_evacuate_bucket() bcachefs: rebalance should wakeup on shutdown if disabled bcachefs: copygc should wakeup on shutdown if disabled bcachefs: copygc shouldn't try moving buckets on error bcachefs: remove redundant condition from data_update_index_update Gustavo A. R. Silva (3): bcachefs: Replace zero-length arrays with flexible-array members bcachefs: Use array_size() in call to copy_from_user() bcachefs: Replace zero-length array with flex-array member and use __counted_by Kent Overstreet (210): bcachefs: Flush fsck errors before running twice bcachefs: Add extra verbose logging for ro path bcachefs: Improved backpointer messages in fsck bcachefs: kill INODE_LOCK, use lock_two_nondirectories() bcachefs: Check for unlinked inodes not on deleted list bcachefs: Fix locking when checking freespace btree bcachefs: Print old version when scanning for old metadata bcachefs: Fix warning when building in userspace bcachefs: Include average write size in sysfs journal_debug bcachefs: Add an assertion in bch2_journal_pin_set() bcachefs: Journal pins must always have a flush_fn bcachefs: track_event_change() bcachefs: Clear k->needs_whitout earlier in commit path bcachefs: BTREE_INSERT_JOURNAL_REPLAY now "don't init trans->journal_res" bcachefs: Kill BTREE_UPDATE_PREJOURNAL bcachefs: Go rw before journal replay bcachefs: Make journal replay more efficient bcachefs: Avoiding dropping/retaking write locks in bch2_btree_write_buffer_flush_one() bcachefs: Fix redundant variable initialization bcachefs: Kill dead BTREE_INSERT flags bcachefs: bch_str_hash_flags_t bcachefs: Rename BTREE_INSERT flags bcachefs: Improve btree_path_dowgrade tracepoint bcachefs: backpointers fsck no longer uses BTREE_ITER_ALL_LEVELS bcachefs: Kill BTREE_ITER_ALL_LEVELS bcachefs: Fix userspace bch2_prt_datetime() bcachefs: Don't rejournal keys in key cache flush bcachefs: Don't flush journal after replay bcachefs: Add a tracepoint for journal entry close bcachefs: Kill memset() in bch2_btree_iter_init() bcachefs: Kill btree_iter->journal_pos bcachefs: Rename bch_replicas_entry -> bch_replicas_entry_v1 bcachefs: Don't use update_cached_sectors() in bch2_mark_alloc() bcachefs: x-macro-ify bch_data_ops enum bcachefs: Convert bch2_move_btree() to bbpos bcachefs: BCH_DATA_OP_drop_extra_replicas powerpc: Export kvm_guest static key, for bcachefs six locks bcachefs: six locks: Simplify optimistic spinning bcachefs: Simplify check_bucket_ref() bcachefs: BCH_IOCTL_DEV_USAGE_V2 bcachefs: New bucket sector count helpers bcachefs: bch2_dev_usage_to_text() bcachefs: Kill dev_usage->buckets_ec bcachefs: Improve sysfs compression_stats bcachefs: Print durability in member_to_text() bcachefs: Add a rebalance, data_update tracepoints bcachefs: Refactor bch2_check_alloc_to_lru_ref() bcachefs: Kill journal_seq/gc args to bch2_dev_usage_update_m() bcachefs: convert bch_fs_flags to x-macro bcachefs: No need to allocate keys for write buffer bcachefs: Improve btree write buffer tracepoints bcachefs: kill journal->preres_wait bcachefs: delete useless commit_do() bcachefs: Clean up btree write buffer write ref handling bcachefs: bch2_btree_write_buffer_flush_locked() bcachefs: bch2_btree_write_buffer_flush() -> bch2_btree_write_buffer_tryflush() bcachefs: count_event() bcachefs: Improve trace_trans_restart_too_many_iters() bcachefs: Improve trace_trans_restart_would_deadlock bcachefs: Don't open code bch2_dev_exists2() bcachefs: ONLY_SPECIFIED_DEVS doesn't mean ignore durability anymore bcachefs: wb_flush_one_slowpath() bcachefs: more write buffer refactoring bcachefs: Explicity go RW for fsck bcachefs: On missing backpointer to interior node, flush interior updates bcachefs: Make backpointer fsck wb flush check more rigorous bcachefs: Include btree_trans in more tracepoints bcachefs: Move reflink_p triggers into reflink.c bcachefs: Refactor trans->paths_allocated to be standard bitmap bcachefs: BCH_ERR_opt_parse_error bcachefs: Improve error message when finding wrong btree node bcachefs: c->ro_ref bcachefs: thread_with_file bcachefs: Add ability to redirect log output bcachefs: Mark recovery passses that are safe to run online bcachefs: bch2_run_online_recovery_passes() bcachefs: BCH_IOCTL_FSCK_OFFLINE bcachefs: BCH_IOCTL_FSCK_ONLINE bcachefs: Fix open coded set_btree_iter_dontneed() bcachefs: Fix bch2_read_btree() bcachefs: continue now works in for_each_btree_key2() bcachefs: Kill for_each_btree_key() bcachefs: Rename for_each_btree_key2() -> for_each_btree_key() bcachefs: reserve path idx 0 for sentinal bcachefs: Fix snapshot.c assertion for online fsck bcachefs: kill btree_path->(alloc_seq|downgrade_seq) bcachefs; kill bch2_btree_key_cache_flush() bcachefs: Improve trans->extra_journal_entries bcachefs: bch2_trans_node_add no longer uses trans_for_each_path() bcachefs: Unwritten journal buffers are always dirty bcachefs: journal->buf_lock bcachefs: btree write buffer now slurps keys from journal bcachefs: Inline btree write buffer sort bcachefs: check_root() can now be run online bcachefs: kill btree_trans->wb_updates bcachefs: Drop journal entry compaction bcachefs: fix userspace build errors bcachefs: bch_err_(fn|msg) check if should print bcachefs: qstr_eq() bcachefs: drop extra semicolon bcachefs: Make sure allocation failure errors are logged MAINTAINERS: Update my email address bcachefs: Delete dio read alignment check bcachefs: Fixes for rust bindgen bcachefs: check for failure to downgrade bcachefs: Use GFP_KERNEL for promote allocations bcachefs: Improve the nopromote tracepoint bcachefs: trans_for_each_update() now declares loop iter bcachefs: darray_for_each() now declares loop iter bcachefs: simplify bch_devs_list bcachefs: better error message in btree_node_write_work() bcachefs: add more verbose logging bcachefs: fix warning about uninitialized time_stats bcachefs: use track_event_change() for allocator blocked stats bcachefs: bch2_trans_srcu_lock() should be static bcachefs: bch2_dirent_lookup() -> lockrestart_do() bcachefs: for_each_btree_key_upto() -> for_each_btree_key_old_upto() bcachefs: kill for_each_btree_key_old_upto() bcachefs: kill for_each_btree_key_norestart() bcachefs: for_each_btree_key() now declares loop iter bcachefs: for_each_member_device() now declares loop iter bcachefs: for_each_member_device_rcu() now declares loop iter bcachefs: vstruct_for_each() now declares loop iter bcachefs: fsck -> bch2_trans_run() bcachefs: kill __bch2_btree_iter_peek_upto_and_restart() bcachefs: bkey_for_each_ptr() now declares loop iter bcachefs: for_each_keylist_key() declares loop iter bcachefs: skip journal more often in key cache reclaim bcachefs: Convert split_devs() to darray bcachefs: Kill GFP_NOFAIL usage in readahead path bcachefs: minor bch2_btree_path_set_pos() optimization bcachefs: bch2_path_get() -> btree_path_idx_t bcachefs; bch2_path_put() -> btree_path_idx_t bcachefs: bch2_btree_path_set_pos() -> btree_path_idx_t bcachefs: bch2_btree_path_make_mut() -> btree_path_idx_t bcachefs: bch2_btree_path_traverse() -> btree_path_idx_t bcachefs: btree_path_alloc() -> btree_path_idx_t bcachefs: btree_iter -> btree_path_idx_t bcachefs: btree_insert_entry -> btree_path_idx_t bcachefs: struct trans_for_each_path_inorder_iter bcachefs: bch2_btree_path_to_text() -> btree_path_idx_t bcachefs: kill trans_for_each_path_from() bcachefs: trans_for_each_path() no longer uses path->idx bcachefs: trans_for_each_path_with_node() no longer uses path->idx bcachefs: bch2_path_get() no longer uses path->idx bcachefs: bch2_btree_iter_peek_prev() no longer uses path->idx bcachefs: get_unlocked_mut_path() -> btree_path_idx_t bcachefs: kill btree_path.idx bcachefs: Clean up btree_trans bcachefs: rcu protect trans->paths bcachefs: optimize __bch2_trans_get(), kill DEBUG_TRANSACTIONS bcachefs: trans->updates will also be resizable bcachefs: trans->nr_paths bcachefs: Fix interior update path btree_path uses bcachefs: growable btree_paths bcachefs: bch2_btree_trans_peek_updates bcachefs: bch2_btree_trans_peek_prev_updates bcachefs: bch2_btree_trans_peek_slot_updates bcachefs: Fix reattach_inode() for snapshots bcachefs: check_directory_structure() can now be run online bcachefs: Check journal entries for invalid keys in trans commit path bcachefs: Fix nochanges/read_only interaction bcachefs: bch_member->seq bcachefs: Split brain detection bcachefs: btree_trans always has stats bcachefs: track transaction durations bcachefs: wb_key_cmp -> wb_key_ref_cmp bcachefs: __journal_keys_sort() refactoring bcachefs: __bch2_journal_key_to_wb -> bch2_journal_key_to_wb_slowpath bcachefs: Fix printing of device durability bcachefs: factor out thread_with_file, thread_with_stdio bcachefs: Upgrading uses bch_sb.recovery_passes_required bcachefs: trans_mark now takes bkey_s bcachefs: mark now takes bkey_s bcachefs: Kill BTREE_TRIGGER_NOATOMIC bcachefs: BTREE_TRIGGER_TRANSACTIONAL bcachefs: kill mem_trigger_run_overwrite_then_insert() bcachefs: unify inode trigger bcachefs: unify reflink_p trigger bcachefs: unify reservation trigger bcachefs: move bch2_mark_alloc() to alloc_background.c bcachefs: unify alloc trigger bcachefs: move stripe triggers to ec.c bcachefs: unify stripe trigger bcachefs: bch2_trigger_pointer() bcachefs: Online fsck can now fix errors bcachefs: bch2_trigger_stripe_ptr() bcachefs: unify extent trigger bcachefs: Combine .trans_trigger, .atomic_trigger bcachefs: kill useless return ret bcachefs: Add an option to control btree node prefetching bcachefs: don't clear accessed bit in btree node fill bcachefs: add time_stats for btree_node_read_done() bcachefs: increase max_active on io_complete_wq bcachefs: add missing bch2_latency_acct() call bcachefs: Don't autofix errors we can't fix bcachefs: no thread_with_file in userspace bcachefs: Upgrades now specify errors to fix, like downgrades bcachefs: fsck_err()s don't need to manually check c->sb.version anymore bcachefs: Improve would_deadlock trace event bcachefs: %pg is banished bcachefs: __bch2_sb_field_to_text() bcachefs: print sb magic when relevant bcachefs: improve validate_bset_keys() bcachefs: improve checksum error messages bcachefs: bch2_dump_bset() doesn't choke on u64s == 0 bcachefs: Restart recovery passes more reliably bcachefs: fix simulateously upgrading & downgrading bcachefs: move "ptrs not changing" optimization to bch2_trigger_extent() bcachefs: eytzinger0_find() search should be const Randy Dunlap (2): bcachefs: six lock: fix typos bcachefs: mean and variance: fix kernel-doc for function params Richard Davies (1): bcachefs: Remove obsolete comment about zstd Yang Li (1): bcachefs: clean up one inconsistent indenting MAINTAINERS | 2 +- arch/powerpc/kernel/firmware.c | 2 + fs/bcachefs/Kconfig | 18 +- fs/bcachefs/Makefile | 1 + fs/bcachefs/alloc_background.c | 484 +++++----- fs/bcachefs/alloc_background.h | 39 +- fs/bcachefs/alloc_foreground.c | 46 +- fs/bcachefs/backpointers.c | 199 +++-- fs/bcachefs/backpointers.h | 27 +- fs/bcachefs/bcachefs.h | 192 +++- fs/bcachefs/bcachefs_format.h | 123 ++- fs/bcachefs/bcachefs_ioctl.h | 60 +- fs/bcachefs/bkey_methods.h | 82 +- fs/bcachefs/bset.c | 6 + fs/bcachefs/btree_cache.c | 28 +- fs/bcachefs/btree_cache.h | 4 +- fs/bcachefs/btree_gc.c | 327 +++---- fs/bcachefs/btree_io.c | 132 ++- fs/bcachefs/btree_io.h | 2 +- fs/bcachefs/btree_iter.c | 945 ++++++++++---------- fs/bcachefs/btree_iter.h | 407 ++++----- fs/bcachefs/btree_journal_iter.c | 25 +- fs/bcachefs/btree_key_cache.c | 63 +- fs/bcachefs/btree_key_cache.h | 2 - fs/bcachefs/btree_locking.c | 111 ++- fs/bcachefs/btree_locking.h | 16 +- fs/bcachefs/btree_trans_commit.c | 313 +++---- fs/bcachefs/btree_types.h | 136 +-- fs/bcachefs/btree_update.c | 245 ++---- fs/bcachefs/btree_update.h | 111 ++- fs/bcachefs/btree_update_interior.c | 322 +++---- fs/bcachefs/btree_update_interior.h | 11 +- fs/bcachefs/btree_write_buffer.c | 668 +++++++++----- fs/bcachefs/btree_write_buffer.h | 53 +- fs/bcachefs/btree_write_buffer_types.h | 63 +- fs/bcachefs/buckets.c | 1511 ++++++++------------------------ fs/bcachefs/buckets.h | 45 +- fs/bcachefs/buckets_types.h | 2 - fs/bcachefs/chardev.c | 363 ++++++-- fs/bcachefs/checksum.h | 23 + fs/bcachefs/compress.c | 4 - fs/bcachefs/darray.h | 8 +- fs/bcachefs/data_update.c | 30 +- fs/bcachefs/debug.c | 141 ++- fs/bcachefs/dirent.c | 51 +- fs/bcachefs/dirent.h | 7 +- fs/bcachefs/disk_groups.c | 13 +- fs/bcachefs/ec.c | 406 +++++++-- fs/bcachefs/ec.h | 5 +- fs/bcachefs/ec_types.h | 2 +- fs/bcachefs/errcode.h | 7 +- fs/bcachefs/error.c | 103 ++- fs/bcachefs/extent_update.c | 2 +- fs/bcachefs/extents.c | 4 - fs/bcachefs/extents.h | 24 +- fs/bcachefs/eytzinger.h | 10 +- fs/bcachefs/fs-common.c | 36 +- fs/bcachefs/fs-io-buffered.c | 38 +- fs/bcachefs/fs-io-direct.c | 3 - fs/bcachefs/fs-io.c | 20 +- fs/bcachefs/fs-ioctl.c | 12 +- fs/bcachefs/fs.c | 100 +-- fs/bcachefs/fs.h | 9 +- fs/bcachefs/fsck.c | 630 ++++++------- fs/bcachefs/inode.c | 129 ++- fs/bcachefs/inode.h | 15 +- fs/bcachefs/io_misc.c | 55 +- fs/bcachefs/io_read.c | 50 +- fs/bcachefs/io_write.c | 45 +- fs/bcachefs/journal.c | 108 ++- fs/bcachefs/journal.h | 4 +- fs/bcachefs/journal_io.c | 153 ++-- fs/bcachefs/journal_reclaim.c | 120 ++- fs/bcachefs/journal_reclaim.h | 16 +- fs/bcachefs/journal_seq_blacklist.c | 2 +- fs/bcachefs/journal_types.h | 16 +- fs/bcachefs/keylist.c | 2 - fs/bcachefs/keylist.h | 4 +- fs/bcachefs/logged_ops.c | 18 +- fs/bcachefs/lru.c | 11 +- fs/bcachefs/mean_and_variance.c | 10 +- fs/bcachefs/mean_and_variance.h | 5 +- fs/bcachefs/migrate.c | 9 +- fs/bcachefs/move.c | 187 ++-- fs/bcachefs/move.h | 13 +- fs/bcachefs/movinggc.c | 49 +- fs/bcachefs/opts.c | 4 +- fs/bcachefs/opts.h | 20 +- fs/bcachefs/quota.c | 28 +- fs/bcachefs/rebalance.c | 38 +- fs/bcachefs/recovery.c | 291 +++--- fs/bcachefs/recovery.h | 1 + fs/bcachefs/recovery_types.h | 25 +- fs/bcachefs/reflink.c | 224 ++++- fs/bcachefs/reflink.h | 22 +- fs/bcachefs/replicas.c | 66 +- fs/bcachefs/replicas.h | 22 +- fs/bcachefs/replicas_types.h | 6 +- fs/bcachefs/sb-clean.c | 20 +- fs/bcachefs/sb-downgrade.c | 90 +- fs/bcachefs/sb-downgrade.h | 1 + fs/bcachefs/sb-errors_types.h | 4 +- fs/bcachefs/sb-members.c | 18 +- fs/bcachefs/sb-members.h | 100 ++- fs/bcachefs/six.c | 117 +-- fs/bcachefs/six.h | 13 +- fs/bcachefs/snapshot.c | 174 ++-- fs/bcachefs/snapshot.h | 8 +- fs/bcachefs/str_hash.h | 25 +- fs/bcachefs/subvolume.c | 31 +- fs/bcachefs/subvolume_types.h | 4 + fs/bcachefs/super-io.c | 168 ++-- fs/bcachefs/super-io.h | 7 +- fs/bcachefs/super.c | 388 ++++---- fs/bcachefs/super.h | 6 +- fs/bcachefs/super_types.h | 2 +- fs/bcachefs/sysfs.c | 160 ++-- fs/bcachefs/tests.c | 193 ++-- fs/bcachefs/thread_with_file.c | 299 +++++++ fs/bcachefs/thread_with_file.h | 41 + fs/bcachefs/thread_with_file_types.h | 16 + fs/bcachefs/trace.h | 278 ++++-- fs/bcachefs/util.c | 191 ++-- fs/bcachefs/util.h | 56 +- fs/bcachefs/vstructs.h | 10 +- 125 files changed, 7101 insertions(+), 5961 deletions(-) create mode 100644 fs/bcachefs/thread_with_file.c create mode 100644 fs/bcachefs/thread_with_file.h create mode 100644 fs/bcachefs/thread_with_file_types.h