Re: [GIT PULL] bcachefs

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Wed, 6 Sep 2023 13:02:32 -0700

On Wed, 6 Sept 2023 at 12:36, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> There are a few other issues that I have with this, and Christoph did
> mention a big one: it's not been in linux-next.

Oh, and the fact that it hasn't been in linux-next became immediately
clear, and it's a real practical problem.

I get a compile error when just doing a basic built-test:

  fs/bcachefs/btree_key_cache.c: In function ‘btree_key_cache_create’:
  fs/bcachefs/btree_key_cache.c:356:56: error: implicit conversion
from ‘enum btree_node_locked_type’ to ‘enum six_lock_type’
[-Werror=enum-conversion]
    356 |                 mark_btree_node_locked(trans, path, 0,
BTREE_NODE_UNLOCKED);
        |
^~~~~~~~~~~~~~~~~~~

and that compile error is actually a sign of a pretty serious bug.

BTREE_NODE_UNLOCKED is of the wrong enum type, and has a value (-1)
that simply cannot be represented in a 'enum six_lock_type'.

I'm pretty sure that the compiler is actually allowed to assume it is
in the range of the source enum.

Yes, the inline function then does a cast to 'enum
btree_node_locked_type', but because the compiler may assume that the
incoming enum has values 0..2, it's not clear that it will necessarily
cast the value -1 to the expected value.

It probably works in practice on x86 (without -fshort-enum, I think
gcc will always use an 'int'), but that code really is wrong. On other
platforms, gcc defaults -fshort-enums on, and in that case a 'enum
six_lock_type' is actually of underlying type 'unsigned char'.

And guess what happens when you have (unsigned char)-1? It does *not*
cast back to -1.

I don't know whether any of the architectures we support has that
"default to -fshort-enums" behaviour, and at least hexagon uses
"-fno-short-enums" to avoid this issue, but it really is a serious
bug.

A Clang build also results in objtool complaints about bcachefs, with

  fs/bcachefs/buckets.o: warning: objtool:
bch2_trans_mark_metadata_bucket() falls through to next function
bch2_trans_mark_dev_sb()
  fs/bcachefs/fsck.o: warning: objtool: write_inode() falls through to
next function hash_check_key()

and sure enough, the code generation ends up being

          movq    %rbx, %rdi
          movl    %r12d, %esi
          callq   bch2_trans_restart_error
  .Lfunc_end25:
          .size   write_inode, .Lfunc_end25-write_inode

and the issue is that bch2_trans_restart_error() is marked __noreturn,
but objtool hasn't been told about it, so objtool is very unhappy
about that "code seems to fall off the side of the earth" kind of
issue.

Both of these are very much signs of "this has never been tested in
other environments", and would presumably have been found immediately
in linux-next.

As it is, I'm not convinced I want to even continue to go through this
all, when it fails at the very first hurdle. A clean build is not some
"would be nice" feature, and it is big part of why we have automation
for new code.

                     Linus