On Fri, Aug 26, 2022 at 9:45 AM Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > > So narrowing the load is fine (but you generally never want to narrow > a *store*, because that results in huge problems with subsequent wider > loads). .. so making that statement made me go look around, and it's exactly what we use for clear_bit() on x86. Which is actually ok too, because it's a locked operation, and at that point the whole "store buffer forwarding" issue pretty much goes out the window anyway. But because I looked at where the new test_bit_acquire() triggers and where there are other bitops around it, I found this beauty in fs/buffer.c: clean_bdev_aliases(): if (!buffer_mapped(bh) || (bh->b_blocknr < block)) goto next; if (bh->b_blocknr >= block + len) break; clear_buffer_dirty(bh); wait_on_buffer(bh); clear_buffer_req(bh); where it basically works on four different bits (buffer_mapped, dirty, lock, req) right next to each other. That code sequence really doesn't matter, but it was interesting seeing the generated code. Not pretty, but the ugliest part was actually how the might_sleep() calls in those helper functions result in __cond_resched() when PREEMPT_VOLUNTARY is on, and how horrid that is for register allocation. And in bh_submit_read() we have another ugly thing: mov (%rdi),%rax test $0x4,%al je <bh_submit_read+0x7d> push %rbx mov %rdi,%rbx testb $0x1,(%rdi) where we have a mix of those two kinds of "test_bit()" (and test "testb" version most definitely looks better. But that's due to the source code being BUG_ON(!buffer_locked(bh)); if (buffer_uptodate(bh)) { ie we have that *ancient* BUG_ON() messing things up. Oh well. None of the buffer-head code matters any more, bit it's certainly showing the effects of that patch of yours, and the code isn't pretty. But none of the ugliness I found was actually _due_ to your patch. It was all due to other things, and your patch just makes code generation better in the cases I looked at. Linus