On Mon, Apr 19, 2021 at 09:20:40PM +0100, David Howells wrote:
> Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> > I see worse inlining decisions from gcc with this. Maybe you see
> > an improvement that would justify it?
> >
> > [ref: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99998]
>
> Perhaps attach the patch to the bz, see if the compiler guys can recommend
> anything?

Your test case loses the bogus branch (the "je 16" at offset 0x14 in
PageUptodate, which only jumps to the next instruction):

0000000000000000 <PageUptodate>:
   0:   48 8b 47 08             mov    0x8(%rdi),%rax
   4:   a8 01                   test   $0x1,%al
   6:   74 04                   je     c <PageUptodate+0xc>
   8:   48 8d 78 ff             lea    -0x1(%rax),%rdi
   c:   8b 07                   mov    (%rdi),%eax
   e:   48 c1 e8 02             shr    $0x2,%rax
  12:   24 01                   and    $0x1,%al
  14:   74 00                   je     16 <PageUptodate+0x16>
  16:   c3                      retq

0000000000000017 <Page2Uptodate>:
  17:   48 8b 47 08             mov    0x8(%rdi),%rax
  1b:   a8 01                   test   $0x1,%al
  1d:   74 04                   je     23 <Page2Uptodate+0xc>
  1f:   48 8d 78 ff             lea    -0x1(%rax),%rdi
  23:   8b 07                   mov    (%rdi),%eax
  25:   48 c1 e8 02             shr    $0x2,%rax
  29:   83 e0 01                and    $0x1,%eax
  2c:   c3                      retq

But that means gcc then does more inlining into the functions that call
PageUptodate:

$ ./scripts/bloat-o-meter filemap-before.o filemap-after.o
add/remove: 0/0 grow/shrink: 3/4 up/down: 179/-91 (88)
Function                                     old     new   delta
mapping_seek_hole_data                      1203    1347    +144
__lock_page_killable                         394     426     +32
next_uptodate_page                           603     606      +3
wait_on_page_bit_common                      582     576      -6
filemap_get_pages                           1530    1512     -18
do_read_cache_page                          1031    1012     -19
filemap_read_page                            261     213     -48
Total: Before=24603, After=24691, chg +0.36%

But maybe you have a metric that shows this winning at scale rather than
in a microbenchmark?
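For reference, the bloat-o-meter summary line follows directly from the per-function rows: three functions grow by a total of 179 bytes, four shrink by a total of 91, and the net +88 bytes is 0.36% of the 24603-byte "before" total. The sketch below recomputes those figures from the numbers in the output. It is a hypothetical helper written for illustration, not code taken from scripts/bloat-o-meter; the names FUNCS, BEFORE, AFTER and summarize() are made up here.

```python
# Per-function sizes in bytes, (old, new), copied from the bloat-o-meter output.
FUNCS = {
    "mapping_seek_hole_data": (1203, 1347),
    "__lock_page_killable": (394, 426),
    "next_uptodate_page": (603, 606),
    "wait_on_page_bit_common": (582, 576),
    "filemap_get_pages": (1530, 1512),
    "do_read_cache_page": (1031, 1012),
    "filemap_read_page": (261, 213),
}

# Whole-object totals from the "Total:" line. These include every unchanged
# symbol too, so they cannot be derived from the rows above alone.
BEFORE, AFTER = 24603, 24691


def summarize(funcs):
    """Return (grow, shrink, up, down) from a dict of (old, new) sizes."""
    deltas = [new - old for old, new in funcs.values()]
    grow = sum(1 for d in deltas if d > 0)    # number of functions that grew
    shrink = sum(1 for d in deltas if d < 0)  # number of functions that shrank
    up = sum(d for d in deltas if d > 0)      # total bytes added
    down = sum(d for d in deltas if d < 0)    # total bytes removed (negative)
    return grow, shrink, up, down


grow, shrink, up, down = summarize(FUNCS)
print(f"grow/shrink: {grow}/{shrink} up/down: {up}/{down} ({up + down})")
print(f"Total: Before={BEFORE}, After={AFTER}, "
      f"chg {100 * (AFTER - BEFORE) / BEFORE:+.2f}%")
```

Since nothing was added or removed (add/remove: 0/0), the net of the per-function deltas equals the change in the object's total size here.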