On Fri, Nov 10, 2023 at 10:36 PM Matthew Wilcox wrote:
>
> On Fri, Nov 10, 2023 at 04:48:02PM +0900, Ryusuke Konishi wrote:
> > On Fri, Nov 10, 2023 at 6:06 AM Matthew Wilcox (Oracle) wrote:
> > > +++ b/fs/buffer.c
> > > @@ -199,7 +199,7 @@ __find_get_block_slow(struct block_device *bdev, sector_t block)
> > >  	int all_mapped = 1;
> > >  	static DEFINE_RATELIMIT_STATE(last_warned, HZ, 1);
> > >
> > > -	index = block >> (PAGE_SHIFT - bd_inode->i_blkbits);
> > > +	index = ((loff_t)block << bd_inode->i_blkbits) / PAGE_SIZE;
> >
> > Multiple 64-bit divisions are used in this patch, but why not use
> > two-stage shifts as shown below, if there is no left-shift overflow
> > and the sign bit will not be set?
> >
> >     index = ((loff_t)block << bd_inode->i_blkbits) >> PAGE_SHIFT;
>
> Here's what the compiler turns that into:
>
>     3223:	49 8b 86 80 00 00 00 	mov    0x80(%r14),%rax
>     322a:	4c 89 ee             	mov    %r13,%rsi
>     322d:	ba 01 00 00 00       	mov    $0x1,%edx
>     3232:	0f b6 88 c2 00 00 00 	movzbl 0xc2(%rax),%ecx
>                 ^ this is where we load i_blkbits into ecx
>     3239:	48 89 45 d0          	mov    %rax,-0x30(%rbp)
>     323d:	4c 8b 60 30          	mov    0x30(%rax),%r12
>     3241:	48 d3 e6             	shl    %cl,%rsi
>                 ^ shift left by %cl (the bottom byte of ecx)
>     3244:	31 c9                	xor    %ecx,%ecx
>     3246:	48 c1 ee 0c          	shr    $0xc,%rsi
>                 ^ shift right by 12
>     324a:	4c 89 e7             	mov    %r12,%rdi
>     324d:	e8 00 00 00 00       	call   3252 <__find_get_block+0x82>
>                 324e: R_X86_64_PLT32	__filemap_get_folio-0x4

Ah, similarly in my environment, these divisions by constants are
optimized (and no link errors happen even on a 32-bit machine).

Sorry for taking up your time with unnecessary comments, along with
those for patch 3/7.

Regards,
Ryusuke Konishi