On Fri 11-10-19 16:20:50, Josef Bacik wrote: > I spent the last few weeks running down a weird regression in btrfs we > were seeing in production. It turned out to be introduced by > 62b37622718c, which took the following > > loff_t isize = i_size_read(inode); > > actual_end = min_t(u64, isize, end + 1); > > and turned it into > > actual_end = min_t(u64, i_size_read(inode), end + 1); > > The problem here is that the compiler is optimizing out the temporary > variables used in __cmp_once, so the resulting assembly looks like this > > 498 actual_end = min_t(u64, i_size_read(inode), end + 1); > 0xffffffff814b08c1 <+145>: 48 8b 44 24 28 mov 0x28(%rsp),%rax > 0xffffffff814b08c6 <+150>: 48 39 45 50 cmp %rax,0x50(%rbp) > 0xffffffff814b08ca <+154>: 48 89 c6 mov %rax,%rsi > 0xffffffff814b08cd <+157>: 48 0f 46 75 50 cmovbe 0x50(%rbp),%rsi > > as you can see we read the value of the inode to compare, and then we > read it a second time to assign it. > > This code is simply an optimization, so there's no locking to keep > i_size from changing, however we really need min_t to actually return > the minimum value for these two values, which it is failing to do. > > We've reverted that patch for now to fix the problem, but it's only a > matter of time before the compiler becomes smart enough to optimize out > the loff_t isize intermediate variable as well. > > Instead we want to make it explicit that i_size_read() should only read > the value once. This will keep this class of problem from happening in > the future, regardless of what the compiler chooses to do. With this > change we get the following assembly generated for this code > > 491 actual_end = min_t(u64, i_size_read(inode), end + 1); > 0xffffffff8148f625 <+149>: 48 8b 44 24 20 mov 0x20(%rsp),%rax > > ./include/linux/compiler.h: > 199 __READ_ONCE_SIZE; > 0xffffffff8148f62a <+154>: 4c 8b 75 50 mov 0x50(%rbp),%r14 > > fs/btrfs/inode.c: > 491 actual_end = min_t(u64, i_size_read(inode), end + 1); > 0xffffffff8148f62e <+158>: 49 39 c6 cmp %rax,%r14 > 0xffffffff8148f631 <+161>: 4c 0f 47 f0 cmova %rax,%r14 > > and this works out properly, we only read the value once and so we won't > trip over this problem again. > > Signed-off-by: Josef Bacik <josef@xxxxxxxxxxxxxx> Yeah, given i_size_read() is specifically meant for unlocked access to i_size, it makes sense to hide the READ_ONCE() in that function (can corresponding WRITE_ONCE() in i_size_write()). So feel free to add: Reviewed-by: Jan Kara <jack@xxxxxxx> Honza > --- > include/linux/fs.h | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/include/linux/fs.h b/include/linux/fs.h > index e0d909d35763..0e3f887e2dc5 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -863,7 +863,7 @@ static inline loff_t i_size_read(const struct inode *inode) > preempt_enable(); > return i_size; > #else > - return inode->i_size; > + return READ_ONCE(inode->i_size); > #endif > } > > @@ -885,7 +885,7 @@ static inline void i_size_write(struct inode *inode, loff_t i_size) > inode->i_size = i_size; > preempt_enable(); > #else > - inode->i_size = i_size; > + WRITE_ONCE(inode->i_size, i_size); > #endif > } > > -- > 2.21.0 > -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR