Re: [PATCH] fs: fix a data race in i_size_write/i_size_read

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Feb 19, 2020 at 10:21:46AM +0100, Marco Elver wrote:
> Right. In reality, for mainstream architectures, it appears quite unlikely.
> 
> There may be other valid reasons, such as documenting the fact the
> write can happen concurrently with loads.
> 
> Let's assume the WRITE_ONCE can be dropped.
> 
> The load is a different story. While load tearing may not be an issue,
> it's more likely that other optimizations can break the code. For
> example load fusing can break code that expects repeated loads in a
> loop. E.g. I found these uses of i_size_read in loops:
> 
> git grep -E '(for|while) \(.*i_size_read'
> fs/ocfs2/dir.c: while (ctx->pos < i_size_read(inode)) {
> fs/ocfs2/dir.c:                 for (i = 0; i < i_size_read(inode) &&
> i < offset; ) {
> fs/ocfs2/dir.c: while (ctx->pos < i_size_read(inode)) {
> fs/ocfs2/dir.c:         while (ctx->pos < i_size_read(inode)
> fs/squashfs/dir.c:      while (length < i_size_read(inode)) {
> fs/squashfs/namei.c:    while (length < i_size_read(dir)) {
> 
> Can i_size writes happen concurrently, and if so will these break if
> the compiler decides to just do i_size_read's load once, and keep the
> result in a register?

It depends on the semantics and the behaviour when the value is not
cached in a register might be the wrong one. A concrete example with
assembly and analysis can be found in d98da49977f6 ("btrfs: save i_size
to avoid double evaluation of i_size_read in compress_file_range"),
which is the workardound mentioned in the my other mail.

C:
    actual_end = min_t(u64, i_size_read(inode), end + 1);

Asm:

        mov    0x20(%rsp),%rax
        cmp    %rax,0x48(%r15)           # read
        movl   $0x0,0x18(%rsp)
        mov    %rax,%r12
        mov    %r14,%rax
        cmovbe 0x48(%r15),%r12           # eval
    
      Where r15 is inode and 0x48 is offset of i_size.
    
      The original fix was to revert 62b37622718c that would do an
      intermediate assignment and this would also avoid the doulble
      evaluation but is not future-proof, should the compiler merge the
      stores and call i_size_read anyway.
    
      There's a patch adding READ_ONCE to i_size_read but that's not being
      applied at the moment and we need to fix the bug. Instead, emulate
      READ_ONCE by two barrier()s that's what effectively happens. The
      assembly confirms single evaluation:
    
        mov    0x48(%rbp),%rax          # read once
        mov    0x20(%rsp),%rcx
        mov    $0x20,%edx
        cmp    %rax,%rcx
        cmovbe %rcx,%rax
        mov    %rax,(%rsp)
        mov    %rax,%rcx
        mov    %r14,%rax
    
      Where 0x48(%rbp) is inode->i_size stored to %eax.



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux