Re: Race: data race between ext4_setattr() and acl_permission_check()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Thanks for your feedback and we really value your opinion. Our project aims at finding rare data races quickly so diagnosing is not yet in our scope for now, but it definitely shouldn't be an excuse. We did some further analysis and we would like to report it as well. Hope this could be helpful.


------------------------------------------
Interleaving
We observed two different interleavings of two instructions, as illustrated below.

Interleaving 1
Writer							Reader
							if (in_group_p(inode->i_gid))
							// read value = 0 (initial value)
inode->i_gid = attr->ia_gid
// write value == 0xee00
...	
							if ((mask & ~mode & (MAY_READ | MAY_WRITE | MAY_EXEC)) == 0)
return 0;

Interleaving 2
Writer							Reader
inode->i_gid = attr->ia_gid
// write value == 0xee00
							if (in_group_p(inode->i_gid))
							// read value = 0xee00			
							...
							if ((mask & ~mode & (MAY_READ | MAY_WRITE | MAY_EXEC)) == 0)
return 0;

However, in both cases, the function acl_permission_check() returns will value 0, so we were not able to find any explicit impact it may have. If this attribute is lock-free semantically, then it should be fine.


In fact, we also found another writer site that could race with the reader. We noticed that both two writers are protected so we were curious about the reader. 
------------------------------------------
Another writer
/tmp/tmp.B7zb7od2zE-5.3.11/extract/linux-5.3.11/fs/attr.c:185
        165   * @inode:  the inode to be updated
        166   * @attr:   the new attributes
        167   *
        168   * setattr_copy must be called with i_mutex held.
        169   *
        170   * setattr_copy updates the inode's metadata with that specified
        171   * in attr. Noticeably missing is inode size update, which is more complex
        172   * as it requires pagecache updates.
        173   *
        174   * The inode is not marked as dirty after this operation. The rationale is
        175   * that for "simple" filesystems, the struct inode is the inode storage.
        176   * The caller is free to mark the inode dirty afterwards if needed.
        177   */
        178  void setattr_copy(struct inode *inode, const struct iattr *attr)
        179  {
        180      unsigned int ia_valid = attr->ia_valid;
        181
        182      if (ia_valid & ATTR_UID)
        183          inode->i_uid = attr->ia_uid;
        184      if (ia_valid & ATTR_GID)
 ==>    185          inode->i_gid = attr->ia_gid;
        186      if (ia_valid & ATTR_ATIME)
        187          inode->i_atime = timespec64_trunc(attr->ia_atime,
        188                            inode->i_sb->s_time_gran);
        189      if (ia_valid & ATTR_MTIME)
        190          inode->i_mtime = timespec64_trunc(attr->ia_mtime,
        191                            inode->i_sb->s_time_gran);
        192      if (ia_valid & ATTR_CTIME)
        193          inode->i_ctime = timespec64_trunc(attr->ia_ctime,
        194                            inode->i_sb->s_time_gran);
        195      if (ia_valid & ATTR_MODE) {
        196          umode_t mode = attr->ia_mode;
        197
        198          if (!in_group_p(inode->i_gid) &&
        199              !capable_wrt_inode_uidgid(inode, CAP_FSETID))
        200              mode &= ~S_ISGID;
        201          inode->i_mode = mode;
        202      }
        203  }

		
Thanks,
Sishuai

> On Nov 30, 2020, at 1:23 PM, Theodore Y. Ts'o <tytso@xxxxxxx> wrote:
> 
> On Mon, Nov 30, 2020 at 05:55:20PM +0000, Gong, Sishuai wrote:
>> Hi,
>> 
>> We found a data race in linux kernel 5.3.11 that we are able to
>> reproduce in x86 under specific interleavings. Currently, we are not
>> sure about the consequence of this race so we would like to confirm
>> with the community if this can be a harmful bug.
> 
> What do you mean by "data race" in this context?  If you have one
> process setting a file's group id, while another process is trying to
> open that same file and that open is depending on the process's group
> access vs the file's group id, the open might succeed or fail
> depending on whether the chgrp(2) is executed before or after the
> open(2).
> 
> I could see data race if in some other context of the file open, the
> group id is read, and so the group id might be inconsistent during the
> course of operating on a particular system call.  In which case, we
> might need to cache the group id value, or take some kind of lock,
> etc.
> 
> But if your automated tool can't distinguish whether or not this is
> the case, I'll gently suggest that it's not particuarly useful....
> Maybe this is something that should be the subject of further
> research?  The whole point of automated tools, after all, is to save
> developers' time.  And not asking them to chase down potential
> questions like "so is this real or not"?
> 
> Cheers,
> 
> 					- Ted
> 
>> 
>> ------------------------------------------
>> Writer site
>> 
>> /tmp/tmp.B7zb7od2zE-5.3.11/extract/linux-5.3.11/fs/ext4/inode.c:5599
>>       5579              error = PTR_ERR(handle);
>>       5580              goto err_out;
>>       5581          }
>>       5582
>>       5583          /* dquot_transfer() calls back ext4_get_inode_usage() which
>>       5584           * counts xattr inode references.
>>       5585           */
>>       5586          down_read(&EXT4_I(inode)->xattr_sem);
>>       5587          error = dquot_transfer(inode, attr);
>>       5588          up_read(&EXT4_I(inode)->xattr_sem);
>>       5589
>>       5590          if (error) {
>>       5591              ext4_journal_stop(handle);
>>       5592              return error;
>>       5593          }
>>       5594          /* Update corresponding info in inode so that everything is in
>>       5595           * one transaction */
>>       5596          if (attr->ia_valid & ATTR_UID)
>>       5597              inode->i_uid = attr->ia_uid;
>>       5598          if (attr->ia_valid & ATTR_GID)
>> ==>   5599              inode->i_gid = attr->ia_gid;
>>       5600          error = ext4_mark_inode_dirty(handle, inode);
>>       5601          ext4_journal_stop(handle);
>>       5602      }
>>       5603
>>       5604      if (attr->ia_valid & ATTR_SIZE) {
>>       5605          handle_t *handle;
>>       5606          loff_t oldsize = inode->i_size;
>>       5607          int shrink = (attr->ia_size < inode->i_size);
>>       5608
>>       5609          if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))) {
>>       5610              struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
>>       5611
>>       5612              if (attr->ia_size > sbi->s_bitmap_maxbytes)
>>       5613                  return -EFBIG;
>>       5614          }
>>       5615          if (!S_ISREG(inode->i_mode))
>>       5616              return -EINVAL;
>>       5617
>>       5618          if (IS_I_VERSION(inode) && attr->ia_size != inode->i_size)
>>       5619              inode_inc_iversion(inode);
>> 
>> ------------------------------------------
>> Reader site
>> 
>> /tmp/tmp.B7zb7od2zE-5.3.11/extract/linux-5.3.11/fs/namei.c:306
>>        286
>>        287      return -EAGAIN;
>>        288  }
>>        289
>>        290  /*
>>        291   * This does the basic permission checking
>>        292   */
>>        293  static int acl_permission_check(struct inode *inode, int mask)
>>        294  {
>>        295      unsigned int mode = inode->i_mode;
>>        296
>>        297      if (likely(uid_eq(current_fsuid(), inode->i_uid)))
>>        298          mode >>= 6;
>>        299      else {
>>        300          if (IS_POSIXACL(inode) && (mode & S_IRWXG)) {
>>        301              int error = check_acl(inode, mask);
>>        302              if (error != -EAGAIN)
>>        303                  return error;
>>        304          }
>>        305
>> ==>    306          if (in_group_p(inode->i_gid))
>>        307              mode >>= 3;
>>        308      }
>>        309
>>        310      /*
>>        311       * If the DACs are ok we don't need any capability check.
>>        312       */
>>        313      if ((mask & ~mode & (MAY_READ | MAY_WRITE | MAY_EXEC)) == 0)
>>        314          return 0;
>>        315      return -EACCES;
>>        316  }
>>        317
>> ------------------------------------------
>> Writer calling trace
>> 
>> - do_fchownat
>> -- chown_common
>> --- notify_change
>> 
>> ------------------------------------------
>> Reader calling trace
>> 
>> - do_execve
>> -- __do_execve_file
>> --- do_open_execat
>> ---- do_filp_open
>> ----- path_openat
>> ------ link_path_walk
>> ------- inode_permission
>> -------- generic_permission
>> 
>> 
>> 
>> Thanks,
>> Sishuai
>> 





[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux