Thanks to Christian Brauner for the reminder and to Jeff Layton for the
suggested fix. I will rebase my patch on vfs.fixes, fix this, and resend it as v4.

> -----Original Message-----
> From: "Jeff Layton" <jlayton@xxxxxxxxxx>
> Sent: 2024-11-23 22:01:08 (Saturday)
> To: "Hao-ran Zheng" <zhenghaoran@xxxxxxxxxxx>, viro@xxxxxxxxxxxxxxxxxx, brauner@xxxxxxxxxx, jack@xxxxxxx, linux-fsdevel@xxxxxxxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx
> Cc: baijiaju1990@xxxxxxxxx, 21371365@xxxxxxxxxxx
> Subject: Re: [PATCH v3] fs: Fix data race in inode_set_ctime_to_ts
>
> On Fri, 2024-11-22 at 21:06 +0800, Hao-ran Zheng wrote:
> > A data race may occur when the function `inode_set_ctime_to_ts()` and
> > the function `inode_get_ctime_sec()` are executed concurrently. When
> > two threads call `aio_read` and `aio_write` respectively, they will
> > be distributed to the read and write functions of the corresponding
> > file system respectively. Taking the btrfs file system as an example,
> > the `btrfs_file_read_iter` and `btrfs_file_write_iter` functions are
> > finally called. These two functions create a data race when they
> > eventually call `inode_get_ctime_sec()` and `inode_set_ctime_to_ts()`.
> > The specific call stack that appears during testing is as follows:
> >
> > ============DATA_RACE============
> > btrfs_delayed_update_inode+0x1f61/0x7ce0 [btrfs]
> > btrfs_update_inode+0x45e/0xbb0 [btrfs]
> > btrfs_dirty_inode+0x2b8/0x530 [btrfs]
> > btrfs_update_time+0x1ad/0x230 [btrfs]
> > touch_atime+0x211/0x440
> > filemap_read+0x90f/0xa20
> > btrfs_file_read_iter+0xeb/0x580 [btrfs]
> > aio_read+0x275/0x3a0
> > io_submit_one+0xd22/0x1ce0
> > __se_sys_io_submit+0xb3/0x250
> > do_syscall_64+0xc1/0x190
> > entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > ============OTHER_INFO============
> > btrfs_write_check+0xa15/0x1390 [btrfs]
> > btrfs_buffered_write+0x52f/0x29d0 [btrfs]
> > btrfs_do_write_iter+0x53d/0x1590 [btrfs]
> > btrfs_file_write_iter+0x41/0x60 [btrfs]
> > aio_write+0x41e/0x5f0
> > io_submit_one+0xd42/0x1ce0
> > __se_sys_io_submit+0xb3/0x250
> > do_syscall_64+0xc1/0x190
> > entry_SYSCALL_64_after_hwframe+0x77/0x7f
> >
> > To address this issue, it is recommended to add WRITE_ONCE
> > when writing the `inode->i_ctime_sec` variable, and to add
> > READ_ONCE when reading in the functions `inode_get_ctime_sec()`
> > and `inode_get_ctime_nsec()`.
> >
> > Signed-off-by: Hao-ran Zheng <zhenghaoran@xxxxxxxxxxx>
> > ---
> > V2 -> V3: Added READ_ONCE in inode_get_ctime_nsec() and addressed review comments
> > V1 -> V2: Added READ_ONCE in inode_get_ctime_sec()
> > ---
> >  include/linux/fs.h | 8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index 3559446279c1..c18f9a9ee5e7 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -1655,12 +1655,12 @@ static inline struct timespec64 inode_set_mtime(struct inode *inode,
> >
> >  static inline time64_t inode_get_ctime_sec(const struct inode *inode)
> >  {
> > -	return inode->i_ctime_sec;
> > +	return READ_ONCE(inode->i_ctime_sec);
> >  }
> >
> >  static inline long inode_get_ctime_nsec(const struct inode *inode)
> >  {
> > -	return inode->i_ctime_nsec;
> > +	return READ_ONCE(inode->i_ctime_nsec);
> >  }
> >
> >  static inline struct timespec64 inode_get_ctime(const struct inode *inode)
> > @@ -1674,8 +1674,8 @@ static inline struct timespec64 inode_get_ctime(const struct inode *inode)
> >  static inline struct timespec64 inode_set_ctime_to_ts(struct inode *inode,
> >  						      struct timespec64 ts)
> >  {
> > -	inode->i_ctime_sec = ts.tv_sec;
> > -	inode->i_ctime_nsec = ts.tv_nsec;
> > +	WRITE_ONCE(inode->i_ctime_sec, ts.tv_sec);
> > +	WRITE_ONCE(inode->i_ctime_nsec, ts.tv_nsec);
> >  	return ts;
> >  }
> >
>
> Looks reasonable. There are also bare fetches and stores of the
> i_ctime_sec field in inode_set_ctime_current(). Do we need something
> like this in addition to the above?
>
>
> diff --git a/fs/inode.c b/fs/inode.c
> index b13b778257ae..d869ee6f1c6b 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -2788,7 +2788,7 @@ struct timespec64 inode_set_ctime_current(struct inode *inode)
>  	 */
>  	cns = smp_load_acquire(&inode->i_ctime_nsec);
>  	if (cns & I_CTIME_QUERIED) {
> -		struct timespec64 ctime = { .tv_sec = inode->i_ctime_sec,
> +		struct timespec64 ctime = { .tv_sec = READ_ONCE(inode->i_ctime_sec),
>  					    .tv_nsec = cns & ~I_CTIME_QUERIED };
>
>  		if (timespec64_compare(&now, &ctime) <= 0) {
> @@ -2809,7 +2809,7 @@ struct timespec64 inode_set_ctime_current(struct inode *inode)
>  	/* Try to swap the nsec value into place. */
>  	if (try_cmpxchg(&inode->i_ctime_nsec, &cur, now.tv_nsec)) {
>  		/* If swap occurred, then we're (mostly) done */
> -		inode->i_ctime_sec = now.tv_sec;
> +		WRITE_ONCE(inode->i_ctime_sec, now.tv_sec);
>  		trace_ctime_ns_xchg(inode, cns, now.tv_nsec, cur);
>  		mgtime_counter_inc(mg_ctime_swaps);
>  	} else {
> @@ -2824,7 +2824,7 @@ struct timespec64 inode_set_ctime_current(struct inode *inode)
>  			goto retry;
>  		}
>  		/* Otherwise, keep the existing ctime */
> -		now.tv_sec = inode->i_ctime_sec;
> +		now.tv_sec = READ_ONCE(inode->i_ctime_sec);
>  		now.tv_nsec = cur & ~I_CTIME_QUERIED;
>  	}
>  out:
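
For anyone following the thread, the access-marking pattern that both diffs apply can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the kernel code: the READ_ONCE()/WRITE_ONCE() definitions below are simplified volatile-cast stand-ins for the kernel macros, and `struct demo_inode`, `struct demo_ts` and the `demo_*` helpers are hypothetical names invented for this sketch.

```c
#include <stdint.h>

/*
 * Simplified userspace stand-ins for the kernel's READ_ONCE()/WRITE_ONCE().
 * Assumption: a single volatile access is enough for this illustration;
 * the real kernel macros carry additional type checks.
 */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical miniature of the two ctime fields; not the real struct inode. */
struct demo_inode {
	int64_t  i_ctime_sec;
	uint32_t i_ctime_nsec;
};

/* Hypothetical stand-in for struct timespec64. */
struct demo_ts {
	int64_t tv_sec;
	long    tv_nsec;
};

/* Marked load: the compiler may not tear, fuse, or repeat this read. */
static inline int64_t demo_get_ctime_sec(const struct demo_inode *inode)
{
	return READ_ONCE(inode->i_ctime_sec);
}

static inline long demo_get_ctime_nsec(const struct demo_inode *inode)
{
	return (long)READ_ONCE(inode->i_ctime_nsec);
}

/* Marked stores, mirroring the shape of the setter changed in the patch. */
static inline struct demo_ts demo_set_ctime_to_ts(struct demo_inode *inode,
						  struct demo_ts ts)
{
	WRITE_ONCE(inode->i_ctime_sec, ts.tv_sec);
	WRITE_ONCE(inode->i_ctime_nsec, (uint32_t)ts.tv_nsec);
	return ts;
}
```

Note that marking each access this way only constrains how the compiler emits the individual load or store. It does not make the seconds and nanoseconds fields update together as one atomic pair.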