Re: [PATCH v2] 9p: use inode->i_lock to protect i_size_write()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Sorry for the late reply, I had thought I had but it looks like it
didn't go out.

Hou Tao wrote on Fri, Jan 18, 2019:
> Use inode->i_lock to protect i_size_write(), else i_size_read() in
> generic_fillattr() may loop infinitely in read_seqcount_begin() when
> multiple processes invoke v9fs_vfs_getattr() or v9fs_vfs_getattr_dotl()
> simultaneously under 32-bit SMP environment, and a soft lockup will be
> triggered as show below:

I didn't notice before, but we have a couple of other places that call
i_size_write() in the 9p code, namely v9fs_write_end() and
v9fs_file_write_iter().
I do not see what's special about these that would require not taking
the inode lock?

write_end() has a comment that i_size cannot change under it because it
has the i_mutex, but it's obviously not sufficient given the stat2inode
code does not have it, so it needs to do the same dance as write_iter.


>   watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [stat:2217]
>   Modules linked in:
>   CPU: 5 PID: 2217 Comm: stat Not tainted 5.0.0-rc1-00005-g7f702faf5a9e #4
>   Hardware name: Generic DT based system
>   PC is at generic_fillattr+0x104/0x108
>   LR is at 0xec497f00
>   pc : [<802b8898>]    lr : [<ec497f00>]    psr: 200c0013
>   sp : ec497e20  ip : ed608030  fp : ec497e3c
>   r10: 00000000  r9 : ec497f00  r8 : ed608030
>   r7 : ec497ebc  r6 : ec497f00  r5 : ee5c1550  r4 : ee005780
>   r3 : 0000052d  r2 : 00000000  r1 : ec497f00  r0 : ed608030
>   Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
>   Control: 10c5387d  Table: ac48006a  DAC: 00000051
>   CPU: 5 PID: 2217 Comm: stat Not tainted 5.0.0-rc1-00005-g7f702faf5a9e #4
>   Hardware name: Generic DT based system
>   Backtrace:
>   [<8010d974>] (dump_backtrace) from [<8010dc88>] (show_stack+0x20/0x24)
>   [<8010dc68>] (show_stack) from [<80a1d194>] (dump_stack+0xb0/0xdc)
>   [<80a1d0e4>] (dump_stack) from [<80109f34>] (show_regs+0x1c/0x20)
>   [<80109f18>] (show_regs) from [<801d0a80>] (watchdog_timer_fn+0x280/0x2f8)
>   [<801d0800>] (watchdog_timer_fn) from [<80198658>] (__hrtimer_run_queues+0x18c/0x380)
>   [<801984cc>] (__hrtimer_run_queues) from [<80198e60>] (hrtimer_run_queues+0xb8/0xf0)
>   [<80198da8>] (hrtimer_run_queues) from [<801973e8>] (run_local_timers+0x28/0x64)
>   [<801973c0>] (run_local_timers) from [<80197460>] (update_process_times+0x3c/0x6c)
>   [<80197424>] (update_process_times) from [<801ab2b8>] (tick_nohz_handler+0xe0/0x1bc)
>   [<801ab1d8>] (tick_nohz_handler) from [<80843050>] (arch_timer_handler_virt+0x38/0x48)
>   [<80843018>] (arch_timer_handler_virt) from [<80180a64>] (handle_percpu_devid_irq+0x8c/0x240)
>   [<801809d8>] (handle_percpu_devid_irq) from [<8017ac20>] (generic_handle_irq+0x34/0x44)
>   [<8017abec>] (generic_handle_irq) from [<8017b344>] (__handle_domain_irq+0x6c/0xc4)
>   [<8017b2d8>] (__handle_domain_irq) from [<801022e0>] (gic_handle_irq+0x4c/0x88)
>   [<80102294>] (gic_handle_irq) from [<80101a30>] (__irq_svc+0x70/0x98)
>   [<802b8794>] (generic_fillattr) from [<8056b284>] (v9fs_vfs_getattr_dotl+0x74/0xa4)
>   [<8056b210>] (v9fs_vfs_getattr_dotl) from [<802b8904>] (vfs_getattr_nosec+0x68/0x7c)
>   [<802b889c>] (vfs_getattr_nosec) from [<802b895c>] (vfs_getattr+0x44/0x48)
>   [<802b8918>] (vfs_getattr) from [<802b8a74>] (vfs_statx+0x9c/0xec)
>   [<802b89d8>] (vfs_statx) from [<802b9428>] (sys_lstat64+0x48/0x78)
>   [<802b93e0>] (sys_lstat64) from [<80101000>] (ret_fast_syscall+0x0/0x28)
> 
> Reported-by: Xing Gaopeng <xingaopeng@xxxxxxxxxx>
> Signed-off-by: Hou Tao <houtao1@xxxxxxxxxx>
> ---
> v2:
> 	* move acquisition of i_lock into v9fs_stat2inode()
>  	* rename v9fs_stat2inode() to v9fs_stat2inode_flags() to accomodate
> 	  v9fs_refresh_inode() which doesn't need to update i_size.

As a nitpick I don't really like foo() vs foo_flags() as
foo-that-takes-extra-flags.
There are a few such examples in the kernel already but I think it does
not really convery information; it's better to have the base function
take flags and just use it, or if you want wrappers then just never
expose the flags but make a static _v9fs_stat2inode take flags,
v9fs_stat2inode behave as the old one and a new
v9fs_stat2inode_keepisize for the update with cache.
I'd personally go with the former are there only are four call sites.

Anyway, no strong feeling there, do as you'd like for this one
(including keeping the current way if you think it's better); the
other comments require another round-trip anyway :)

> ---
>  fs/9p/v9fs_vfs.h       | 21 +++++++++++++++++++--
>  fs/9p/vfs_inode.c      | 28 ++++++++++++++++------------
>  fs/9p/vfs_inode_dotl.c | 28 +++++++++++++++++-----------
>  3 files changed, 52 insertions(+), 25 deletions(-)
> 
> diff --git a/fs/9p/v9fs_vfs.h b/fs/9p/v9fs_vfs.h
> index 5a0db6dec8d1..51b3e441b895 100644
> --- a/fs/9p/v9fs_vfs.h
> +++ b/fs/9p/v9fs_vfs.h
> @@ -40,6 +40,9 @@
>   */
>  #define P9_LOCK_TIMEOUT (30*HZ)
>  
> +/* flags for v9fs_stat2inode_flags() & v9fs_stat2inode_dotl_flags() */
> +#define V9FS_STAT2INODE_KEEP_ISIZE 1
> +
>  extern struct file_system_type v9fs_fs_type;
>  extern const struct address_space_operations v9fs_addr_operations;
>  extern const struct file_operations v9fs_file_operations;
> @@ -61,8 +64,10 @@ int v9fs_init_inode(struct v9fs_session_info *v9ses,
>  		    struct inode *inode, umode_t mode, dev_t);
>  void v9fs_evict_inode(struct inode *inode);
>  ino_t v9fs_qid2ino(struct p9_qid *qid);
> -void v9fs_stat2inode(struct p9_wstat *, struct inode *, struct super_block *);
> -void v9fs_stat2inode_dotl(struct p9_stat_dotl *, struct inode *);
> +void v9fs_stat2inode_flags(struct p9_wstat *stat, struct inode *inode,
> +			    struct super_block *sb, unsigned int flags);
> +void v9fs_stat2inode_dotl_flags(struct p9_stat_dotl *stat, struct inode *inode,
> +				 unsigned int flags);
>  int v9fs_dir_release(struct inode *inode, struct file *filp);
>  int v9fs_file_open(struct inode *inode, struct file *file);
>  void v9fs_inode2stat(struct inode *inode, struct p9_wstat *stat);
> @@ -83,4 +88,16 @@ static inline void v9fs_invalidate_inode_attr(struct inode *inode)
>  }
>  
>  int v9fs_open_to_dotl_flags(int flags);
> +
> +static inline void v9fs_stat2inode(struct p9_wstat *stat, struct inode *inode,
> +				    struct super_block *sb)
> +{
> +	v9fs_stat2inode_flags(stat, inode, sb, 0);
> +}
> +
> +static inline void v9fs_stat2inode_dotl(struct p9_stat_dotl *stat,
> +					 struct inode *inode)
> +{
> +	v9fs_stat2inode_dotl_flags(stat, inode, 0);
> +}
>  #endif
> diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
> index 85ff859d3af5..ca7d93184400 100644
> --- a/fs/9p/vfs_inode.c
> +++ b/fs/9p/vfs_inode.c
> @@ -1166,16 +1166,17 @@ static int v9fs_vfs_setattr(struct dentry *dentry, struct iattr *iattr)
>  }
>  
>  /**
> - * v9fs_stat2inode - populate an inode structure with mistat info
> + * v9fs_stat2inode_flags - populate an inode structure with mistat info
>   * @stat: Plan 9 metadata (mistat) structure
>   * @inode: inode to populate
>   * @sb: superblock of filesystem
> + * @flags: control flags (e.g. V9FS_STAT2INODE_KEEP_ISIZE)
>   *
>   */
>  
>  void
> -v9fs_stat2inode(struct p9_wstat *stat, struct inode *inode,
> -	struct super_block *sb)
> +v9fs_stat2inode_flags(struct p9_wstat *stat, struct inode *inode,
> +		       struct super_block *sb, unsigned int flags)
>  {
>  	umode_t mode;
>  	char ext[32];
> @@ -1184,6 +1185,9 @@ v9fs_stat2inode(struct p9_wstat *stat, struct inode *inode,
>  	struct v9fs_session_info *v9ses = sb->s_fs_info;
>  	struct v9fs_inode *v9inode = V9FS_I(inode);
>  
> +	/* Protect the updates of i_size & cache_validity */
> +	spin_lock(&inode->i_lock);
> +

You're still locking the whole function, which is what I complained
about in the previous patch.
First of all, in the KEEP_ISIZE path that lock isn't needed at all,
and it does not protect anything other than the i_size_write.


I do see that you wrote in the comment this is meant to protect
cache_validity, but it does not solve the race I mentionned in the v1
patch either - the race timing is from the moment the 9p getattr
request has been sent out, not just setting fields in here.
(e.g. thread1 sends getattr request, thread2 setattrs and invalidate
inode attr after request has been processed by server, thread1 receives
getattr and clears flag while attributes are still obsolete)

Let's fix this in a proper separate patch and not pretend it's resolved
by this trivial locking when it isn't.

For now, just locking around the i_size_write makes most sense.


Thanks,
-- 
Dominique



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux