Re: [PATCH 14/15] xfs: multithreaded iwalk implementation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jul 02, 2019 at 08:51:38AM -0700, Darrick J. Wong wrote:
> On Tue, Jul 02, 2019 at 10:33:20AM -0400, Brian Foster wrote:
> > On Wed, Jun 26, 2019 at 01:45:25PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > > 
> > > Create a parallel iwalk implementation and switch quotacheck to use it.
> > > 
> > > Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > > ---
> > >  fs/xfs/Makefile      |    1 
> > >  fs/xfs/xfs_globals.c |    3 +
> > >  fs/xfs/xfs_iwalk.c   |   82 +++++++++++++++++++++++++++++++++
> > >  fs/xfs/xfs_iwalk.h   |    2 +
> > >  fs/xfs/xfs_pwork.c   |  126 ++++++++++++++++++++++++++++++++++++++++++++++++++
> > >  fs/xfs/xfs_pwork.h   |   58 +++++++++++++++++++++++
> > >  fs/xfs/xfs_qm.c      |    2 -
> > >  fs/xfs/xfs_sysctl.h  |    6 ++
> > >  fs/xfs/xfs_sysfs.c   |   40 ++++++++++++++++
> > >  fs/xfs/xfs_trace.h   |   18 +++++++
> > >  10 files changed, 337 insertions(+), 1 deletion(-)
> > >  create mode 100644 fs/xfs/xfs_pwork.c
> > >  create mode 100644 fs/xfs/xfs_pwork.h
> > > 
> > > 
> > > diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> > > index 74d30ef0dbce..48940a27d4aa 100644
> > > --- a/fs/xfs/Makefile
> > > +++ b/fs/xfs/Makefile
> > > @@ -84,6 +84,7 @@ xfs-y				+= xfs_aops.o \
> > >  				   xfs_message.o \
> > >  				   xfs_mount.o \
> > >  				   xfs_mru_cache.o \
> > > +				   xfs_pwork.o \
> > >  				   xfs_reflink.o \
> > >  				   xfs_stats.o \
> > >  				   xfs_super.o \
> > > diff --git a/fs/xfs/xfs_globals.c b/fs/xfs/xfs_globals.c
> > > index d0d377384120..a44b564871b5 100644
> > > --- a/fs/xfs/xfs_globals.c
> > > +++ b/fs/xfs/xfs_globals.c
> > > @@ -31,6 +31,9 @@ xfs_param_t xfs_params = {
> > >  	.fstrm_timer	= {	1,		30*100,		3600*100},
> > >  	.eofb_timer	= {	1,		300,		3600*24},
> > >  	.cowb_timer	= {	1,		1800,		3600*24},
> > > +#ifdef DEBUG
> > > +	.pwork_threads	= {	-1,		-1,		NR_CPUS	},
> > 
> > So I noticed that /sys/fs/xfs/debug/pwork_threads was still 0 by default
> > with this patch, then realized that we've only initialized it in
> > xfs_params and not xfs_globals. The sysfs handlers added by this patch
> > use the latter whereas the former looks related to /proc. It looks like
> > this should be fixed up to init the global field to -1 and probably drop
> > the xfs_params bits since we don't expose the value via /proc.
> 
> Oh, you're right.  I'll drop the xfs_params bits.
> 
> > 
> > > +#endif
> > >  };
> > >  
> > >  struct xfs_globals xfs_globals = {
> > ...
> > > diff --git a/fs/xfs/xfs_pwork.c b/fs/xfs/xfs_pwork.c
> > > new file mode 100644
> > > index 000000000000..779596ed9432
> > > --- /dev/null
> > > +++ b/fs/xfs/xfs_pwork.c
> > > @@ -0,0 +1,126 @@
> > ...
> > > +/*
> > > + * Return the amount of parallelism that the data device can handle, or 0 for
> > > + * no limit.
> > > + */
> > > +unsigned int
> > > +xfs_pwork_guess_datadev_parallelism(
> > > +	struct xfs_mount	*mp)
> > > +{
> > > +	struct xfs_buftarg	*btp = mp->m_ddev_targp;
> > > +	int			iomin;
> > > +	int			ioopt;
> > > +
> > > +	if (blk_queue_nonrot(btp->bt_bdev->bd_queue))
> > > +		return num_online_cpus();
> > > +
> > > +	if (mp->m_sb.sb_width && mp->m_sb.sb_unit)
> > > +		return mp->m_sb.sb_width / mp->m_sb.sb_unit;
> > > +
> > > +	iomin = bdev_io_min(btp->bt_bdev);
> > > +	ioopt = bdev_io_opt(btp->bt_bdev);
> > > +	if (iomin && ioopt)
> > > +		return ioopt / iomin;
> > > +
> > > +	return 1;
> > > +}
> > 
> > I may have lost track of where we left off with this but IIRC we still
> > needed some numbers with regard to multi-device storage where
> > sb_width/sb_unit doesn't necessarily describe parallelism (i.e.
> > RAID5/6). I'm not sure what your goal is for this patch vs. the rest of
> > the series, but I'd be fine with merging this with just the
> > non-rotational bit for now if we add some kind of conservative maximum
> > (4? 8?) to the heuristic. It seems kind of risky to me to parallelize
> > 100s of AGs just because we might have that many CPUs/AGs on a
> > particular storage setup (we may also generate warning messages if a
> > system has a CPU count that exceeds the workqueue limit).
> 
> Hm, how about the most conservative threading that I can think of?
> 
> unsigned int
> xfs_pwork_guess_datadev_parallelism(
> 	struct xfs_mount	*mp)
> {
> 	return blk_queue_nonrot(...) ? 2 : 1;
> }
> 
> Since that did seem to yield speedups for everything except the dumb USB
> mass storage case.  Then I can work on building a case for more threads
> (and figuring out the maximums) for 5.4...
> 

Ack, that sounds good to me. Then we can get this code in sooner and
worry about opening up the parallelization incrementally based on
more focused testing/analysis.

Brian

> --D
> 
> > 
> > ...
> > > diff --git a/fs/xfs/xfs_sysfs.c b/fs/xfs/xfs_sysfs.c
> > > index cabda13f3c64..a146a2e61be2 100644
> > > --- a/fs/xfs/xfs_sysfs.c
> > > +++ b/fs/xfs/xfs_sysfs.c
> > > @@ -206,11 +206,51 @@ always_cow_show(
> > >  }
> > >  XFS_SYSFS_ATTR_RW(always_cow);
> > >  
> > > +#ifdef DEBUG
> > > +/*
> > > + * Override how many threads the parallel work queue is allowed to create.
> > > + * This has to be a debug-only global (instead of an errortag) because one of
> > > + * the main users of parallel workqueues is mount time quotacheck.
> > > + */
> > > +STATIC ssize_t
> > > +pwork_threads_store(
> > > +	struct kobject	*kobject,
> > > +	const char	*buf,
> > > +	size_t		count)
> > > +{
> > > +	int		ret;
> > > +	int		val;
> > > +
> > > +	ret = kstrtoint(buf, 0, &val);
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	if (val < 0 || val > num_possible_cpus())
> > > +		return -EINVAL;
> > > +
> > 
> > We need to allow assignment of -1 now too.
> 
> <nod>
> 
> --D
> 
> > Brian
> > 
> > > +	xfs_globals.pwork_threads = val;
> > > +
> > > +	return count;
> > > +}
> > > +
> > > +STATIC ssize_t
> > > +pwork_threads_show(
> > > +	struct kobject	*kobject,
> > > +	char		*buf)
> > > +{
> > > +	return snprintf(buf, PAGE_SIZE, "%d\n", xfs_globals.pwork_threads);
> > > +}
> > > +XFS_SYSFS_ATTR_RW(pwork_threads);
> > > +#endif /* DEBUG */
> > > +
> > >  static struct attribute *xfs_dbg_attrs[] = {
> > >  	ATTR_LIST(bug_on_assert),
> > >  	ATTR_LIST(log_recovery_delay),
> > >  	ATTR_LIST(mount_delay),
> > >  	ATTR_LIST(always_cow),
> > > +#ifdef DEBUG
> > > +	ATTR_LIST(pwork_threads),
> > > +#endif
> > >  	NULL,
> > >  };
> > >  
> > > diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> > > index f9bb1d50bc0e..658cbade1998 100644
> > > --- a/fs/xfs/xfs_trace.h
> > > +++ b/fs/xfs/xfs_trace.h
> > > @@ -3556,6 +3556,24 @@ TRACE_EVENT(xfs_iwalk_ag_rec,
> > >  		  __entry->startino, __entry->freemask)
> > >  )
> > >  
> > > +TRACE_EVENT(xfs_pwork_init,
> > > +	TP_PROTO(struct xfs_mount *mp, unsigned int nr_threads, pid_t pid),
> > > +	TP_ARGS(mp, nr_threads, pid),
> > > +	TP_STRUCT__entry(
> > > +		__field(dev_t, dev)
> > > +		__field(unsigned int, nr_threads)
> > > +		__field(pid_t, pid)
> > > +	),
> > > +	TP_fast_assign(
> > > +		__entry->dev = mp->m_super->s_dev;
> > > +		__entry->nr_threads = nr_threads;
> > > +		__entry->pid = pid;
> > > +	),
> > > +	TP_printk("dev %d:%d nr_threads %u pid %u",
> > > +		  MAJOR(__entry->dev), MINOR(__entry->dev),
> > > +		  __entry->nr_threads, __entry->pid)
> > > +)
> > > +
> > >  #endif /* _TRACE_XFS_H */
> > >  
> > >  #undef TRACE_INCLUDE_PATH
> > > 



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux